from learning and control to

57
From Learning and Control to Deep Reinforcement Learning Benjamin Recht University of California, Berkeley

Upload: others

Post on 12-Jun-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: From Learning and Control to

From Learning and Control to Deep Reinforcement Learning

Benjamin RechtUniversity of California, Berkeley

Page 2: From Learning and Control to
Page 3: From Learning and Control to
Page 4: From Learning and Control to

I took four courses in graduate school that completely revolutionized my thinking:

Page 5: From Learning and Control to

I took four courses in graduate school that completely revolutionized my thinking:

6.432: Detection and Estimation with Wornell9.520 Statistical Learning Theory with Poggio

6.24x: Complex systems with Megretski 6.253: Convex optimization with Berteskas

Page 6: From Learning and Control to

I took four courses in graduate school that completely revolutionized my thinking:

4 of these were LIDS courses!

6.432: Detection and Estimation with Wornell9.520 Statistical Learning Theory with Poggio

6.24x: Complex systems with Megretski 6.253: Convex optimization with Berteskas

Page 7: From Learning and Control to

I took four courses in graduate school that completely revolutionized my thinking:

4 of these were LIDS courses!

All are prerequisites for modern RL, but I never took an RL course…

6.432: Detection and Estimation with Wornell9.520 Statistical Learning Theory with Poggio

6.24x: Complex systems with Megretski 6.253: Convex optimization with Berteskas

Page 8: From Learning and Control to
Page 9: From Learning and Control to

trustable, scalable, predictable

Page 10: From Learning and Control to

Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system

Page 11: From Learning and Control to

Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system

Control Theory 🤔

Page 12: From Learning and Control to

What is ML?

Page 13: From Learning and Control to

What is ML?using past data to learn about and/or act upon

the world

Page 14: From Learning and Control to

What is ML?using past data to learn about and/or act upon

the world

Environments too complex

Page 15: From Learning and Control to

What is ML?using past data to learn about and/or act upon

the world

Environments too complex

Sensing too complex

Page 16: From Learning and Control to

What is ML?using past data to learn about and/or act upon

the world

Environments too complex

Sensing too complex

Models too complex

Page 17: From Learning and Control to

What is Control?

Page 18: From Learning and Control to

What is Control?using feedback to mitigate the effects of

dynamic uncertainty

Page 19: From Learning and Control to

What is Control?using feedback to mitigate the effects of

dynamic uncertainty

Environments are uncertain

Page 20: From Learning and Control to

What is Control?using feedback to mitigate the effects of

dynamic uncertainty

Environments are uncertain

Sensing/componentsare uncertain

Page 21: From Learning and Control to

What is Control?using feedback to mitigate the effects of

dynamic uncertainty

Environments are uncertain

Sensing/componentsare uncertain

Models are uncertain

Page 22: From Learning and Control to

How do we get the best of both?

Page 23: From Learning and Control to

How do we get the best of both?

Dynamics& ControlDetailed Models

Page 24: From Learning and Control to

How do we get the best of both?

Dynamics& ControlDetailed Models

Machine Learning

& Big Data

Page 25: From Learning and Control to
Page 26: From Learning and Control to
Page 27: From Learning and Control to
Page 28: From Learning and Control to
Page 29: From Learning and Control to
Page 30: From Learning and Control to

How do we get the best of both?

Dynamics& ControlDetailed Models

Machine Learning

& Big Data

Page 31: From Learning and Control to

Optimization

How do we get the best of both?

Dynamics& ControlDetailed Models

Machine Learning

& Big Data

Page 32: From Learning and Control to

RL Methods Gxt

ux

e

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)

<latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit>

How to solve optimal control when the model f is unknown?

Page 33: From Learning and Control to

RL Methods Gxt

ux

e

model-based

• Model-based: fit model from data (aka, standard engineering practice)

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)

<latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit>

How to solve optimal control when the model f is unknown?

Page 34: From Learning and Control to

RL Methods Gxt

ux

e

model-based

• Model-based: fit model from data (aka, standard engineering practice)• Model-free

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)

<latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit>

How to solve optimal control when the model f is unknown?

Page 35: From Learning and Control to

RL Methods Gxt

ux

e

approximate dynamic programming

model-based

• Model-based: fit model from data (aka, standard engineering practice)• Model-free

- Approximate dynamic programming: estimate cost from data

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)

<latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit>

How to solve optimal control when the model f is unknown?

Page 36: From Learning and Control to

RL Methods Gxt

ux

e

approximate dynamic programming

model-based

• Model-based: fit model from data (aka, standard engineering practice)• Model-free

- Approximate dynamic programming: estimate cost from data- Direct policy search: search for actions from data

direct policy search

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)

<latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit>

How to solve optimal control when the model f is unknown?

Page 37: From Learning and Control to

Deep Reinforcement Learning

Page 38: From Learning and Control to

Deep Reinforcement Learning

• Simply parameterize value function or policy as a deep net

Page 39: From Learning and Control to

Deep Reinforcement Learning

• Simply parameterize value function or policy as a deep net• All of the ideas have been here since NDP!

Page 40: From Learning and Control to

Deep Reinforcement Learning

• Simply parameterize value function or policy as a deep net• All of the ideas have been here since NDP!• Most of these algorithms don’t really “work.”

Page 41: From Learning and Control to

What is ML good for in control?

Page 42: From Learning and Control to

What is ML good for in control?• Fundamentally, almost all machine learning successes are

in nonparametric prediction (mostly classification).

Page 43: From Learning and Control to

What is ML good for in control?• Fundamentally, almost all machine learning successes are

in nonparametric prediction (mostly classification).

Page 44: From Learning and Control to

What is ML good for in control?• Fundamentally, almost all machine learning successes are

in nonparametric prediction (mostly classification).

Page 45: From Learning and Control to

What is ML good for in control?• Fundamentally, almost all machine learning successes are

in nonparametric prediction (mostly classification).

Page 46: From Learning and Control to

What is ML good for in control?• Fundamentally, almost all machine learning successes are

in nonparametric prediction (mostly classification).

Perceptual sensors in the loop

Page 47: From Learning and Control to

What is ML good for in control?• Fundamentally, almost all machine learning successes are

in nonparametric prediction (mostly classification).

Perceptual sensors in the loop Forecasting in MPC

Page 48: From Learning and Control to

What is ML good for in control?• Fundamentally, almost all machine learning successes are

in nonparametric prediction (mostly classification).

Perceptual sensors in the loop Forecasting in MPC

How to incorporate uncertain predictive perception in trustable, scalable, predictable autonomy?

Page 49: From Learning and Control to

managing uncertainty and learning in optimal control

Gxt

ux

e

changing costs

uncertain dynamics.

safety constraints

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)xt 2 X , ut 2 U

<latexit sha1_base64="QggUaKjlGoVnnNY5ZQ07WPvBgtU=">AAADUXicbVJNb9NAEF0nfJQUaApHLisiqlREUVyQ4FKpakFw6KGIpo2UNdZ6M05WXa/N7hglWP6VnJD4I9xYJ64gCWNZGr8383bGb6NMSYuDwU+v0bxz9979nQet3YePHu+1959c2TQ3AoYiVakZRdyCkhqGKFHBKDPAk0jBdXRzVvHX38BYmepLXGQQJHyqZSwFRweF7V8sgqnUBTeGL8pCqbLFkiidF4nUMpHfoaQHlCUcZ1FUvC9DYApiHDObJ2GBx3755ZKehdidh9jLQzxkRk5nGDBWy9g+9iuJuat+6Zf0mMZ/q3tQdbDWAXUfjmKZdCRDnt/irpAyqVcTCK6KUcl6PVY/y7Y1eujGBz2p1wnbnUF/sAy6nfh10iF1XIT7XsAmqcgT0CgUt3bsDzIMnBxKocCJ5xYyLm74FMYu1TwBGxRLG0r6wiETGqfGvRrpEv23o+CJtYskcpXVvHaTq8D/ceMc47dBIXWWI2ixOijOFcWUVp7SiTQgUC1cwoWRblYqZtxwgc75tVOW2hmItU2Kea6lSCewgSqco+EOtIAJl7raqvgglaKfubb0vHL6lnWyFd19J6cSbe/cXS99uFXsDPE3f/92cnXU91/1jz697pyc1tbskGfkOekSn7whJ+QjuSBDIrxTb+Z99UzjR+N3kzQbq9KGV/c8JWvR3P0DA+oP8Q==</latexit>

zt = g(xt)<latexit sha1_base64="madLnyVNkMOmsccuzY6D2fH57kg=">AAACf3icbVFdaxNBFJ2stdZoa6qPvgwGMYESdqvQ9kEIKtSHPKRo0kKyhLuTm82Q2dll5m5JXPI3fNW/5b/pbBLBJF64cDjnft8oU9KS7/+peI8OHh8+OXpaffb8+ORF7fRl36a5EdgTqUrNXQQWldTYI0kK7zKDkEQKb6PZ51K/vUdjZaq/0yLDMIFYy4kUQI4a/hgR/8jjxnxEzVGt7rf8lfF9EGxAnW2sOzqthMNxKvIENQkF1g4CP6OwAENSKFxWh7nFDMQMYhw4qCFBGxaroZf8rWPGfJIa55r4iv03o4DE2kUSucgEaGp3tZL8nzbIaXIZFlJnOaEW60aTXHFKeXkBPpYGBamFAyCMdLNyMQUDgtydtrqsamcotjYp5rmWIh3jDqtoTgYcaZESkLrcqriWSvFvoC3vyHhKf1VXtpQbX2QsyZ513DN0cy/YPSTYPf8+6J+3gvet85sP9fanzWuO2Gv2hjVYwC5Ym31lXdZjgmXsJ/vFfnsV753X8vx1qFfZ5LxiW+ZdPQC16MQ+</latexit>

perceptual sensing

How to incorporate uncertain predictive perception in trustable, scalable, predictable autonomy?

Page 50: From Learning and Control to

Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

Page 51: From Learning and Control to

Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

Page 52: From Learning and Control to

Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

Page 53: From Learning and Control to

Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

Page 54: From Learning and Control to

Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

Page 55: From Learning and Control to

Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

As soon as a machine learning system is unleashed in feedback with humans, that system is an actionable intelligence system, not a machine learning system.

Page 56: From Learning and Control to

Actionable Intelligence trustable, scalable, predictable

Page 57: From Learning and Control to

Mark your calendars!Deadline: November 15th, 2019

6-page papers

Formal call for papers will be out shortly!

Local organizers: Ben Recht, Claire TomlinL4DC.org

L4DC 2020 – Learning for Dynamics and ControlUC Berkeley, June 10-11, 2020