introduction to reinforcement learning - sjtuzou-jn/reinforcement_learning_master/l1.pdf ·...

37
Introduction to Reinforcement Learning Prof. Junni Zou Institute of Media, Information and Network Dept. of Computer Science and Engineering Shanghai Jiao Tong University http://min.sjtu.edu.cn Fall, 2019 Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 1 / 37

Upload: others

Post on 21-May-2020

22 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Introduction to Reinforcement Learning

Prof. Junni Zou

Institute of Media, Information and NetworkDept. of Computer Science and Engineering

Shanghai Jiao Tong Universityhttp://min.sjtu.edu.cn

Fall, 2019

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 1 / 37

Page 2: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Outline

1 Course Information

2 About Reinforcement Learning

3 The Reinforcement Learning Problem

4 Elements of RL

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 2 / 37

Page 3: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Course Information

Table of Contents

1 Course Information

2 About Reinforcement Learning

3 The Reinforcement Learning Problem

4 Elements of RL

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 3 / 37

Page 4: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Course Information

About Me

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 4 / 37

Page 5: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Course Information

About Me

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 5 / 37

Page 6: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Course Information

About Class

Instructor: Junni Zou ([email protected]), SEIEE Building 3-437

Course website:http://www.cs.sjtu.edu.cn/~zou-jn/teaching.html

Reference book: Richard S. Sutton and Andrew G. Barto,Reinforcement Learning: An Introduction, Second Edition

TA: Yuankun Jiang ([email protected])

Grading Policy:project: 50%, technical report: 50%

We use the lecture slides of Prof. David Silver as a reference.http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 6 / 37

Page 7: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Course Information

Online Courses of Reinforcement Learning

Reinforcement Learning, David Silver, UCL

Reinforcement Learning, Emma Brunskill, Stanford

Deep Reinforcement Learning, Sergey Levine, UC Berkeley

Deep Reinforcement Learning and Control, Katerina Fragkiadaki,CMU

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 7 / 37

Page 8: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

About Reinforcement Learning

Table of Contents

1 Course Information

2 About Reinforcement Learning

3 The Reinforcement Learning Problem

4 Elements of RL

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 8 / 37

Page 9: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

About Reinforcement Learning

Development of Reinforcement Learning

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 9 / 37

Page 10: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

About Reinforcement Learning

Development of Reinforcement Learning

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 10 / 37

Page 11: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

About Reinforcement Learning

Branches of Machine Learning

Lecture 1: Introduction to Reinforcement Learning

About RL

Branches of Machine Learning

Reinforcement Learning

Supervised Learning

Unsupervised Learning

MachineLearning

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 11 / 37

Page 12: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

About Reinforcement Learning

Characteristics of Reinfrocement Learning

What makes reinforcement learning different from other machinelearning paradigms?

There is no supervisor, only a reward signal

Feedback is delayed, not instantaneous

Time really matters (sequential, non i.i.d data)

Agent’s actions affect the subsequent data it receives

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 12 / 37

Page 13: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

About Reinforcement Learning

Many Faces of Reinfrocement Learning

Lecture 1: Introduction to Reinforcement Learning

About RL

Many Faces of Reinforcement Learning

Computer Science

Economics

Mathematics

Engineering Neuroscience

Psychology

Machine Learning

Classical/OperantConditioning

Optimal Control

RewardSystem

Operations Research

BoundedRationality

Reinforcement Learning

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 13 / 37

Page 14: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

About Reinforcement Learning

Examples of RL

Defeat the world champion at Backgammon

Manage an investment portfolio

Control a power station

Make a humanoid robot walk

Play many different Atari games better than humans

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 14 / 37

Page 15: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem

Table of Contents

1 Course Information

2 About Reinforcement Learning

3 The Reinforcement Learning Problem

4 Elements of RL

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 15 / 37

Page 16: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem Reward

Rewards

A reward signal defines the goal in a reinforcement learning problem

A reward Rt is a scalar feedback signal

The agent’s sole objective is to maximize the total reward it receivesover the long run (cumulative reward)

Indicates how well the agent is doing at step t

Reinforcement learning is based on the reward hypothesis

Definition (Reward Hypothesis)

All goals can be described by the maximization of expected cumulativereward

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 16 / 37

Page 17: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem Reward

Examples of Rewards

Fly stunt manoeuvres in a helicopter

+ value for following desired trajectory− value for crashing

Defeat the world champion at Backgammon

+/− value for winning/losing a game

Manage an investment portfolio

+ value for each $ in bank

Control a power station

+ value for producing power− value for exceeding safety thresholds

Make a humanoid robot walk

+ value for forward motion− value for falling over

Play many different Atari games better than humans

+/− value for increasing/decreasing score

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 17 / 37

Page 18: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem Reward

Sequential Decision Making

Goal: select actions to maximize total future reward

Actions may have long term consequences

Reward may be delayed

It may be better to sacrifice immediate reward to gain morelong-term rewardExamples

A financial investment (may take months to mature)Refuelling a helicopter (might prevent a crash in several hours)Blocking opponent moves (might help winning chances many movesfrom now)

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 18 / 37

Page 19: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem Environments

Agent and Environment

Lecture 1: Introduction to Reinforcement Learning

The RL Problem

Environments

Agent and Environment

observation

reward

action

At

Rt

Ot

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 19 / 37

Page 20: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem Environments

Agent and Environment

Lecture 1: Introduction to Reinforcement Learning

The RL Problem

Environments

Agent and Environment

observation

reward

action

At

Rt

Ot At each step t the agent:

Executes action At

Receives observation Ot

Receives scalar reward Rt

The environment:

Receives action At

Emits observation Ot+1

Emits scalar reward Rt+1

t increments at env. step

At each step t the agent:Executes action At

Receives observation Ot

Receives scalar reward Rt

The environment:Receives action At

Emits observation Ot+1

Emits scalar reward Rt+1

t increments at every step

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 20 / 37

Page 21: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem State

History and State

The history is the sequence of observations, actions, rewards

Ht = O1,R1,A1, · · · ,At−1,Ot ,Rt

i.e. all observable variables up to time t

i.e. the sensorimotor stream of a robot or embodied agentWhat happens next depends on the history:

The agent selects actionsThe environment selects observations/rewards

State is the information used to determine what happens next

Formally, state is a function of the history:

St = f (Ht)

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 21 / 37

Page 22: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem State

Information State

An information state (a.k.a. Markov) contains all useful informationfrom the history.

Definition

A state St is Markov if and only if

P[St+1|St ] = P[St+1|S1, · · · , St ]

The future is independent of the past given the present

H1:t → St → Ht+1:∞

Once the state is known, the history may be thrown away

i.e. The state is a sufficient statistic of the future

The environment state Set is Markov

The history Ht is Markov

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 22 / 37

Page 23: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem State

Rat ExampleLecture 1: Introduction to Reinforcement Learning

The RL Problem

State

Rat Example

What if agent state = last 3 items in sequence?

What if agent state = counts for lights, bells and levers?

What if agent state = complete sequence?

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 23 / 37

Page 24: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem State

Rat ExampleLecture 1: Introduction to Reinforcement Learning

The RL Problem

State

Rat Example

What if agent state = last 3 items in sequence?

What if agent state = counts for lights, bells and levers?

What if agent state = complete sequence?What if agent state = last 3 items in sequence?

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 24 / 37

Page 25: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem State

Rat Example

Lecture 1: Introduction to Reinforcement Learning

The RL Problem

State

Rat Example

What if agent state = last 3 items in sequence?

What if agent state = counts for lights, bells and levers?

What if agent state = complete sequence?What if agent state = last 3 items in sequence?

What if agent state = counts for lights, bell and levers?

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 25 / 37

Page 26: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

The Reinforcement Learning Problem State

Rat Example

Lecture 1: Introduction to Reinforcement Learning

The RL Problem

State

Rat Example

What if agent state = last 3 items in sequence?

What if agent state = counts for lights, bells and levers?

What if agent state = complete sequence?What if agent state = last 3 items in sequence?

What if agent state = counts for lights, bell and levers?

What if agent state = complete sequence?

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 26 / 37

Page 27: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL

Table of Contents

1 Course Information

2 About Reinforcement Learning

3 The Reinforcement Learning Problem

4 Elements of RL

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 27 / 37

Page 28: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL

Major Components of an RL Agent

An RL agent may include one or more of these components:

Policy: agent’s behaviour function

Value function: how good is each state and/or action

Model: agent’s representation of the environment

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 28 / 37

Page 29: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Policy

Policy

A policy is the agent’s behaviour

It is a map from state to action, e.g.

Deterministic policy: a = π(s)

Stochastic policy: π(a|s) = P[At = a|St = s]

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 29 / 37

Page 30: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Value Function

Value Function

Value function is a prediction of future reward

Used to evaluate the goodness/ badness of states

And therefore to select between actions, e.g.

vπ(s) = Eπ

[Rt+1 + γRt+2 + γ2Rt+3 + · · · |St = s

]

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 30 / 37

Page 31: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Model

Model

A model predicts what the environment will do next

Transitions: P predicts the next state

Rewards: R predicts the next (immediate) reward, e.g.

Pass′ = P[St+1 = s ′|St = s,At = a]

Ras = E[Rt+1|St = s,At = a]

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 31 / 37

Page 32: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Maze Example

Maze Example

Lecture 1: Introduction to Reinforcement Learning

Inside An RL Agent

Maze Example

Start

Goal

Rewards: -1 per time-step

Actions: N, E, S, W

States: Agent’s location

Rewards: −1 per time-step

Actions: N, E, S, W

States: Agent’s location

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 32 / 37

Page 33: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Maze Example

Maze Example: PolicyLecture 1: Introduction to Reinforcement Learning

Inside An RL Agent

Maze Example: Policy

Start

Goal

Arrows represent policy π(s) for each state sArrows represent policy π(s) for each state s

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 33 / 37

Page 34: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Maze Example

Maze Example: Value FunctionLecture 1: Introduction to Reinforcement Learning

Inside An RL Agent

Maze Example: Value Function

-14 -13 -12 -11 -10 -9

-16 -15 -12 -8

-16 -17 -6 -7

-18 -19 -5

-24 -20 -4 -3

-23 -22 -21 -22 -2 -1

Start

Goal

Numbers represent value vπ(s) of each state sNumbers represent value function vπ(s) for each state s

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 34 / 37

Page 35: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Maze Example

Maze Example: ModelLecture 1: Introduction to Reinforcement Learning

Inside An RL Agent

Maze Example: Model

-1 -1 -1 -1 -1 -1

-1 -1 -1 -1

-1 -1 -1

-1

-1 -1

-1 -1

Start

Goal

Agent may have an internalmodel of the environment

Dynamics: how actionschange the state

Rewards: how much rewardfrom each state

The model may be imperfect

Grid layout represents transition model Pass′

Numbers represent immediate reward Ras from each state s

(same for all a)

Agent may have an internalmodel of the environment

Dynamics: how actions changethe state

Rewards: how much rewardfrom each state

The model may be imperfect

Grid layout represents transitionmodel Pa

ss′

Numbers represent immediatereward Ra∫ from each state s(same for all a)

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 35 / 37

Page 36: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Categorization

Categorizing RL Approaches (1)

How to choose actions?Value Based

No Policy (Implicit)Value Function

Policy BasedPolicyNo Value Function

Actor-CriticPolicyValue Function

Whether can we learn amodel?

Model freePolicy and/or Value FunctionNo Model

Model BasedPolicy and/or Value FunctionModel

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 36 / 37

Page 37: Introduction to Reinforcement Learning - SJTUzou-jn/reinforcement_learning_Master/L1.pdf · Introduction to Reinforcement Learning $ Prof. Junni Zou Institute of Media, Information

���������� �����

��������������� ������ ���

���� ����� !" #�$ ��� %$&'(!�)(�*� %��+'� ,!)(-()(�� ,&�$)

,&�$)./01 2�� �33 % �45.6�$7�.+�+� .8��.�!!�+)�9.&:.;<<<=>?@[email protected]@.FGHI>@ICDBA5J�)��).1�8� #�*)�!)KLMNONPNQ.RS.TQUOVWKLSRXYVNORLW.VLU.ZQN[RX\ ]� ̂

_�'!���.)�.0*�)()$)�.�4./�9(�̀.0*4� ��)(�*̀.�*9.1�)8� abc(d*�'.% �!���(*d.e �$+1�)8� a.#���$*(!�)(�*.e �$+#��+$)� .f(�(�*.e �$+

Elements of RL Categorization

Categorizing RL Approaches (2)

Lecture 1: Introduction to Reinforcement Learning

Inside An RL Agent

RL Agent Taxonomy

Model

Value Function PolicyActorCritic

Value-Based Policy-Based

Model-Free

Model-Based

Prof. Zou (MIN, SJTU) Reinforcement Learning Spring, 2019 37 / 37