REINFORCEMENT LEARNING: A Beginner’s Tutorial
By: Omar Enayet
(Presentation Version)
The Problem
Agent-Environment Interface
Environment Model
Goals & Rewards
Returns
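The return is what the agent actually maximizes: the (discounted) sum of future rewards. A minimal sketch in Python, with an invented reward sequence for illustration:

```python
# Discounted return: G_t = r_{t+1} + gamma*r_{t+2} + gamma^2*r_{t+3} + ...
def discounted_return(rewards, gamma=0.9):
    """Sum future rewards, each discounted by gamma per step."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

# Equivalent backward recursion: G_t = r_{t+1} + gamma * G_{t+1}
def discounted_return_backward(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

The backward recursion is the form most algorithms exploit: the return from time t is just the next reward plus the discounted return from t+1.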
Credit-Assignment Problem
Markov Decision Process
An MDP is defined by < S, A, p, r, γ >:
S – set of states of the environment
A(s) – set of actions possible in state s
p^a_ss′ – probability of transition from s to s′ when executing a
r^a_s – expected reward when executing a in s
γ – discount rate for expected reward

Assumption: discrete time t = 0, 1, 2, . . .

Interaction: s_t —a_t→ r_t+1, s_t+1 —a_t+1→ r_t+2, s_t+2 —a_t+2→ r_t+3, s_t+3 —a_t+3→ . . .
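The tuple < S, A, p, r, γ > can be written down directly as data. A sketch with a tiny invented two-state MDP (all state names, probabilities, and rewards below are made up for illustration; only the structure mirrors the definition):

```python
# A tiny MDP <S, A, p, r, gamma> as plain Python data structures.
S = ["s0", "s1"]
A = {"s0": ["stay", "go"], "s1": ["stay"]}

# p[(s, a)] maps next state s' -> transition probability p^a_ss'
p = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
}

# r[(s, a)] is the expected reward r^a_s for taking a in s
r = {("s0", "stay"): 0.0, ("s0", "go"): 1.0, ("s1", "stay"): 2.0}

gamma = 0.9  # discount rate

# Sanity check: each transition distribution must sum to 1
for dist in p.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```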
Value Functions
Optimal Value Functions
Exploration-Exploitation Problem
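The standard compromise between exploring and exploiting is the ε-greedy rule: act greedily most of the time, but pick a random action with small probability ε. A minimal sketch (the function name and value list are invented for illustration):

```python
import random

# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise exploit the action with the highest estimated value.
def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))   # explore: uniformly random action
    best = max(q_values)
    greedy = [i for i, q in enumerate(q_values) if q == best]
    return rng.choice(greedy)                 # exploit, ties broken randomly
```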
Policies
Elementary Solution Methods
Dynamic Programming
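One representative DP method is value iteration: sweep the states, backing each one up with the Bellman optimality equation until the values stop changing. A sketch on a tiny invented two-state MDP (states, transitions, and rewards are made up for illustration):

```python
# Value iteration: repeatedly back up
#   V(s) = max_a [ r(s,a) + gamma * sum_s' p(s'|s,a) * V(s') ]
S = ["s0", "s1"]
A = {"s0": ["stay", "go"], "s1": ["stay"]}
p = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"):   {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0}}
r = {("s0", "stay"): 0.0, ("s0", "go"): 1.0, ("s1", "stay"): 2.0}

def value_iteration(gamma=0.9, theta=1e-10):
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            v = max(r[(s, a)]
                    + gamma * sum(prob * V[s2] for s2, prob in p[(s, a)].items())
                    for a in A[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:      # stop once a full sweep changes nothing
            return V
```

On this toy problem the fixed point is V(s1) = 2/(1−γ) = 20 and V(s0) = 1 + γ·20 = 19.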
Perfect Model
Bootstrapping
Generalized Policy Iteration
Efficiency of DP
Monte-Carlo Methods
Episodic Return
Advantages over DP
• No model of the environment required
• Can learn from simulation, or from only part of a model
• Can focus effort on a small subset of the states
• Less harmed by violations of the Markov property
First Visit VS Every-Visit
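The first-visit variant averages, for each state, only the return following its first occurrence in each episode. A sketch of first-visit MC prediction; the episode format (a list of (state, reward) pairs, where the reward is the one received after leaving that state) is invented for illustration:

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte-Carlo prediction: estimate V(s) as the average
    of the returns observed after the FIRST visit to s in each episode."""
    returns = defaultdict(list)
    for episode in episodes:
        # compute the return following each step, working backward
        g, gs = 0.0, []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            gs.append((state, g))
        gs.reverse()
        # record a return only at the first visit of each state
        seen = set()
        for state, g in gs:
            if state not in seen:
                seen.add(state)
                returns[state].append(g)
    return {s: sum(v) / len(v) for s, v in returns.items()}
```

Every-visit MC would drop the `seen` check and record a return at every occurrence of the state.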
On-Policy VS Off-Policy
Action-value instead of State-value
Temporal-Difference Learning
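TD methods update an estimate from a single transition, bootstrapping on the current estimate of the next state instead of waiting for a full episode's return. The simplest case, TD(0) prediction, as a sketch (dictionary-based value table is an illustrative choice):

```python
# TD(0) update: move V(s) toward the bootstrapped target r + gamma*V(s'),
# learning online from one transition instead of a complete episode.
def td0_update(V, s, r, s2, alpha=0.1, gamma=0.9):
    V[s] += alpha * (r + gamma * V[s2] - V[s])
```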
Advantages of TD Learning
SARSA (On-Policy)
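SARSA is named for the quintuple it learns from: (s, a, r, s′, a′). It is on-policy because its target uses the action a′ the current policy actually selected in s′. A sketch of the core update (the dictionary-keyed Q-table is an illustrative choice):

```python
# SARSA update (on-policy): bootstrap on Q(s', a') for the action a'
# that the behaviour policy itself chose in the next state.
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
```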
Q-Learning (Off-Policy)
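Q-learning differs from SARSA in one term: its target uses the greedy value max over a′ of Q(s′, a′), regardless of which action the behaviour policy takes next, which is what makes it off-policy. A sketch of the update (Q-table layout is illustrative):

```python
# Q-learning update (off-policy): bootstrap on max_a' Q(s', a'),
# independent of the action the behaviour policy actually takes in s'.
def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```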
Actor-Critic Methods (On-Policy)
R-Learning (Off-Policy): maximizes the average expected reward per time-step
Eligibility Traces
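Eligibility traces let a single TD error update every recently visited state, in proportion to how recently (and often) it was visited. A sketch of one TD(λ) step with accumulating traces (the dictionary layout and parameter values are illustrative):

```python
def td_lambda_step(V, e, s, r, s2, alpha=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) step with accumulating eligibility traces.
    V: state -> value estimate, e: state -> eligibility trace."""
    delta = r + gamma * V[s2] - V[s]   # the usual TD error
    e[s] += 1.0                        # accumulate trace for the visited state
    for state in V:
        V[state] += alpha * delta * e[state]   # credit all eligible states
        e[state] *= gamma * lam                # decay every trace
```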
REFERENCES
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Bradford Books, 1998.
Richard Crouch, Peter Bennett, Stephen Bridges, Nick Piper and Robert Oates. Monte Carlo. 2003.
Omar Enayet. Reinforcement Learning: A Beginner’s Tutorial (slides for reading), 2009.