Reinforcement Learning
Michael Roberts
With Material From: Reinforcement Learning: An Introduction
Sutton & Barto (1998)
What is RL?
• Trial & error learning
  – without model
  – with model
• Structure
[Figure: agent-environment loop through states s1, s2, s3, s4 with rewards r1, r2, r3]
RL vs. Supervised Learning
• Evaluative vs. Instructional feedback
• Role of exploration
• On-line performance
K-armed Bandit Problem
[Figure: 4-armed bandit. The agent selects among four actions whose average rewards are 10, -5, 100, and 0; two sample reward sequences: 0, 0, 5, 10, 35 and 5, 10, -15, -15, -10]
K-armed Bandit Cont.
• Greedy exploration
• ε-greedy
• Softmax
Average reward (sample average of the first k rewards):

  Q_k = (r_1 + r_2 + … + r_k) / k

Incremental formula:

  Q_{k+1} = Q_k + α (r_{k+1} − Q_k)

where α = 1 / (k+1)

Probability of choosing action a (softmax with temperature τ):

  P(a) = e^{Q(a)/τ} / Σ_b e^{Q(b)/τ}
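The ε-greedy strategy and the incremental update above can be sketched as follows. This is a minimal illustration, not code from the lecture; the Gaussian reward noise, ε = 0.1, and step count are assumptions.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """epsilon-greedy action selection with the incremental
    sample-average update Q_{k+1} = Q_k + (1/(k+1)) (r - Q_k)."""
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k          # estimated value of each arm
    counts = [0] * k       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit (greedy)
        r = rng.gauss(true_means[a], 1.0)          # noisy reward (assumed)
        counts[a] += 1
        Q[a] += (r - Q[a]) / counts[a]             # incremental mean, alpha = 1/(k+1)
    return Q

# The slide's example arms have average rewards 10, -5, 100, 0.
estimates = run_bandit([10, -5, 100, 0])
```

With even a small ε, the agent discovers the 100-reward arm and its estimate converges to the true mean.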
More General Problems
• More than one state
• Delayed rewards
• Markov Decision Process (MDP)
  – Set of states
  – Set of actions
  – Reward function
  – State transition function
• Table or Function Approximation
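The four MDP components can be written down directly as a data structure. A minimal sketch, assuming a tabular representation; the one-state example is hypothetical, not from the slides.

```python
from typing import Callable, Dict, List, NamedTuple

class MDP(NamedTuple):
    """The four components of an MDP listed above."""
    states: List[str]
    actions: List[str]
    # reward(s, a) -> expected immediate reward
    reward: Callable[[str, str], float]
    # transition(s, a) -> {next_state: probability}
    transition: Callable[[str, str], Dict[str, float]]

# Hypothetical one-state example, just to show the shape.
trivial = MDP(
    states=["s0"],
    actions=["stay"],
    reward=lambda s, a: 1.0,
    transition=lambda s, a: {"s0": 1.0},
)
```

For small state sets a table like this suffices; larger problems use function approximation, as the slide notes.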
Example: Recycling Robot
Recycling Robot: Transition Graph
Dynamic Programming
Backup Diagram
[Figure: backup diagram with branch probabilities 0.25/0.25/0.25, 0.5/0.5, 0.3/0.7, 0.6/0.4 and rewards 10, 5, 200, 200, -10, 1000]
Dynamic Programming: Optimal Policy
Backup for Optimal Policy
Performance Metrics
• Eventual convergence to optimality
• Speed of convergence to optimality
• Regret
(Kaelbling, L., Littman, M., & Moore, A. 1996)
Gridworld Example
Initialize V(s) arbitrarily, e.g. V(s) = 0, for all s ∈ S
Repeat
  Δ ← 0
  For each s ∈ S:
    v ← V(s)
    V(s) ← max_a Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')]
    Δ ← max(Δ, |v − V(s)|)
until Δ < θ (a small positive number)
Output a deterministic policy π such that:
  π(s) = argmax_a Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')]
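The value-iteration pseudocode above can be sketched directly in code. This is a minimal illustration on a hypothetical two-state MDP (not the recycling robot or the gridworld from the slides); R(s,a) here is the expected immediate reward, a simplification of R(s,a,s').

```python
def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-8):
    """Sweep all states, backing up
    V(s) <- max_a sum_{s'} P(s'|s,a) [R(s,a) + gamma V(s')]
    until the largest change Delta falls below theta."""
    V = {s: 0.0 for s in states}            # initialize V arbitrarily

    def backup(s, a):
        return sum(p * (R[(s, a)] + gamma * V[s2]) for s2, p in P[(s, a)])

    while True:
        delta = 0.0
        for s in states:
            v = V[s]
            V[s] = max(backup(s, a) for a in actions)
            delta = max(delta, abs(v - V[s]))
        if delta < theta:
            break
    # greedy deterministic policy from the converged values
    pi = {s: max(actions, key=lambda a: backup(s, a)) for s in states}
    return V, pi

# Hypothetical two-state, two-action MDP.
states, actions = ["s0", "s1"], ["a0", "a1"]
P = {("s0", "a0"): [("s0", 0.7), ("s1", 0.3)], ("s0", "a1"): [("s1", 1.0)],
     ("s1", "a0"): [("s0", 1.0)], ("s1", "a1"): [("s1", 0.6), ("s0", 0.4)]}
R = {("s0", "a0"): 1.0, ("s0", "a1"): 0.0,
     ("s1", "a0"): 5.0, ("s1", "a1"): 2.0}
V, pi = value_iteration(states, actions, P, R)
```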
Temporal Difference Learning
• RL without a model
• Issue of temporal credit assignment
• Bootstraps like DP
• TD(0):
  V(s) ← V(s) + α [r + γ V(s') − V(s)]
TD Learning
• Again, TD(0):
  V(s) ← V(s) + α [r + γ V(s') − V(s)]
• TD(λ): for all states s,
  V(s) ← V(s) + α [r + γ V(s') − V(s)] e(s)
  where e is called an eligibility trace
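The TD(λ) update with eligibility traces can be sketched as follows. A minimal illustration on a hypothetical deterministic three-state chain (s0 → s1 → terminal s2, which pays reward 1); the chain, step size, and accumulating-trace style are assumptions, not from the slides.

```python
def td_lambda(episodes, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) on a deterministic chain s0 -> s1 -> s2.
    Each step: delta = r + gamma V(s') - V(s); every state is then
    moved by alpha * delta * e(s), with e its eligibility trace."""
    V = {"s0": 0.0, "s1": 0.0, "s2": 0.0}   # V(terminal) stays 0
    for _ in range(episodes):
        e = {s: 0.0 for s in V}             # traces reset each episode
        s = "s0"
        while s != "s2":
            s_next, r = ("s1", 0.0) if s == "s0" else ("s2", 1.0)
            delta = r + gamma * V[s_next] - V[s]
            e[s] += 1.0                     # accumulating trace
            for x in V:
                V[x] += alpha * delta * e[x]
                e[x] *= gamma * lam         # decay all traces
            s = s_next
    return V

values = td_lambda(500)
```

With γ = 0.9 the true values are V(s1) = 1.0 and V(s0) = 0.9, and the estimates converge toward them.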
Backup Diagram for TD(λ)
TD-Gammon (Tesauro)
Additional Work
• POMDPs
• Macros
• Multi-agent RL
• Multiple reward structures