solve grid world problem

Reinforcement Learning in the Grid World Problem

Author: Alireza Andalib

Machine Learning

Presentation Title

Reinforcement Learning

Comparing Reinforcement Learning with Supervised Learning

Supervised Learning: Example, Class

Reinforcement Learning: Situation, Reward, Situation, Reward, …

Comparing RL with Supervised Learning

Supervised Learning

Inputs → Supervised Learning System → Outputs

Training Info = desired (target) outputs

Error = (target output – actual output)

Reinforcement Learning

Inputs → RL System → Outputs (“actions”)

Training Info = evaluations (“rewards” / “penalties”)

Main Characteristics of Reinforcement Learning

General Structure of the Reinforcement Learning Problem

Policy: $\pi(s, a) = \Pr\{a_t = a \mid s_t = s\}, \quad 0 \le \pi(s, a) \le 1$
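The policy definition above can be made concrete with a small table of action probabilities. The following is only a sketch: the action names, the 5x5 state set, and the helper names `uniform_policy`, `is_valid_policy`, and `sample_action` are assumptions made for illustration and do not come from the slides.

```python
import random

ACTIONS = ["up", "down", "left", "right"]  # hypothetical action names


def uniform_policy(states):
    """Return pi[s][a] = Pr{a_t = a | s_t = s} with equal probability per action."""
    p = 1.0 / len(ACTIONS)
    return {s: {a: p for a in ACTIONS} for s in states}


def is_valid_policy(pi, tol=1e-9):
    """Check 0 <= pi(s, a) <= 1 and that each state's probabilities sum to 1."""
    for probs in pi.values():
        if any(p < 0.0 or p > 1.0 for p in probs.values()):
            return False
        if abs(sum(probs.values()) - 1.0) > tol:
            return False
    return True


def sample_action(pi, state, rng=random):
    """Draw an action a_t according to the distribution pi(state, .)."""
    actions = list(pi[state])
    weights = [pi[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


states = [(i, j) for i in range(5) for j in range(5)]  # assumed 5x5 grid
pi = uniform_policy(states)
```

A deterministic policy is the special case in which one action per state has probability 1 and the others 0.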

Policy (Course of Action)

Learning the Policy

Obtaining the Optimal Policy

Environment

The Markov Property

Markov Decision Processes

Definition of the Grid World Problem

Definition of the Grid World Problem (continued)

The Bellman Algorithm

Final result of the Bellman algorithm: solving 25 equations in 25 unknowns gives the following values:

 1.7120    9.7461    3.1311    5.4209    1.0036
 0.7994    2.9233    2.3299    1.9586    0.4665
 0.0023    0.7899    0.7355    0.4364   -0.2287
-0.7664   -0.8488    0.0076   -0.1855   -0.9621
-0.9949   -1.3554   -1.0946   -1.4766   -2.0021
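The 25 equations in 25 unknowns arise from writing one Bellman equation per cell of the grid. The sketch below illustrates that idea under assumptions the slides do not state: a 5x5 grid, an equiprobable random policy, a discount factor of 0.9, two jump cells at (0, 1) and (0, 3) with rewards +10 and +5, and a reward of -1 for bumping into a wall. The names `step`, `build_system`, and `solve` are illustrative. Because these parameters are guesses, the printed values will not necessarily match the table above.

```python
N, GAMMA = 5, 0.9                            # assumed grid size and discount factor
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
JUMPS = {(0, 1): ((4, 1), 10.0),             # assumed: any action jumps to (4, 1), reward +10
         (0, 3): ((2, 3), 5.0)}              # assumed: any action jumps to (2, 3), reward +5


def step(state, move):
    """Deterministic transition: return (next_state, reward)."""
    if state in JUMPS:
        return JUMPS[state]
    i, j = state[0] + move[0], state[1] + move[1]
    if 0 <= i < N and 0 <= j < N:
        return (i, j), 0.0
    return state, -1.0                       # hit a wall: stay in place, reward -1


def build_system():
    """Build A v = b from v(s) = sum over actions of 0.25 * (r + GAMMA * v(s'))."""
    n = N * N
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(N):
        for j in range(N):
            row = i * N + j
            A[row][row] += 1.0
            for move in MOVES:
                (ni, nj), r = step((i, j), move)
                A[row][ni * N + nj] -= 0.25 * GAMMA
                b[row] += 0.25 * r
    return A, b


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[r][n] / M[r][r] for r in range(n)]


A, b = build_system()
flat = solve(A, b)
v = [flat[i * N:(i + 1) * N] for i in range(N)]  # reshape into a 5x5 grid
for row in v:
    print(" ".join(f"{x:8.4f}" for x in row))
```

A hand-written solver is used only to keep the sketch dependency-free; a linear algebra library would do the same job in one call.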

The IPE (Iterative Policy Evaluation) Algorithm

Final result of the IPE algorithm: over K iterations (for example, 100), the value of each cell (i, j), initially zero, is updated until we reach the following values:

 1.4008    9.5698    3.1841    5.4309    0.8827
 0.6503    2.9231    1.9576    1.8581    0.3910
-0.0303    0.8137    0.7354    0.4787   -0.2830
-0.4062   -0.0118    0.0183   -0.1828   -0.7333
-0.6535   -0.4780   -0.4594   -0.5763   -0.9488
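The update described above, starting every cell at zero and sweeping the grid K times, can be sketched as follows. K = 100 follows the slide; everything else is assumed for illustration: an equiprobable random policy, a discount factor of 0.9, jump cells at (0, 1) and (0, 3) with rewards +10 and +5, and a reward of -1 for bumping into a wall. With different parameters the numbers would differ, so this sketch is not expected to reproduce the table above.

```python
N, GAMMA, K = 5, 0.9, 100                    # K follows the slide; GAMMA is assumed
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
JUMPS = {(0, 1): ((4, 1), 10.0),             # assumed: any action jumps to (4, 1), reward +10
         (0, 3): ((2, 3), 5.0)}              # assumed: any action jumps to (2, 3), reward +5


def step(state, move):
    """Deterministic transition: return (next_state, reward)."""
    if state in JUMPS:
        return JUMPS[state]
    i, j = state[0] + move[0], state[1] + move[1]
    if 0 <= i < N and 0 <= j < N:
        return (i, j), 0.0
    return state, -1.0                       # hit a wall: stay in place, reward -1


def iterative_policy_evaluation(sweeps=K):
    """Evaluate the equiprobable random policy with a fixed number of sweeps."""
    v = [[0.0] * N for _ in range(N)]        # every cell starts at zero
    for _ in range(sweeps):
        new_v = [[0.0] * N for _ in range(N)]
        for i in range(N):
            for j in range(N):
                for move in MOVES:
                    (ni, nj), r = step((i, j), move)
                    # Each of the four actions has probability 0.25.
                    new_v[i][j] += 0.25 * (r + GAMMA * v[ni][nj])
        v = new_v
    return v


v = iterative_policy_evaluation()
for row in v:
    print(" ".join(f"{x:8.4f}" for x in row))
```

Each sweep here reads only the previous sweep's values; updating cells in place is a common variant that also converges.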

The PI (Policy Iteration) Algorithm

Final result of the PI algorithm: the result obtained at the end is a deterministic policy that, starting from any state, steers the agent toward collecting the most reward.

Go Right   Jump    Go Left   Jump    Go Left
Go Up      Go Up   Go Left   Go Up   Go Left
Go Up      Go Up   Go Up     Go Up   Go Left
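A deterministic policy of this kind can be obtained by alternating policy evaluation and greedy improvement. The sketch below shows one way to do that; the grid parameters are assumptions rather than values taken from the slides: a 5x5 grid, a discount factor of 0.9, jump cells at (0, 1) and (0, 3) with rewards +10 and +5, and a reward of -1 for bumping into a wall. The function names `evaluate`, `improve`, and `policy_iteration` are illustrative.

```python
N, GAMMA = 5, 0.9                            # assumed grid size and discount factor
MOVES = {"Go Up": (-1, 0), "Go Down": (1, 0), "Go Left": (0, -1), "Go Right": (0, 1)}
JUMPS = {(0, 1): ((4, 1), 10.0),             # assumed: any action jumps to (4, 1), reward +10
         (0, 3): ((2, 3), 5.0)}              # assumed: any action jumps to (2, 3), reward +5


def step(state, move):
    """Deterministic transition: return (next_state, reward)."""
    if state in JUMPS:
        return JUMPS[state]
    i, j = state[0] + move[0], state[1] + move[1]
    if 0 <= i < N and 0 <= j < N:
        return (i, j), 0.0
    return state, -1.0                       # hit a wall: stay in place, reward -1


def evaluate(policy, tol=1e-8):
    """Compute the value of a deterministic policy by in-place sweeps."""
    v = [[0.0] * N for _ in range(N)]
    while True:
        delta = 0.0
        for i in range(N):
            for j in range(N):
                (ni, nj), r = step((i, j), MOVES[policy[i][j]])
                new = r + GAMMA * v[ni][nj]
                delta = max(delta, abs(new - v[i][j]))
                v[i][j] = new
        if delta < tol:
            return v


def improve(policy, v):
    """Make the policy greedy with respect to v; return True if nothing changed."""
    stable = True
    for i in range(N):
        for j in range(N):
            def q(name):
                (ni, nj), r = step((i, j), MOVES[name])
                return r + GAMMA * v[ni][nj]
            best = max(MOVES, key=q)
            # Switch only on a strict improvement so ties cannot cause cycling.
            if q(best) > q(policy[i][j]) + 1e-9:
                policy[i][j] = best
                stable = False
    return stable


def policy_iteration():
    policy = [["Go Up"] * N for _ in range(N)]   # arbitrary starting policy
    while True:
        v = evaluate(policy)
        if improve(policy, v):
            return policy, v


policy, v = policy_iteration()
for i in range(N):
    print("  ".join("Jump" if (i, j) in JUMPS else policy[i][j] for j in range(N)))
```

Where several actions are equally good, the result depends on the starting policy and the tie-breaking rule, so two correct implementations can print different but equally valuable policies.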

Conclusion


THANKS FOR YOUR ATTENTION
