probability cse 473 – autumn 2003 henry kautz. expectimax

Probability

CSE 473 – Autumn 2003

Henry Kautz

ExpectiMax

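\[
\mathrm{ExpectiMax}(n) =
\begin{cases}
U(n) & \text{if } n \text{ is a terminal node} \\
\max\{\mathrm{ExpectiMax}(s) \mid s \in \mathrm{children}(n)\} & \text{if } n \text{ is a max node} \\
\sum_{s \in \mathrm{children}(n)} P(s)\,\mathrm{ExpectiMax}(s) & \text{if } n \text{ is a chance node}
\end{cases}
\]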
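The recursive definition of ExpectiMax can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the class names `Terminal`, `Max`, and `Chance` are hypothetical, and the small tree at the bottom is made up for the example (it is not the Hungry Monkey tree).

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Tuple, Union


@dataclass
class Terminal:
    utility: Fraction  # payoff at a leaf


@dataclass
class Max:
    children: List["Node"]  # the agent picks the best child


@dataclass
class Chance:
    # (probability, child) pairs; probabilities are assumed to sum to 1
    outcomes: List[Tuple[Fraction, "Node"]]


Node = Union[Terminal, Max, Chance]


def expectimax(n: Node) -> Fraction:
    """Value of node n under the three-case ExpectiMax recursion."""
    if isinstance(n, Terminal):
        return n.utility
    if isinstance(n, Max):
        return max(expectimax(s) for s in n.children)
    # Chance node: probability-weighted average of the children's values
    return sum((p * expectimax(s) for p, s in n.outcomes), Fraction(0))


# Made-up example tree: one max node over two chance nodes.
F = Fraction
tree = Max([
    Chance([(F(2, 3), Terminal(F(0))), (F(1, 3), Terminal(F(1)))]),
    Chance([(F(1, 6), Terminal(F(1))), (F(5, 6), Terminal(F(0)))]),
])
best_value = expectimax(tree)
```

Using `Fraction` keeps the backed-up values exact, which makes them easy to compare with hand calculations on a small tree.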

Hungry Monkey: 2-Ply Game Tree

[Figure: 2-ply game tree for the Hungry Monkey problem. Actions: jump, shake. Chance-branch probabilities: 2/3 and 1/3; 1/6 and 5/6. Leaf utilities: 0 0 1 0 0 0 1 0 1 1 2 1 0 0 1 0.]

ExpectiMax 1 – Chance Nodes

[Figure: Hungry Monkey game tree with the chance nodes evaluated in this step. Branch probabilities: 2/3 and 1/3; 1/6 and 5/6. Leaf utilities still shown: 0 0 1 0, 1 1 2 1, 0 0 1 0.]
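As a hedged illustration of a single chance-node backup, suppose a chance node has two children with utilities 0 and 1, reached with probabilities 2/3 and 1/3. This pairing of values and probabilities is an assumption made for the example, not read off the figure. The node's value is the expected utility of its children:

\[
\tfrac{2}{3}\cdot 0 + \tfrac{1}{3}\cdot 1 = \tfrac{1}{3}
\]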

ExpectiMax 2 – Max Nodes

[Figure: Hungry Monkey game tree with the max nodes evaluated in this step. Branch probabilities: 2/3 and 1/3; 1/6 and 5/6. Leaf utilities still shown: 0 0 1 0, 1 1 2 1, 0 0 1 0.]

ExpectiMax 3 – Chance Nodes

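[Figure: Hungry Monkey game tree with the next level of chance nodes evaluated. Backed-up values shown: 1/2, 1/3. Branch probabilities: 2/3 and 1/3; 1/6 and 5/6. Leaf utilities still shown: 0 0 1 0, 1 1 2 1, 0 0 1 0.]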

ExpectiMax 4 – Max Node

[Figure: Hungry Monkey game tree with the root max node evaluated. Backed-up values shown: 1/2, 1/3. Branch probabilities: 2/3 and 1/3; 1/6 and 5/6. Leaf utilities still shown: 0 0 1 0, 1 1 2 1, 0 0 1 0.]

Policies

• The result of the ExpectiMax analysis is a conditional plan (also called a policy):
  – Optimal plan for 2 steps: jump; shake
  – Optimal plan for 3 steps:
      jump; if (ontable) {shake; shake}
      else {jump; shake}
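The 3-step conditional plan can be written as a small function that maps an observation to the remaining actions. This is only a sketch: the function name `three_step_policy` is hypothetical, and it assumes that `ontable` is observed once, right after the first jump.

```python
from typing import List


def three_step_policy(ontable: bool) -> List[str]:
    """Action sequence prescribed by the 3-step plan.

    Assumption: `ontable` is the observation made after the first jump.
    """
    actions = ["jump"]
    if ontable:
        actions += ["shake", "shake"]
    else:
        actions += ["jump", "shake"]
    return actions


plan_if_on_table = three_step_policy(True)
plan_otherwise = three_step_policy(False)
```

Unlike a fixed action sequence, the plan branches on what is observed, which is what makes it a policy rather than a simple plan.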

• Probabilistic planning can be generalized in many ways, including:
  – Action costs
  – Hidden state

• The general problem is that of solving a Markov Decision Process (MDP)

2 Player Games of Chance

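\[
\mathrm{ExpectiMiniMax}(n) =
\begin{cases}
U(n) & \text{if } n \text{ is a terminal node} \\
\max\{\mathrm{ExpectiMiniMax}(s) \mid s \in \mathrm{children}(n)\} & \text{if } n \text{ is a max node} \\
\min\{\mathrm{ExpectiMiniMax}(s) \mid s \in \mathrm{children}(n)\} & \text{if } n \text{ is a min node} \\
\sum_{s \in \mathrm{children}(n)} P(s)\,\mathrm{ExpectiMiniMax}(s) & \text{if } n \text{ is a chance node}
\end{cases}
\]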
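The four-case ExpectiMiniMax recursion can be sketched in Python in the same style. The class names and the example tree are hypothetical and chosen only for illustration; the tree is not taken from any real game.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Tuple, Union


@dataclass
class Terminal:
    utility: Fraction


@dataclass
class Max:
    children: List["Node"]


@dataclass
class Min:
    children: List["Node"]


@dataclass
class Chance:
    # (probability, child) pairs; probabilities are assumed to sum to 1
    outcomes: List[Tuple[Fraction, "Node"]]


Node = Union[Terminal, Max, Min, Chance]


def expectiminimax(n: Node) -> Fraction:
    """Value of node n: max and min players alternate around chance nodes."""
    if isinstance(n, Terminal):
        return n.utility
    if isinstance(n, Max):
        return max(expectiminimax(s) for s in n.children)
    if isinstance(n, Min):
        return min(expectiminimax(s) for s in n.children)
    # Chance node: expected value over the random outcomes
    return sum((p * expectiminimax(s) for p, s in n.outcomes), Fraction(0))


# Made-up tree: max move, then a random event, then the opponent's reply.
F, T = Fraction, lambda v: Terminal(Fraction(v))
tree = Max([
    Chance([(F(1, 2), Min([T(3), T(1)])), (F(1, 2), Min([T(4), T(2)]))]),
    Chance([(F(1, 3), Min([T(6), T(0)])), (F(2, 3), Min([T(2), T(5)]))]),
])
value = expectiminimax(tree)
```

In the example, the first move is worth 1/2·1 + 1/2·2 = 3/2 and the second 1/3·0 + 2/3·2 = 4/3, so the max player prefers the first.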

Backgammon

• Branching factor:
  – Chance node: 21
  – Max node: about 20 on average
  – Size of tree: O(c^k m^k)
  – In practice: can search 3 plies
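To get a feel for the O(c^k m^k) bound, the quick calculation below plugs in the branching factors quoted above. It assumes that c is the chance-node branching factor, m is the move branching factor, and k counts the levels of each kind; the function name `tree_size` is hypothetical.

```python
def tree_size(c: int, m: int, k: int) -> int:
    """Number of leaves when k chance levels and k move levels alternate."""
    return (c ** k) * (m ** k)


# Backgammon figures from the slide: 21 dice outcomes, about 20 moves.
leaves_at_depth_3 = tree_size(c=21, m=20, k=3)  # (21 * 20) ** 3
```

Under these assumptions, three levels already give 74,088,000 leaves, which suggests why deep search is impractical for backgammon.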

• Neurogammon & TD-Gammon (Tesauro 1995)
  – Learned weights on static evaluation function by playing against itself
  – Use results of games to optimize weights:
    • “Punish” features that were on in losing games
    • “Reward” features that were on in winning games
  – A kind of reinforcement learning
  – Became world’s best backgammon player!
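The "punish" and "reward" idea can be illustrated with a deliberately simplified weight update for a linear evaluation function. This is not Tesauro's actual algorithm (TD-Gammon trained a neural network with temporal-difference learning); the function names, the linear form, and the learning rate are all assumptions made for this sketch.

```python
from typing import List, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Static evaluation as a weighted sum of position features."""
    return sum(w * f for w, f in zip(weights, features))


def update_weights(
    weights: Sequence[float],
    positions: Sequence[Sequence[float]],
    won: bool,
    lr: float = 0.5,
) -> List[float]:
    """Nudge weights up for features active in a won game, down in a lost one.

    `positions` holds the feature vector of each position seen in one game.
    """
    sign = 1.0 if won else -1.0
    new_weights = list(weights)
    for features in positions:
        for i, f in enumerate(features):
            new_weights[i] += lr * sign * f
    return new_weights


# One made-up game with two positions and three binary features.
game = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
after_win = update_weights([0.0, 0.0, 0.0], game, won=True)
after_loss = update_weights([0.0, 0.0, 0.0], game, won=False)
```

Repeating such updates over many self-play games is the sense in which the weights are learned from game results rather than set by hand.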
