adversarial search - cs.utexas.edu

CS 188: Artificial Intelligence Adversarial Search Prof. Scott Niekum The University of Texas at Austin [These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

TRANSCRIPT

Page 1: Adversarial Search - cs.utexas.edu

CS 188: Artificial Intelligence
Adversarial Search

Prof. Scott Niekum, The University of Texas at Austin

[These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Page 2: Adversarial Search - cs.utexas.edu

Game Playing State-of-the-Art

▪ Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended the 40-year reign of human champion Marion Tinsley using a complete 8-piece endgame. 2007: Checkers solved!

▪ Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

▪ Go: 2016: AlphaGo, created by Google DeepMind, beat 9-dan professional Go player Lee Sedol 4-1 on a full-sized 19x19 board. AlphaGo combined Monte Carlo Tree Search with deep neural networks, improving via reinforcement learning through self-play.

▪ OpenAI Five (DOTA): getting close to world-class

Page 3: Adversarial Search - cs.utexas.edu

How to consider behavior of ghosts?

Page 4: Adversarial Search - cs.utexas.edu

Adversarial Games

Page 5: Adversarial Search - cs.utexas.edu

Types of Games

▪ Many different kinds of games!

▪ Axes:
  ▪ Deterministic or stochastic?
  ▪ One, two, or more players?
  ▪ Zero sum?
  ▪ Perfect information (can you see the state)?

▪ Want algorithms for calculating a strategy (policy) which recommends a move from each state

Page 6: Adversarial Search - cs.utexas.edu

Deterministic Games

▪ Many possible formalizations, one is:
  ▪ States: S (start at s0)
  ▪ Players: P = {1...N} (usually take turns)
  ▪ Actions: A (may depend on player / state)
  ▪ Transition Function: S x A → S
  ▪ Terminal Test: S → {t, f}
  ▪ Terminal Utilities: S x P → R

▪ Solution for a player is a policy: S → A
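As a concrete illustration of this formalization, here is a minimal Python sketch of a toy two-player take-away game. The game itself and the names `State`, `actions`, `result`, `is_terminal`, `utility`, and `always_take_one` are assumptions invented for this example; they do not come from the slides.

```python
from typing import NamedTuple

class State(NamedTuple):
    tokens: int    # tokens left on the table
    to_move: int   # index of the player whose turn it is (0 or 1)

START = State(tokens=4, to_move=0)  # s0 (hypothetical start state)
PLAYERS = (0, 1)                    # P

def actions(state):
    """A: legal moves in this state (take 1 or 2 tokens)."""
    return [n for n in (1, 2) if n <= state.tokens]

def result(state, action):
    """Transition function S x A -> S."""
    return State(state.tokens - action, 1 - state.to_move)

def is_terminal(state):
    """Terminal test S -> {t, f}."""
    return state.tokens == 0

def utility(state, player):
    """Terminal utilities S x P -> R: whoever took the last token wins."""
    winner = 1 - state.to_move  # the player who just moved
    return 1 if player == winner else -1

def always_take_one(state):
    """A (not necessarily good) policy S -> A."""
    return 1

# Play the game out with both players following the same policy.
s = START
while not is_terminal(s):
    s = result(s, always_take_one(s))
print(s, [utility(s, p) for p in PLAYERS])
```

Because the two utilities always sum to zero, this toy game is also an instance of the zero-sum setting.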

Page 7: Adversarial Search - cs.utexas.edu

Zero-Sum Games

▪ Zero-Sum Games
  ▪ Agents have opposite utilities (values on outcomes)
  ▪ Lets us think of a single value that one maximizes and the other minimizes
  ▪ Adversarial, pure competition

▪ General Games
  ▪ Agents have independent utilities (values on outcomes)
  ▪ Cooperation, indifference, competition, and more are all possible
  ▪ More later on non-zero-sum games

Page 8: Adversarial Search - cs.utexas.edu

Adversarial Search

Page 9: Adversarial Search - cs.utexas.edu

Single-Agent Trees

[Figure: single-agent search tree; values shown: 8, 2, 0, 2, 6, 4, 6, ...]

Page 10: Adversarial Search - cs.utexas.edu

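Value of a State

Value of a state: The best achievable outcome (utility) from that state

Non-Terminal States: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$

Terminal States: $V(s) = \text{known}$

[Figure: single-agent search tree; values shown: 8, 2, 0, 2, 6, 4, 6, ...]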
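The definition of a state's value can be sketched in a few lines of Python. This is only an illustration: the nested-list tree encoding (numbers are terminal utilities, lists are non-terminal states) and the particular arrangement of the values 8, 2, 0, 2, 6, 4, 6 are assumptions, not the tree drawn on the slide.

```python
def state_value(node):
    """Best achievable utility from this node when a single agent picks every move."""
    if isinstance(node, (int, float)):
        return node                                    # terminal state: utility is known
    return max(state_value(child) for child in node)  # non-terminal: best successor

# Hypothetical arrangement of the values that appear in the figure.
tree = [8, [2, 0], [2, 6], [4, 6]]
print(state_value(tree))
```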

Page 11: Adversarial Search - cs.utexas.edu

Adversarial Game Trees

[Figure: adversarial game tree; terminal utilities shown: -20, -8, ..., -18, -5, ..., -10, +4, ..., -20, +8]

Page 12: Adversarial Search - cs.utexas.edu

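Minimax Values

States Under Agent’s Control: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$

States Under Opponent’s Control: $V(s') = \min_{s \in \text{successors}(s')} V(s)$

Terminal States: $V(s) = \text{known}$

[Figure: game tree with terminal values +8, -10, -5, -8]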

Page 13: Adversarial Search - cs.utexas.edu

Tic-Tac-Toe Game Tree

Page 14: Adversarial Search - cs.utexas.edu

Adversarial Search (Minimax)

▪ Deterministic, zero-sum games:
  ▪ Tic-tac-toe, chess, checkers
  ▪ One player maximizes result
  ▪ The other minimizes result

▪ Minimax search:
  ▪ A state-space search tree
  ▪ Players alternate turns
  ▪ Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: two-ply tree. Terminal values (part of the game): 8, 2, 5, 6. Minimax values (computed recursively): min nodes 2 and 5, max root 5.]

Page 15: Adversarial Search - cs.utexas.edu

Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

Page 16: Adversarial Search - cs.utexas.edu

Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
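The dispatch pseudocode translates almost line for line into runnable Python. The sketch below assumes a nested-list encoding (numbers are terminal utilities, lists are internal nodes) and strictly alternating turns, so the "next agent" is passed as a boolean flag; both choices are simplifications made for this example.

```python
import math

def value(node, maximizing):
    """Dispatch: terminal utility, or the max/min value depending on whose turn it is."""
    if isinstance(node, (int, float)):
        return node
    return max_value(node) if maximizing else min_value(node)

def max_value(node):
    v = -math.inf
    for successor in node:
        v = max(v, value(successor, maximizing=False))
    return v

def min_value(node):
    v = math.inf
    for successor in node:
        v = min(v, value(successor, maximizing=True))
    return v

# Two-ply tree with terminal values 8, 2 under one MIN node and 5, 6 under the other.
tree = [[8, 2], [5, 6]]
print(value(tree, maximizing=True))  # 5
```

The MIN nodes evaluate to 2 and 5, so the MAX root gets 5.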

Page 17: Adversarial Search - cs.utexas.edu

Minimax Example

[Figure: minimax example tree; leaf values 3, 12, 8 | 2, 4, 6 | 14, 5, 2]

Page 18: Adversarial Search - cs.utexas.edu

Minimax Efficiency

▪ How efficient is minimax?
  ▪ Just like (exhaustive) DFS
  ▪ Time: O(b^m)
  ▪ Space: O(bm)

▪ Example: For chess, b ≈ 35, m ≈ 100
  ▪ Exact solution is completely infeasible
  ▪ But, do we need to explore the whole tree?
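To get a feel for these bounds, the arithmetic for the chess example can be checked directly. This is a back-of-the-envelope sketch; the helper name `minimax_cost` is made up here, and constant factors are ignored.

```python
import math

def minimax_cost(b, m):
    """Return (time, space) node counts for exhaustive minimax: b^m and b*m."""
    return b ** m, b * m

time_nodes, space_nodes = minimax_cost(35, 100)
print(f"time:  about 10^{math.floor(math.log10(time_nodes))} nodes")
print(f"space: about {space_nodes} nodes")
```

With b ≈ 35 and m ≈ 100 the time bound is on the order of 10^154 nodes, while the space bound is only 3500 nodes; the time bound is what makes an exact solution infeasible.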

Page 19: Adversarial Search - cs.utexas.edu

Minimax Properties

Optimal against a perfect player. Otherwise?

[Figure: max root over two min nodes; leaf values 10, 10 and 9, 100]
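The "Otherwise?" can be made concrete with the numbers in the figure. The sketch below assumes the leaves are grouped as (10, 10) and (9, 100), and it models an imperfect opponent as one that picks uniformly at random; that opponent model is an assumption for illustration only.

```python
tree = [[10, 10], [9, 100]]

def worst_case(leaves):
    """Value of a branch if the opponent plays optimally (minimizes)."""
    return min(leaves)

def average_case(leaves):
    """Value of a branch if the opponent picks uniformly at random (assumed model)."""
    return sum(leaves) / len(leaves)

minimax_choice = max(range(len(tree)), key=lambda i: worst_case(tree[i]))
random_opp_choice = max(range(len(tree)), key=lambda i: average_case(tree[i]))

print([worst_case(b) for b in tree], "-> minimax picks branch", minimax_choice)
print([average_case(b) for b in tree], "-> vs. random opponent, branch", random_opp_choice)
```

Minimax takes the left branch for a guaranteed 10, while against the random opponent the right branch is worth 54.5 on average. Minimax is safe, but it can leave value on the table when the opponent is not perfect.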

Page 20: Adversarial Search - cs.utexas.edu

Minimax vs Expectimax (Min)

End your misery!

Page 21: Adversarial Search - cs.utexas.edu

Minimax vs Expectimax (Exp)

Hold on to hope, Pacman!

Page 22: Adversarial Search - cs.utexas.edu

Resource Limits

Page 23: Adversarial Search - cs.utexas.edu

Resource Limits

▪ Problem: In realistic games, cannot search to leaves!

▪ Solution: Depth-limited search
  ▪ Instead, search only to a limited depth in the tree
  ▪ Replace terminal utilities with an evaluation function for non-terminal positions

▪ Example:
  ▪ Suppose we have 100 seconds, can explore 10K nodes / sec
  ▪ So can check 1M nodes per move
  ▪ α-β reaches about depth 8 – decent chess program

▪ Guarantee of optimal play is gone

▪ More plies makes a BIG difference

▪ Use iterative deepening for an anytime algorithm

[Figure: depth-limited tree. Max root 4 over min nodes -2 and 4; evaluated positions -1, -2, 4, 9; deeper levels unexplored (? ? ? ?)]
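A minimal sketch of depth-limited search with iterative deepening follows. The `Node` class, the idea of storing the evaluation-function score on each node, and every value below depth 2 are hypothetical; only the depth-2 estimates -1, -2, 4, 9 are taken from the figure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    estimate: float                               # evaluation-function score for this position
    children: list = field(default_factory=list)  # empty list means terminal

def depth_limited_value(node, depth, maximizing):
    """Minimax value, using node.estimate in place of the true utility at the cutoff."""
    if depth == 0 or not node.children:
        return node.estimate
    values = [depth_limited_value(c, depth - 1, not maximizing) for c in node.children]
    return max(values) if maximizing else min(values)

def iterative_deepening(root, max_depth):
    """Anytime flavor: re-search at depth 1, 2, ... and keep the answer from each depth."""
    return [depth_limited_value(root, d, True) for d in range(1, max_depth + 1)]

# Depth-2 estimates match the figure; the deeper values are invented.
root = Node(0, [
    Node(0, [Node(-1, [Node(5)]), Node(-2, [Node(3)])]),
    Node(0, [Node(4, [Node(-7)]), Node(9, [Node(1)])]),
])
print(iterative_deepening(root, 3))

# Node budget from the example: 100 seconds at 10K nodes per second.
print(100 * 10_000)
```

At depth 2 the root looks like a 4 (via the right branch), but one ply deeper the invented values flip the answer to 3 via the left branch. This is the sense in which the guarantee of optimal play is gone and extra plies can matter a lot.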

Page 24: Adversarial Search - cs.utexas.edu

Depth Matters

▪ Evaluation functions are always imperfect

▪ The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters

▪ An important example of the tradeoff between complexity of features and complexity of computation

Page 25: Adversarial Search - cs.utexas.edu

Video of Demo Limited Depth (2)

Page 26: Adversarial Search - cs.utexas.edu

Video of Demo Limited Depth (10)

Page 27: Adversarial Search - cs.utexas.edu

Evaluation Functions

Page 28: Adversarial Search - cs.utexas.edu

Evaluation Functions

▪ Evaluation functions score non-terminals in depth-limited search

▪ Ideal function: returns the actual minimax value of the position
▪ In practice: typically weighted linear sum of features:

  $\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$

▪ e.g. f1(s) = (num white queens – num black queens), etc.
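A weighted linear sum of features is straightforward to sketch in Python. Everything specific below is assumed for illustration: the dictionary-of-piece-counts state, the second feature (pawn difference), and the weights 9 and 1.

```python
def queen_diff(state):
    """f1(s): number of white queens minus number of black queens."""
    return state["white_queens"] - state["black_queens"]

def pawn_diff(state):
    """f2(s): a second, hypothetical feature (pawn difference)."""
    return state["white_pawns"] - state["black_pawns"]

def linear_eval(state, features, weights):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for f, w in zip(features, weights))

FEATURES = [queen_diff, pawn_diff]
WEIGHTS = [9, 1]  # assumed weights, not taken from the slides

position = {"white_queens": 1, "black_queens": 0, "white_pawns": 5, "black_pawns": 7}
print(linear_eval(position, FEATURES, WEIGHTS))  # 9*1 + 1*(-2) = 7
```

Keeping features and weights as separate lists makes it easy to tune the weights without touching the feature code.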

Page 29: Adversarial Search - cs.utexas.edu

Thrashing (d=2)

Evaluation function: Score

Page 30: Adversarial Search - cs.utexas.edu

Why Pacman Starves

▪ A danger of replanning agents!
  ▪ He knows his score will go up by eating the dot now (left, right)
  ▪ He knows his score will go up just as much by eating the dot later (right, right)
  ▪ There are no point-scoring opportunities after eating the dot (within the horizon, two here)
  ▪ Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!

Page 31: Adversarial Search - cs.utexas.edu

Thrashing -- Fixed (d=2)

Evaluation function: Score + proximity to nearest dot

Page 32: Adversarial Search - cs.utexas.edu

Smart ghosts: implicit coordination

Evaluation function: proximity to Pacman

Page 33: Adversarial Search - cs.utexas.edu

Game Tree Pruning

Page 34: Adversarial Search - cs.utexas.edu

Minimax Example

[Figure: minimax example tree; leaf values 3, 12, 8 | 2, 4, 6 | 14, 5, 2]

Page 35: Adversarial Search - cs.utexas.edu

Minimax Pruning

[Figure: the same example tree with pruning; leaves examined: 3, 12, 8 | 2 | 14, 5, 2]

Page 36: Adversarial Search - cs.utexas.edu

Alpha-Beta Pruning

▪ General configuration (MIN version)
  ▪ We’re computing the MIN-VALUE at some node n
  ▪ We’re looping over n’s children
  ▪ n’s estimate of the children’s min is dropping
  ▪ Who cares about n’s value? MAX
  ▪ Let a be the best value that MAX can get at any choice point along the current path from the root
  ▪ If n becomes worse than a, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)

▪ MAX version is symmetric

[Figure: alternating MAX / MIN / MAX / MIN levels, with a marked at a MAX node on the path from the root and n at a MIN node below it]

Page 37: Adversarial Search - cs.utexas.edu

Alpha-Beta Implementation

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

α: MAX’s best option on path to root
β: MIN’s best option on path to root
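A runnable Python version of this pseudocode is sketched below. It assumes a nested-list tree (numbers are terminal utilities), strictly alternating turns, and an extra `visited` list, added here only to record which leaves are actually examined. The example tree is assumed to be the three-branch tree with leaves 3, 12, 8 | 2, 4, 6 | 14, 5, 2.

```python
import math

def alphabeta(node, alpha, beta, maximizing, visited):
    """Alpha-beta value of node; appends every leaf it looks at to `visited`."""
    if isinstance(node, (int, float)):
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for successor in node:
            v = max(v, alphabeta(successor, alpha, beta, False, visited))
            if v >= beta:        # MIN above already has something at least this good
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for successor in node:
        v = min(v, alphabeta(successor, alpha, beta, True, visited))
        if v <= alpha:           # MAX above already has something at least this good
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
print(alphabeta(tree, -math.inf, math.inf, True, visited))  # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]
```

The root value is the same as plain minimax, but the leaves 4 and 6 are never examined: once the middle MIN node sees 2, it cannot beat the 3 that MAX already has.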

Page 38: Adversarial Search - cs.utexas.edu

Alpha-Beta Pruning Properties

▪ This pruning has no effect on minimax value computed for the root!

▪ Values of intermediate nodes might be wrong
  ▪ Important: children of the root may have the wrong value
  ▪ So the most naïve version won’t let you do action selection

▪ Good child ordering improves effectiveness of pruning

▪ With “perfect ordering”:
  ▪ Time complexity drops to O(b^(m/2))
  ▪ Doubles solvable depth!
  ▪ Full search of, e.g. chess, is still hopeless…

▪ This is a simple example of metareasoning (computing about what to compute)

[Figure: max root over two min nodes; leaf values 10, 10, 0]
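The warning about root children can be reproduced with a small experiment. The sketch assumes the figure's values form the tree `[10, [10, 0]]` (a MAX root whose second child is a MIN node over 10 and 0) and uses the non-strict pruning tests `v ≤ α` and `v ≥ β`; the helper `root_child_values` is made up for this example.

```python
import math

def minimax(node, maximizing):
    """Exact minimax value, no pruning."""
    if isinstance(node, (int, float)):
        return node
    vals = [minimax(c, not maximizing) for c in node]
    return max(vals) if maximizing else min(vals)

def alphabeta(node, alpha, beta, maximizing):
    """Alpha-beta with the non-strict cutoff tests."""
    if isinstance(node, (int, float)):
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_v = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            v = max(v, child_v)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, child_v)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v

def root_child_values(tree):
    """Values alpha-beta reports for each child of a MAX root."""
    alpha, reported = -math.inf, []
    for child in tree:
        v = alphabeta(child, alpha, math.inf, maximizing=False)
        reported.append(v)
        alpha = max(alpha, v)
    return reported

tree = [10, [10, 0]]
print(root_child_values(tree))          # [10, 10]
print([minimax(c, False) for c in tree])  # [10, 0]
```

The root value (10) is still correct, but the second child is reported as 10 when its true value is 0, so breaking ties on the reported values could select a bad move. One common remedy is to remember the first child that achieves the root's best value rather than comparing reported values afterwards.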

Page 39: Adversarial Search - cs.utexas.edu

Alpha-Beta Quiz

[Figure: quiz tree; values shown: 8, 4, 8]

Page 40: Adversarial Search - cs.utexas.edu

Alpha-Beta Quiz 2

[Figure: quiz tree; values shown: 10, 100, 10, 2, 2, 10]

Page 41: Adversarial Search - cs.utexas.edu

Next Time: Uncertainty!