adversarial search - cs.utexas.edu

CS188: Artificial Intelligence
Adversarial Search

Prof. Scott Niekum, The University of Texas at Austin

[These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Game Playing State-of-the-Art

▪ Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended the 40-year reign of human champion Marion Tinsley using a complete 8-piece endgame. 2007: Checkers solved!

▪ Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

▪ Go: 2016: AlphaGo, created by Google DeepMind, beat 9-dan professional Go player Lee Sedol 4-1 on a full-sized 19x19 board. AlphaGo combined Monte Carlo Tree Search with deep neural networks, improving via reinforcement learning through self-play.

▪ OpenAI Five (DOTA): getting close to world-class

How to consider the behavior of ghosts?

Adversarial Games

▪ Many different kinds of games!

▪ Axes:
    ▪ Deterministic or stochastic?
    ▪ One, two, or more players?
    ▪ Zero sum?
    ▪ Perfect information (can you see the state)?

▪ Want algorithms for calculating a strategy (policy) which recommends a move from each state

Types of Games

Deterministic Games

▪ Many possible formalizations, one is:
    ▪ States: S (start at s0)
    ▪ Players: P = {1...N} (usually take turns)
    ▪ Actions: A (may depend on player / state)
    ▪ Transition Function: S x A → S
    ▪ Terminal Test: S → {t, f}
    ▪ Terminal Utilities: S x P → R

▪ Solution for a player is a policy: S → A
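The formalization above can be sketched in code. The example below is only an illustration under stated assumptions: the toy game (players alternately take 1 or 2 sticks, and whoever takes the last stick wins) and every name in it (`NimState`, `actions`, `result`, `is_terminal`, `utility`, `policy`) are hypothetical and do not come from the slides.

```python
from typing import List, NamedTuple


class NimState(NamedTuple):
    """A state in S: sticks remaining and the player (1 or 2) to move."""
    sticks: int
    player: int


START = NimState(sticks=5, player=1)  # s0 (an arbitrary choice)


def actions(state: NimState) -> List[int]:
    """Actions A: legal moves depend on the state (take 1 or 2 sticks)."""
    return [n for n in (1, 2) if n <= state.sticks]


def result(state: NimState, action: int) -> NimState:
    """Transition function S x A -> S; the two players take turns."""
    return NimState(state.sticks - action, 3 - state.player)


def is_terminal(state: NimState) -> bool:
    """Terminal test S -> {t, f}."""
    return state.sticks == 0


def utility(state: NimState, player: int) -> float:
    """Terminal utilities S x P -> R: the player who took the last stick wins."""
    winner = 3 - state.player  # the player who just moved
    return 1.0 if player == winner else -1.0


def policy(state: NimState) -> int:
    """A policy S -> A: leave the opponent a multiple of 3 when possible."""
    take = state.sticks % 3
    return take if take in actions(state) else 1
```

Because `utility` returns +1 for one player and -1 for the other at every terminal state, this toy game is also zero-sum.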

Zero-Sum Games

▪ Zero-Sum Games
    ▪ Agents have opposite utilities (values on outcomes)
    ▪ Lets us think of a single value that one maximizes and the other minimizes
    ▪ Adversarial, pure competition

▪ General Games
    ▪ Agents have independent utilities (values on outcomes)
    ▪ Cooperation, indifference, competition, and more are all possible
    ▪ More later on non-zero-sum games

Adversarial Search

Single-Agent Trees

[Figure: single-agent search tree; node values shown: 8; 2, 0, 2, 6, 4, 6, ...]

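Value of a State

Value of a state: The best achievable outcome (utility) from that state

Non-Terminal States: V(s) = max over s' in children(s) of V(s')

Terminal States: V(s) = known

[Figure: the single-agent tree annotated with values: 8; 2, 0, 2, 6, 4, 6, ...]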

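Adversarial Game Trees

[Figure: adversarial game tree; values shown: -20, -8, -18, -5, -10, +4, ..., -20, +8]

Minimax Values

States Under Agent’s Control: V(s) = max over s' in successors(s) of V(s')

States Under Opponent’s Control: V(s') = min over s in successors(s') of V(s)

Terminal States: V(s) = known

[Figure: game tree with node values +8, -10, -5, -8]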

Tic-Tac-Toe Game Tree

Adversarial Search (Minimax)

▪ Deterministic, zero-sum games:
    ▪ Tic-tac-toe, chess, checkers
    ▪ One player maximizes result
    ▪ The other minimizes result

▪ Minimax search:
    ▪ A state-space search tree
    ▪ Players alternate turns
    ▪ Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: two-ply game tree. Terminal values 8, 2, 5, 6 (part of the game); minimax values computed recursively: MIN nodes 2 and 5, MAX root 5.]

Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
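A runnable version of the dispatch pseudocode might look like the sketch below. It assumes a deliberately simple representation that is not from the slides: a game tree is a nested Python list, a bare number is a terminal utility, and MAX and MIN strictly alternate by level. The `maximizing` flag and the `TREE` constant are illustrative names.

```python
import math


def value(node, maximizing):
    """Dispatch: terminal utility, or the max/min over the successors."""
    if not isinstance(node, list):  # terminal state: return its utility
        return node
    return max_value(node) if maximizing else min_value(node)


def max_value(node):
    v = -math.inf
    for successor in node:
        v = max(v, value(successor, maximizing=False))  # MIN moves next
    return v


def min_value(node):
    v = math.inf
    for successor in node:
        v = min(v, value(successor, maximizing=True))  # MAX moves next
    return v


# Illustrative tree: a MAX root over three MIN nodes with three leaves each.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(value(TREE, maximizing=True))  # prints 3
```

The three MIN nodes evaluate to 3, 2, and 2, so the MAX root takes the first move for a value of 3.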

Minimax Example

[Figure: MAX root over three MIN nodes; leaf values 3, 12, 8; 2, 4, 6; 14, 5, 2]

Minimax Efficiency

▪ How efficient is minimax?
    ▪ Just like (exhaustive) DFS
    ▪ Time: O(b^m)
    ▪ Space: O(bm)

▪ Example: For chess, b ≈ 35, m ≈ 100
    ▪ Exact solution is completely infeasible
    ▪ But, do we need to explore the whole tree?
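To get a feel for these bounds, the short sketch below plugs the chess estimates into b^m and b·m. It assumes a uniform tree (every node has exactly b successors and every line of play lasts m plies), which is an idealization; the function names are illustrative.

```python
import math


def leaf_count(b: int, m: int) -> int:
    """Leaves of a uniform tree: the dominant term in minimax's O(b^m) time."""
    return b ** m


def frontier_size(b: int, m: int) -> int:
    """Nodes a depth-first search holds at once, on the order of O(bm)."""
    return b * m


chess_leaves = leaf_count(35, 100)
print(f"time:  about 10^{math.floor(math.log10(chess_leaves))} leaf positions")  # 10^154
print(f"space: about {frontier_size(35, 100)} nodes")  # 3500
```

The space requirement is trivial, but the time requirement rules out exhaustive search, which motivates pruning and depth limits.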

Minimax Properties

Optimal against a perfect player. Otherwise?

[Figure: MAX root over two MIN nodes; leaf values 10, 10, 9, 100]
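The "Otherwise?" can be made concrete with the leaf values 10, 10, 9, 100. The sketch below assumes they are grouped as (10, 10) under the left MIN node and (9, 100) under the right one, and it models an imperfect opponent as one that picks uniformly at random; both are assumptions made for illustration.

```python
TREE = [[10, 10], [9, 100]]  # assumed grouping of the leaves


def minimax_choice(tree):
    """Root move and value when the opponent always picks the minimum."""
    values = [min(leaves) for leaves in tree]
    best = max(range(len(tree)), key=values.__getitem__)
    return best, values[best]


def expected_choice(tree):
    """Root move and value when the opponent picks uniformly at random."""
    values = [sum(leaves) / len(leaves) for leaves in tree]
    best = max(range(len(tree)), key=values.__getitem__)
    return best, values[best]


print(minimax_choice(TREE))   # (0, 10): guaranteed 10 against a perfect player
print(expected_choice(TREE))  # (1, 54.5): better on average against a random one
```

Minimax gives up the chance of 100 in exchange for a guaranteed 10; against an opponent that is not optimal, that caution can cost value.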

Minimax vs Expectimax (Min)

End your misery!

Minimax vs Expectimax (Exp)

Hold on to hope, Pacman!

Resource Limits

▪ Problem: In realistic games, cannot search to leaves!

▪ Solution: Depth-limited search
    ▪ Instead, search only to a limited depth in the tree
    ▪ Replace terminal utilities with an evaluation function for non-terminal positions

▪ Example:
    ▪ Suppose we have 100 seconds, can explore 10K nodes / sec
    ▪ So can check 1M nodes per move
    ▪ α-β reaches about depth 8: a decent chess program

▪ Guarantee of optimal play is gone

▪ More plies make a BIG difference

▪ Use iterative deepening for an anytime algorithm

[Figure: depth-limited tree. MAX root 4 over MIN nodes -2 and 4; frontier values -1, -2, 4, 9; positions below the frontier are unexplored (? ? ? ?).]
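Depth-limited search with iterative deepening can be sketched as follows. Everything here is an assumption made for illustration: the toy game (players alternately take 1 or 2 sticks; taking the last stick wins), the state encoding `(sticks, max_to_move)`, and an evaluation function that knows nothing and always returns 0.

```python
from itertools import count, islice


def successors(state):
    sticks, max_to_move = state
    return [(sticks - n, not max_to_move) for n in (1, 2) if n <= sticks]


def is_terminal(state):
    return state[0] == 0


def utility(state):
    # At a terminal state, the player who just moved took the last stick.
    # If MAX is "to move", MIN took it, so MAX lost.
    return -1 if state[1] else +1


def evaluate(state):
    """Placeholder evaluation function for non-terminal positions."""
    return 0


def value(state, depth):
    """Minimax value, searching only `depth` plies below `state`."""
    if is_terminal(state):
        return utility(state)
    if depth == 0:
        return evaluate(state)  # stands in for the unknown true value
    child_values = [value(s, depth - 1) for s in successors(state)]
    return max(child_values) if state[1] else min(child_values)


def iterative_deepening(state):
    """Anytime search: yield (depth, value) for depth 1, 2, 3, ..."""
    for depth in count(1):
        yield depth, value(state, depth)


# Stop after three iterations; a real agent would stop when time runs out.
print(list(islice(iterative_deepening((4, True)), 3)))
# prints [(1, 0), (2, 0), (3, 1)]
```

At depths 1 and 2 the search bottoms out in the uninformative evaluation function and reports 0; only at depth 3 does it see far enough to discover that MAX can force a win.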

Depth Matters

▪ Evaluation functions are always imperfect

▪ The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters

▪ An important example of the tradeoff between complexity of features and complexity of computation

Video of Demo Limited Depth (2)

Video of Demo Limited Depth (10)

Evaluation Functions

▪ Evaluation functions score non-terminals in depth-limited search

▪ Ideal function: returns the actual minimax value of the position

▪ In practice: typically weighted linear sum of features:
    Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)

▪ e.g. f1(s) = (num white queens – num black queens), etc.
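A weighted linear evaluation function is short to write down. In the sketch below, the state representation (a dictionary of piece counts), the second feature, and the weights 9 and 1 are all illustrative assumptions; only the queen-difference feature comes from the slide.

```python
from typing import Callable, Dict, Sequence

State = Dict[str, int]  # hypothetical state: piece counts by name
Feature = Callable[[State], float]


def queen_difference(s: State) -> float:
    """f1(s) = num white queens - num black queens."""
    return s["white_queens"] - s["black_queens"]


def pawn_difference(s: State) -> float:
    """f2(s): an assumed second feature, defined the same way for pawns."""
    return s["white_pawns"] - s["black_pawns"]


def evaluate(s: State, weights: Sequence[float], features: Sequence[Feature]) -> float:
    """Weighted linear sum: w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(s) for w, f in zip(weights, features))


position = {"white_queens": 1, "black_queens": 0, "white_pawns": 5, "black_pawns": 7}
score = evaluate(position, [9.0, 1.0], [queen_difference, pawn_difference])
print(score)  # 9*1 + 1*(-2) = 7.0
```

Keeping the features as separate functions makes it easy to add or reweight them without touching the search code.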

Thrashing (d=2)

Evaluation function: Score

Why Pacman Starves

▪ A danger of replanning agents!
    ▪ He knows his score will go up by eating the dot now (left, right)
    ▪ He knows his score will go up just as much by eating the dot later (right, right)
    ▪ There are no point-scoring opportunities after eating the dot (within the horizon, two here)
    ▪ Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!

Thrashing -- Fixed (d=2)

Evaluation function: Score + proximity to nearest dot

Smart Ghosts: Implicit Coordination

Evaluation function: proximity to Pacman

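Game Tree Pruning

Minimax Example

[Figure: MAX root over three MIN nodes; leaf values 3, 12, 8; 2, 4, 6; 14, 5, 2]

Minimax Pruning

[Figure: the same tree with pruning; leaves evaluated: 3, 12, 8; 2; 14, 5, 2]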

Alpha-Beta Pruning

▪ General configuration (MIN version)
    ▪ We’re computing the MIN-VALUE at some node n
    ▪ We’re looping over n’s children
    ▪ n’s estimate of the children’s min is dropping
    ▪ Who cares about n’s value? MAX
    ▪ Let a be the best value that MAX can get at any choice point along the current path from the root
    ▪ If n becomes worse than a, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)

▪ MAX version is symmetric

[Figure: path from the root through alternating MAX, MIN, MAX, MIN levels, with MAX choice point a above MIN node n]

Alpha-Beta Implementation

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

α: MAX’s best option on path to root
β: MIN’s best option on path to root
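The pseudocode translates almost line for line into Python. This sketch assumes the same simplifications as a nested-list toy tree: a bare number is a terminal utility and MAX and MIN alternate by level. The `visited` list is an addition for illustration, recording which leaves were actually evaluated.

```python
import math


def value(node, alpha, beta, maximizing, visited):
    if not isinstance(node, list):  # terminal state
        visited.append(node)
        return node
    if maximizing:
        return max_value(node, alpha, beta, visited)
    return min_value(node, alpha, beta, visited)


def max_value(node, alpha, beta, visited):
    v = -math.inf
    for successor in node:
        v = max(v, value(successor, alpha, beta, False, visited))
        if v >= beta:  # MIN above already has a better option: prune
            return v
        alpha = max(alpha, v)
    return v


def min_value(node, alpha, beta, visited):
    v = math.inf
    for successor in node:
        v = min(v, value(successor, alpha, beta, True, visited))
        if v <= alpha:  # MAX above already has a better option: prune
            return v
        beta = min(beta, v)
    return v


TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # illustrative tree
visited = []
root = value(TREE, -math.inf, math.inf, True, visited)
print(root, visited)  # prints 3 [3, 12, 8, 2, 14, 5, 2]
```

After the first MIN node returns 3, the second MIN node is abandoned as soon as it sees the leaf 2, so the leaves 4 and 6 are never evaluated. The root value is the same as plain minimax would compute.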

Alpha-Beta Pruning Properties

▪ This pruning has no effect on minimax value computed for the root!

▪ Values of intermediate nodes might be wrong
    ▪ Important: children of the root may have the wrong value
    ▪ So the most naïve version won’t let you do action selection

▪ Good child ordering improves effectiveness of pruning

▪ With “perfect ordering”:
    ▪ Time complexity drops to O(b^(m/2))
    ▪ Doubles solvable depth!
    ▪ Full search of, e.g. chess, is still hopeless…

▪ This is a simple example of metareasoning (computing about what to compute)

[Figure: MAX root over two MIN nodes; leaf values 10, 10, 0]
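Two of these properties can be checked on small trees. The sketch below assumes nested-list trees with alternating levels, and it assumes the leaf values 10, 10, 0 are arranged as a left MIN node with leaf 10 and a right MIN node with leaves 10 and 0. `best_root_action` shows one possible way to do action selection safely (only a strictly better value replaces the current choice), and `leaves_evaluated` shows the effect of child ordering. All names are illustrative.

```python
import math


def alphabeta(node, alpha, beta, maximizing, counter):
    """Alpha-beta value of a nested-list tree; counter[0] counts leaves seen."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alphabeta(child, alpha, beta, not maximizing, counter)
        if maximizing:
            v = max(v, child_value)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v


def best_root_action(tree):
    """MAX's move at the root. A pruned child returns only a bound, so a
    tie with the current best must not replace it."""
    best_action, alpha = None, -math.inf
    for action, child in enumerate(tree):
        v = alphabeta(child, alpha, math.inf, False, [0])
        if v > alpha:  # strictly better only
            best_action, alpha = action, v
    return best_action, alpha


def leaves_evaluated(tree):
    counter = [0]
    alphabeta(tree, -math.inf, math.inf, True, counter)
    return counter[0]


# The right child is pruned after its first leaf and reports 10, not its true value 0.
print(best_root_action([[10], [10, 0]]))                       # (0, 10)
print(leaves_evaluated([[3, 12, 8], [2, 4, 6], [2, 5, 14]]))   # 5
print(leaves_evaluated([[2, 4, 6], [14, 5, 2], [3, 12, 8]]))   # 9
```

Both three-branch trees hold the same leaves and have the same root value of 3, but the ordering that examines the best move first evaluates 5 leaves instead of 9.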

Alpha-Beta Quiz

[Figure: quiz game tree; values visible: 8, 4, 8]

Alpha-Beta Quiz 2

[Figure: quiz game tree; values visible: 10, 100, 10, 2, 2, 10]

Next Time: Uncertainty!
