Artificial Intelligence: Adversarial Search

Instructors: David Suter and Qince Li

Course delivered at Harbin Institute of Technology

[Many slides adapted from those created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. Some others from colleagues at Adelaide University.]


Game Playing State-of-the-Art

§  Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended the 40-year reign of human champion Marion Tinsley, using a complete 8-piece endgame database. 2007: Checkers solved!

§  Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

§  Go: Human champions are now starting to be challenged by machines, though the best humans still beat the best machines. In Go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods.

   Now “solved” with the assistance of deep learning.

§  Pacman

Behavior from Computation

[Demo: mystery pacman (L6D1)]

Adversarial Games

§  Many different kinds of games!

§  Axes:
   §  Deterministic or stochastic?
   §  One, two, or more players?
   §  Zero sum?
   §  Perfect information (can you see the state)?

§  Want algorithms for calculating a strategy (policy) which recommends a move from each state

Types of Games

Deterministic Games

§  Many possible formalizations, one is:
   §  States: S (start at s0)
   §  Players: P = {1...N} (usually take turns)
   §  Actions: A (may depend on player / state)
   §  Transition Function: S × A → S
   §  Terminal Test: S → {t, f}
   §  Terminal Utilities: S × P → R

§  Solution for a player is a policy: S → A
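The formalization above maps fairly directly onto a small interface. The following is a minimal sketch, not part of the original slides: the toy game (players alternately remove 1 or 2 stones, and whoever takes the last stone wins), the class name `TakeAwayGame`, and all method names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A state is (stones left, index of the player to move). Hypothetical encoding.
State = Tuple[int, int]


@dataclass(frozen=True)
class TakeAwayGame:
    """Toy deterministic game: remove 1 or 2 stones; taking the last stone wins."""
    start_stones: int = 4
    num_players: int = 2

    def initial_state(self) -> State:
        # s0: the start state
        return (self.start_stones, 0)

    def actions(self, state: State) -> List[int]:
        # A: legal actions depend on the state
        stones, _ = state
        return [n for n in (1, 2) if n <= stones]

    def result(self, state: State, action: int) -> State:
        # Transition function: S x A -> S
        stones, player = state
        return (stones - action, (player + 1) % self.num_players)

    def is_terminal(self, state: State) -> bool:
        # Terminal test: S -> {t, f}
        return state[0] == 0

    def utility(self, state: State, player: int) -> float:
        # Terminal utilities: S x P -> R (only meaningful for terminal states).
        # The player who moved last took the final stone and wins.
        winner = (state[1] - 1) % self.num_players
        return 1.0 if player == winner else -1.0


game = TakeAwayGame()
s = game.initial_state()
s = game.result(s, 2)  # player 0 takes 2 stones
s = game.result(s, 2)  # player 1 takes the last 2 stones
print(game.is_terminal(s), game.utility(s, 0), game.utility(s, 1))
```

A policy in this setting is any function from `State` to one of the values returned by `actions`.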

Zero-Sum Games

§  Zero-Sum Games
   §  Agents have opposite utilities (values on outcomes)
   §  Lets us think of a single value that one maximizes and the other minimizes
   §  Adversarial, pure competition

§  General Games
   §  Agents have independent utilities (values on outcomes)
   §  Cooperation, indifference, competition, and more are all possible

Adversarial Search

Single-Agent Trees

[Figure: single-agent search tree with node values 8 and 2, 0, 2, 6, 4, 6, …]

Value of a State

Value of a state: the best achievable outcome (utility) from that state

[Figure: single-agent tree (node values 8; 2, 0, 2, 6, 4, 6, …) with labels "Non-terminal states" and "Terminal states"]
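One reading of "best achievable outcome" for a single agent is a simple recursion: a terminal state's value is its known utility, and a non-terminal state's value is the maximum over its successors' values. The sketch below assumes a tree encoded as nested Python lists with numeric leaves; the encoding and the leaf utilities are made up for illustration and do not reproduce the slide's figure.

```python
from typing import List, Union

# A tree is either a leaf utility or a list of child subtrees (assumed encoding).
Tree = Union[float, List["Tree"]]


def state_value(node: Tree) -> float:
    """Best achievable utility from this node when a single agent controls every move."""
    if not isinstance(node, list):
        return node  # terminal state: utility is known
    return max(state_value(child) for child in node)  # non-terminal: best child


# Hypothetical tree with made-up utilities.
tree = [[3, 1], [7, [2, 9]], [4]]
print(state_value(tree))  # 9
```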

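Adversarial Game Trees

[Figure: adversarial game tree with node values -20, -8, …, -18, -5, …, -10, +4, …, -20, +8]

Minimax Values

[Figure: game tree with minimax values +8, +4, -5, -8; labels: "States Under Agent's Control", "States Under Opponent's Control", "Terminal States"]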

Tic-Tac-Toe Game Tree

Adversarial Search (Minimax)

§  Deterministic, zero-sum games:
   §  Tic-tac-toe, chess, checkers
   §  One player maximizes result
   §  The other minimizes result

§  Minimax search:
   §  A state-space search tree
   §  Players alternate turns
   §  Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: two-ply tree. Terminal values (part of the game): 8, 2, 5, 6. Minimax values (computed recursively): min nodes 2 and 5; max root 5.]

Minimax Implementation

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
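The dispatch pseudocode can be turned into a small runnable sketch. This version makes two assumptions that are not in the slides: the game tree is given explicitly as nested Python lists with numeric leaves as terminal utilities, and MAX and MIN strictly alternate, so "the next agent" is tracked with a boolean flag. The example tree pairs the terminal values 8, 2, 5, 6 as (8, 2) and (5, 6), which is consistent with the min values 2 and 5 and the root value 5 shown earlier.

```python
import math
from typing import List, Union

# A tree is either a terminal utility or a list of successor subtrees (assumed encoding).
Tree = Union[float, List["Tree"]]


def value(node: Tree, maximizing: bool) -> float:
    """Dispatch: terminal utility, otherwise max-value or min-value by whose turn it is."""
    if not isinstance(node, list):
        return node
    return max_value(node) if maximizing else min_value(node)


def max_value(successors: List[Tree]) -> float:
    v = -math.inf
    for successor in successors:
        v = max(v, value(successor, maximizing=False))  # MIN moves next
    return v


def min_value(successors: List[Tree]) -> float:
    v = math.inf
    for successor in successors:
        v = min(v, value(successor, maximizing=True))  # MAX moves next
    return v


tree = [[8, 2], [5, 6]]
print(value(tree, maximizing=True))  # 5
```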

Minimax Example

[Figure: game tree with terminal values 12, 8, 5, 2, 3, 2, 14, 4, 6]

Minimax Efficiency

§  How efficient is minimax?
   §  Just like (exhaustive) DFS
   §  Time: O(b^m)
   §  Space: O(bm)

§  Example: For chess, b ≈ 35, m ≈ 100
   §  Exact solution is completely infeasible
   §  But, do we need to explore the whole tree?
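To get a feel for these bounds, the chess figures can be plugged in directly. This is only back-of-the-envelope arithmetic on the slide's approximate values b ≈ 35 and m ≈ 100, with time growing as b to the power m and space as b times m; the variable names are illustrative.

```python
# Approximate chess figures from the slide.
b, m = 35, 100

time_nodes = b ** m    # O(b^m): nodes an exhaustive search would visit
space_nodes = b * m    # O(bm): nodes kept in memory by depth-first search

# Python integers are arbitrary precision, so the digit count is exact.
print(len(str(time_nodes)))  # 155 digits, i.e. roughly 10^154 nodes
print(space_nodes)           # 3500
```

The gap between the two numbers is the point: memory is not the obstacle, time is.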

Minimax Properties

Optimal against a perfect player. Otherwise?

[Figure: two-ply tree, max over min, with terminal values 10, 10, 9, 100]

[Demo: min vs exp (L6D2, L6D3)]

Video of Demo Min vs. Exp (Min)

Video of Demo Min vs. Exp (Exp)
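The "Otherwise?" can be made concrete with the four terminal values from the figure. The sketch below assumes the leaves are grouped as (10, 10) and (9, 100) under two MIN nodes, and it models a non-optimal opponent as one that picks uniformly at random, which is presumably what "exp" (expectation) in the demo name refers to. Both assumptions are illustrative, not taken from the slides.

```python
from statistics import mean

# Assumed grouping of the figure's terminal values under two opponent nodes.
tree = [[10, 10], [9, 100]]

# Against an optimal (minimizing) opponent, each branch is worth its minimum.
minimax_values = [min(branch) for branch in tree]

# Against a uniformly random opponent, each branch is worth its average.
expected_values = [mean(branch) for branch in tree]

best_vs_optimal = max(range(len(tree)), key=lambda i: minimax_values[i])
best_vs_random = max(range(len(tree)), key=lambda i: expected_values[i])

print(minimax_values, best_vs_optimal)   # [10, 9] 0
print(expected_values, best_vs_random)   # [10, 54.5] 1
```

Under these assumptions minimax takes the safe left branch, while a player who knows the opponent is random would prefer the right branch for its much higher average.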

Resource Limits

§  Problem: In realistic games, cannot search to leaves!

§  Solution: Depth-limited search
   §  Instead, search only to a limited depth in the tree
   §  Replace terminal utilities with an evaluation function for non-terminal positions

§  Example:
   §  Suppose we have 100 seconds, can explore 10K nodes / sec
   §  So can check 1M nodes per move
   §  α-β reaches about depth 8: a decent chess program

§  Guarantee of optimal play is gone

§  More plies make a BIG difference

§  Use iterative deepening for an anytime algorithm

[Figure: depth-limited tree, max over min. Evaluation values at the cutoff: -1, -2, 4, 9; min nodes -2 and 4; max root 4; deeper nodes marked "?"]
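Depth-limited search and iterative deepening can be sketched together. The code below is an illustration under stated assumptions: the tree is an explicit nested list, MAX and MIN alternate, and `leaf_average` is a hypothetical stand-in evaluation function (it peeks at the leaves below a node, which a real evaluation function could not do). A real anytime search would stop on a clock; here the loop simply runs to a fixed maximum depth.

```python
from typing import Callable, Iterator, List, Tuple, Union

Tree = Union[float, List["Tree"]]


def leaves(node: Tree) -> List[float]:
    """All terminal utilities below a node."""
    if not isinstance(node, list):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def leaf_average(node: Tree) -> float:
    """Hypothetical evaluation function: average utility of the leaves below the node."""
    values = leaves(node)
    return sum(values) / len(values)


def depth_limited_value(node: Tree, depth: int, maximizing: bool,
                        evaluate: Callable[[Tree], float]) -> float:
    if not isinstance(node, list):
        return node                # true terminal utility
    if depth == 0:
        return evaluate(node)      # cutoff: estimate instead of searching deeper
    child_values = [depth_limited_value(child, depth - 1, not maximizing, evaluate)
                    for child in node]
    return max(child_values) if maximizing else min(child_values)


def iterative_deepening(root: Tree, max_depth: int,
                        evaluate: Callable[[Tree], float]) -> Iterator[Tuple[int, float]]:
    """Yield (depth, root estimate) for depth 1, 2, ...; the latest result is always usable."""
    for depth in range(1, max_depth + 1):
        yield depth, depth_limited_value(root, depth, True, evaluate)


# Made-up three-ply tree: MAX at the root, then MIN, then MAX, then leaves.
tree = [[[3, 12], [8, 2]], [[4, 6], [14, 5]]]
for depth, estimate in iterative_deepening(tree, 3, leaf_average):
    print(depth, estimate)
```

On this tree the root estimate changes at each depth (7.25, then 5.0, then the exact value 8), a small illustration of why extra plies matter.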

Depth Matters

§  Evaluation functions are always imperfect

§  The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters

§  An important example of the tradeoff between complexity of features and complexity of computation

[Demo: depth limited (L6D4, L6D5)]

Video of Demo Limited Depth (2)

Video of Demo Limited Depth (10)

Evaluation Functions

§  Evaluation functions score non-terminals in depth-limited search

§  Ideal function: returns the actual minimax value of the position

§  In practice: typically weighted linear sum of features:

      Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

§  e.g. f1(s) = (num white queens - num black queens), etc.
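A weighted linear sum of features takes only a few lines to write down. In the sketch below the position encoding (a dictionary of piece counts), the second feature, and the weights 9.0 and 1.0 are all hypothetical choices for illustration; only the queen-difference feature comes from the slide.

```python
from typing import Callable, Dict, Sequence

# Hypothetical position encoding: piece counts keyed by name.
Position = Dict[str, int]
Feature = Callable[[Position], float]


def queen_difference(pos: Position) -> float:
    # f1(s) from the slide: num white queens - num black queens
    return pos["white_queens"] - pos["black_queens"]


def pawn_difference(pos: Position) -> float:
    # A second, made-up feature of the same shape
    return pos["white_pawns"] - pos["black_pawns"]


def linear_eval(pos: Position, weights: Sequence[float],
                features: Sequence[Feature]) -> float:
    """Weighted linear sum: w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(pos) for w, f in zip(weights, features))


position = {"white_queens": 1, "black_queens": 0, "white_pawns": 5, "black_pawns": 7}
weights = [9.0, 1.0]  # illustrative weights, not tuned values
features = [queen_difference, pawn_difference]
print(linear_eval(position, weights, features))  # 9*1 + 1*(-2) = 7.0
```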

Evaluation for Pacman

[Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6,7,8,10)]

Video of Demo Thrashing (d=2)

Why Pacman Starves

§  A danger of replanning agents!
   §  He knows his score will go up by eating the dot now (west, east)
   §  He knows his score will go up just as much by eating the dot later (east, west)
   §  There are no point-scoring opportunities after eating the dot (within the horizon, two here)
   §  Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!

Video of Demo Thrashing -- Fixed (d=2)

Video of Demo Smart Ghosts (Coordination)

Video of Demo Smart Ghosts (Coordination) -- Zoomed In

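Game Tree Pruning

Minimax Example

[Figure: game tree with terminal values 12, 8, 5, 2, 3, 2, 14, 4, 6]

Minimax Pruning

[Figure: the example tree during pruning; terminal values shown: 12, 8, 5, 2, 3, 2, 14]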

Alpha-Beta Pruning

§  General configuration (MIN version)
   §  We're computing the MIN-VALUE at some node n
   §  We're looping over n's children
   §  n's estimate of the children's min is dropping
   §  Who cares about n's value? MAX
   §  Let a be the best value that MAX can get at any choice point along the current path from the root
   §  If n becomes worse than a, MAX will avoid it, so we can stop considering n's other children (it's already bad enough that it won't be played)

§  MAX version is symmetric

[Figure: tree with alternating MAX / MIN / MAX / MIN levels; nodes labeled a and n]

Alpha-Beta Implementation

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

α: MAX's best option on path to root
β: MIN's best option on path to root
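The alpha-beta pseudocode translates almost line for line into Python. As with any such sketch, some details are assumptions rather than part of the slides: the tree is an explicit nested list with numeric terminal utilities, players strictly alternate, and the example tree's utilities are made up.

```python
import math
from typing import List, Union

Tree = Union[float, List["Tree"]]


def value(node: Tree, alpha: float, beta: float, maximizing: bool) -> float:
    """Dispatch on terminal / MAX / MIN, threading alpha and beta through."""
    if not isinstance(node, list):
        return node
    if maximizing:
        return max_value(node, alpha, beta)
    return min_value(node, alpha, beta)


def max_value(successors: List[Tree], alpha: float, beta: float) -> float:
    v = -math.inf
    for successor in successors:
        v = max(v, value(successor, alpha, beta, maximizing=False))
        if v >= beta:            # MIN above already has a better option: prune
            return v
        alpha = max(alpha, v)    # MAX's best option on the path to the root
    return v


def min_value(successors: List[Tree], alpha: float, beta: float) -> float:
    v = math.inf
    for successor in successors:
        v = min(v, value(successor, alpha, beta, maximizing=True))
        if v <= alpha:           # MAX above already has a better option: prune
            return v
        beta = min(beta, v)      # MIN's best option on the path to the root
    return v


# Hypothetical two-ply tree: MAX at the root, three MIN nodes below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(value(tree, -math.inf, math.inf, maximizing=True))  # 3
```

In the second MIN node the first leaf (2) is already no better than MAX's current best (3), so the leaves 4 and 6 are never examined.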


Alpha-Beta Pruning Properties

§  This pruning has no effect on minimax value computed for the root!

§  Good child ordering improves effectiveness of pruning

§  With “perfect ordering”:
   §  Time complexity drops to O(b^(m/2))
   §  Doubles solvable depth!
   §  Full search of, e.g. chess, is still hopeless…

§  This is a simple example of metareasoning (computing about what to compute)

[Figure: two-ply tree, max over min, with terminal values 10, 10, 0]
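The effect of child ordering can be checked by counting how many leaves alpha-beta actually evaluates. The sketch below is an illustration, not a benchmark: it uses a compact single-function alpha-beta over nested lists, and the two example trees are made-up reorderings of the same game, one with the best moves first and one with them last.

```python
import math
from typing import List, Tuple, Union

Tree = Union[float, List["Tree"]]


def alphabeta(node: Tree, alpha: float, beta: float,
              maximizing: bool, visited: List[float]) -> float:
    """Alpha-beta over a nested-list tree, recording each leaf that is evaluated."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            v = max(v, child_value)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v


def root_value_and_leaf_count(tree: Tree) -> Tuple[float, int]:
    visited: List[float] = []
    root = alphabeta(tree, -math.inf, math.inf, True, visited)
    return root, len(visited)


# Same game, two child orderings (hypothetical utilities).
good_order = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]  # best MAX move first, refutations first
bad_order = [[6, 4, 2], [14, 5, 2], [8, 12, 3]]   # best MAX move last, refutations last

print(root_value_and_leaf_count(good_order))  # (3, 5)
print(root_value_and_leaf_count(bad_order))   # (3, 9)
```

Both orderings return the same root value, as the first bullet states, but the well-ordered tree evaluates 5 of the 9 leaves while the badly ordered one evaluates all of them.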