
AI in game (IV)

Oct. 11, 2006

So far

• Artificial Intelligence: A Modern Approach
  – Stuart Russell and Peter Norvig
  – Prentice Hall, 2nd ed.

– Chapter 1: AI taxonomy
  – Chapter 2: agents
  – Chapter 3: uninformed search
  – Chapter 4: informed search

From now on

• Artificial Intelligence: A Modern Approach
  – Chapter 4
  – Chapter 6: adversarial search

• Network part

• Learning (maybe from the same textbook)

• Game AI techniques

Outline

• Ch 4. informed search
  – Online search

• Ch 6. adversarial search
  – Optimal decisions
  – α-β pruning
  – Imperfect, real-time decisions

Offline search vs. online search

• Offline search agents
  – Compute a solution before setting foot in the real world

• Online search agents
  – Interleave computation and action

• E.g. take an action, then observe the environment, then compute the next action

– Necessary for exploration problems
  • States and actions are unknown in advance
  • E.g. a robot in a new building, or a labyrinth

Online search problems

• The agent is assumed to know only
  – Actions(s): returns a list of actions allowed in state s
  – c(s, a, s'): the step cost; it cannot be used until the agent knows that s' is the outcome
  – Goal-Test(s)

• The agent cannot access the successors of a state except by actually trying all the actions in that state

• Assumptions
  – The agent can recognize a state that it has visited before
  – Actions are deterministic
  – Optionally, an admissible heuristic function is available
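These assumptions can be captured in a small problem interface. The sketch below is illustrative only (the class and method names are not from the textbook); the key point is that there is no successor function the agent can query offline.

# Minimal interface for an online search problem (illustrative names).
# The agent may call actions(s), goal_test(s), and an optional heuristic h(s),
# but it learns the outcome state s' and the step cost only by actually acting.

class OnlineSearchProblem:
    def actions(self, s):
        """Actions(s): the list of actions allowed in state s."""
        raise NotImplementedError

    def goal_test(self, s):
        """Goal-Test(s): True if s is a goal state."""
        raise NotImplementedError

    def h(self, s):
        """Optional admissible heuristic estimate of the cost from s to a goal."""
        return 0

    # Deliberately no successor function: the environment reveals s' (and the
    # step cost c(s, a, s')) only after the agent has executed action a in s.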

Online search problems

• If some actions are irreversible, the agent may reach a dead end
• If some goal state is reachable from every reachable state, the state space is "safely explorable"

Online search agents

• An online algorithm can expand only a node that it physically occupies
  – Offline algorithms can expand any node in the fringe
• Same principle as DFS

Online DFS

function ONLINE-DFS-AGENT(s') returns an action
  inputs: s', a percept identifying the current state
  static: result, a table of the next state, indexed by action and state, initially empty
          unexplored, a table that lists, for each visited state, the actions not yet tried
          unbacktracked, a table that lists, for each visited state, the predecessor states
                         to which the agent has not yet backtracked
          s, a, the previous state and action, initially null

  if GOAL-TEST(s') then return stop
  if s' is a new state then unexplored[s'] ← ACTIONS(s')
  if s is not null then do
      result[a, s] ← s'
      add s to the front of unbacktracked[s']
  if unexplored[s'] is empty then
      if unbacktracked[s'] is empty then return stop
      else a ← an action b such that result[b, s'] = POP(unbacktracked[s'])
  else a ← POP(unexplored[s'])
  s ← s'
  return a
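As a concrete illustration, here is one way the pseudocode above could be written in Python. The class name, the dictionaries holding result/unexplored/unbacktracked, and the use of None as the "stop" signal are implementation choices, not part of the slide; states are assumed to be hashable and actions deterministic and reversible.

class OnlineDFSAgent:
    def __init__(self, problem):
        self.problem = problem          # provides actions(s) and goal_test(s)
        self.result = {}                # result[(s, a)] -> observed next state s'
        self.unexplored = {}            # state -> stack of actions not yet tried
        self.unbacktracked = {}         # state -> predecessor states not yet backtracked to
        self.s = None                   # previous state
        self.a = None                   # previous action

    def __call__(self, s_prime):
        """Given the current percept s', return the next action (None = stop)."""
        if self.problem.goal_test(s_prime):
            return None
        if s_prime not in self.unexplored:                       # s' is a new state
            self.unexplored[s_prime] = list(self.problem.actions(s_prime))
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            self.unbacktracked.setdefault(s_prime, []).insert(0, self.s)
        if not self.unexplored[s_prime]:
            if not self.unbacktracked.get(s_prime):
                return None                                       # dead end: nowhere to backtrack
            target = self.unbacktracked[s_prime].pop(0)
            # pick an action already known to lead from s' back to the predecessor
            self.a = next(b for (s, b), nxt in self.result.items()
                          if s == s_prime and nxt == target)
        else:
            self.a = self.unexplored[s_prime].pop()
        self.s = s_prime
        return self.a

The agent is meant to be called once per percept in a simple act-then-observe loop, exactly as in the example that follows.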

Online DFS, example

• Assume a maze problem on a 3×3 grid
• s' = (1,1) is the initial state
• result, unexplored (UX), unbacktracked (UB), … are empty
• s, a are also empty

[Figure: the 3×3 grid with the agent at s' = (1,1)]

Online DFS, example

• GOAL-TEST((1,1))?
  – s' ≠ G, thus false
• (1,1) a new state? – True
  – ACTIONS((1,1)) → UX[(1,1)] = {RIGHT, UP}
• s is null? – True (initially)
• UX[(1,1)] empty? – False
• POP(UX[(1,1)]) → a
  – a = UP
• s = (1,1)
• Return a

[Figure: agent at s' = (1,1)]

Online DFS, example

• GOAL-TEST((1,2))?
  – s' ≠ G, thus false
• (1,2) a new state? – True
  – ACTIONS((1,2)) → UX[(1,2)] = {DOWN}
• s is null? – False (s = (1,1))
  – result[UP, (1,1)] ← (1,2)
  – UB[(1,2)] = {(1,1)}
• UX[(1,2)] empty? – False
• a = DOWN, s = (1,2)
• Return a

[Figure: agent at s' = (1,2), previous state s = (1,1)]

Online DFS, example

• GOAL-TEST((1,1))?
  – s' ≠ G, thus false
• (1,1) a new state? – False
• s is null? – False (s = (1,2))
  – result[DOWN, (1,2)] ← (1,1)
  – UB[(1,1)] = {(1,2)}
• UX[(1,1)] empty? – False
• a = RIGHT, s = (1,1)
• Return a

[Figure: agent at s' = (1,1), previous state s = (1,2)]

Online DFS, example

• GOAL-TEST((2,1))?
  – s' ≠ G, thus false
• (2,1) a new state? – True
  – UX[(2,1)] = {RIGHT, UP, LEFT}
• s is null? – False (s = (1,1))
  – result[RIGHT, (1,1)] ← (2,1)
  – UB[(2,1)] = {(1,1)}
• UX[(2,1)] empty? – False
• a = LEFT, s = (2,1)
• Return a

[Figure: agent at s' = (2,1), previous state s = (1,1)]

Online DFS, example

• GOAL-TEST((1,1))?
  – s' ≠ G, thus false
• (1,1) a new state? – False
• s is null? – False (s = (2,1))
  – result[LEFT, (2,1)] ← (1,1)
  – UB[(1,1)] = {(2,1), (1,2)}
• UX[(1,1)] empty? – True
  – UB[(1,1)] empty? – False
• a = an action b such that result[b, (1,1)] = (2,1)
  – b = RIGHT
• a = RIGHT, s = (1,1)
• Return a
• And so on…

[Figure: agent at s' = (1,1), previous state s = (2,1)]

Online DFS

• In the worst case, each node is visited twice.
• An agent can go on a long walk even when it is close to the solution.
• An online iterative deepening approach solves this problem.
• Online DFS works only when actions are reversible.

Online local search

• Hill climbing is already an online algorithm
  – Only one state is stored.
• Bad performance due to local maxima
  – Random restarts are impossible.
• Solution 1: a random walk introduces exploration (see the sketch below)
  – Selects one of the available actions at random, with preference for not-yet-tried actions
  – Can take exponentially many steps
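A minimal sketch of Solution 1, assuming the same kind of problem object as in the earlier sketches (names are illustrative):

import random

# Random-walk online agent (illustrative): prefers actions not yet tried in the
# current state and otherwise picks uniformly among all legal actions.

def make_random_walk_agent(problem):
    tried = {}                                   # state -> set of actions already tried there
    def agent(s):
        if problem.goal_test(s):
            return None                          # stop
        actions = list(problem.actions(s))
        untried = [a for a in actions if a not in tried.setdefault(s, set())]
        a = random.choice(untried if untried else actions)
        tried[s].add(a)
        return a
    return agent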

Online local search

• Solution 2: Add memory to the hill climber
  – Store the current best estimate H(s) of the cost to reach the goal
  – H(s) is initially the heuristic estimate h(s)
  – Afterward it is updated with experience (see below)

• Learning real-time A* (LRTA*)


Learning real-time A* (LRTA*)

function LRTA*-COST(s, a, s', H) returns a cost estimate
  if s' is undefined then return h(s)
  else return c(s, a, s') + H[s']

function LRTA*-AGENT(s') returns an action
  inputs: s', a percept identifying the current state
  static: result, a table of the next state, indexed by action and state, initially empty
          H, a table of cost estimates indexed by state, initially empty
          s, a, the previous state and action, initially null

  if GOAL-TEST(s') then return stop
  if s' is a new state (not in H) then H[s'] ← h(s')
  unless s is null
      result[a, s] ← s'
      H[s] ← min over b in ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
  a ← an action b in ACTIONS(s') that minimizes LRTA*-COST(s', b, result[b, s'], H)
  s ← s'
  return a
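A Python rendering of the same agent, under the same caveats as the Online DFS sketch (class name and data structures are implementation choices; the problem object is assumed to provide actions(s), goal_test(s), h(s) and the step cost c(s, a, s')):

class LRTAStarAgent:
    def __init__(self, problem):
        self.problem = problem
        self.result = {}      # result[(s, a)] -> observed next state s'
        self.H = {}           # state -> current best estimate of cost to reach a goal
        self.s = None         # previous state
        self.a = None         # previous action

    def _lrta_cost(self, s, a, s_prime):
        # LRTA*-COST: optimistic h(s) while the outcome of a in s is still unknown
        if s_prime is None:
            return self.problem.h(s)
        return self.problem.c(s, a, s_prime) + self.H[s_prime]

    def __call__(self, s_prime):
        """Given the current percept s', return the next action (None = stop)."""
        if self.problem.goal_test(s_prime):
            return None
        if s_prime not in self.H:
            self.H[s_prime] = self.problem.h(s_prime)
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            # update the previous state's estimate from experience
            self.H[self.s] = min(
                self._lrta_cost(self.s, b, self.result.get((self.s, b)))
                for b in self.problem.actions(self.s))
        # greedily move toward the apparently best neighbour
        self.a = min(self.problem.actions(s_prime),
                     key=lambda b: self._lrta_cost(s_prime, b, self.result.get((s_prime, b))))
        self.s = s_prime
        return self.a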

Outline

• Ch 4. informed search

• Ch 6. adversarial search
  – Optimal decisions
  – α-β pruning
  – Imperfect, real-time decisions

Games vs. search problems

• The problem-solving agent is not alone any more
  – Multiagent environments, conflict
• Default: deterministic, turn-taking, two-player, zero-sum game of perfect information
  – Perfect information vs. imperfect information or chance

• "Unpredictable" opponent specifying a move for every possible opponent reply

• Time limits ⇒ unlikely to find the goal, must approximate


* Environments with very many agents are best viewed as economies rather than games

Game formalization

• Initial state
• A successor function
  – Returns a list of (move, state) pairs
• Terminal test
  – Identifies terminal states
• Utility function (or objective function)
  – A numeric value for the terminal states
• Game tree
  – The state space
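The formalization above maps naturally onto a small interface. This sketch is illustrative (the names are not from the textbook) and is reused by the minimax and alpha-beta sketches later in these notes.

class Game:
    """Skeleton of the game formalization: initial state, successors, terminal test, utility."""

    def initial_state(self):
        raise NotImplementedError

    def successors(self, state):
        """Return a list of (move, state) pairs reachable from state."""
        raise NotImplementedError

    def terminal_test(self, state):
        """Return True for terminal states."""
        raise NotImplementedError

    def utility(self, state, player):
        """Numeric value of a terminal state from player's point of view."""
        raise NotImplementedError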

Tic-tac-toe: Game tree (2-player, deterministic, turns)

Minimax

• Perfect play for deterministic games: an optimal strategy
• Idea: choose the move to the position with the highest minimax value
  = best achievable payoff against best play
• E.g., a 2-ply game: only two half-moves


Minimax algorithm
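The algorithm figure from this slide is not reproduced in the transcript. As a stand-in, here is a compact Python sketch of minimax over the game interface sketched earlier (an assumption, not taken from the slide):

def minimax_decision(state, game, player):
    """Return the move leading to the position with the highest minimax value."""

    def max_value(s):
        if game.terminal_test(s):
            return game.utility(s, player)
        return max(min_value(s2) for _, s2 in game.successors(s))

    def min_value(s):
        if game.terminal_test(s):
            return game.utility(s, player)
        return min(max_value(s2) for _, s2 in game.successors(s))

    # MAX chooses the successor whose MIN-value (opponent's best reply) is largest
    return max(game.successors(state), key=lambda ms: min_value(ms[1]))[0]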

Problem of minimax search

• The number of game states is exponential in the number of moves
  – Solution: do not examine every node ⇒ alpha-beta pruning

• Remove branches that do not influence final decision

• Revisit example …

Alpha-Beta Example

[Figure sequence: depth-first alpha-beta search on a two-ply game tree, tracking the range of possible values at each node]

• Start: the root and the first MIN node are at [-∞, +∞]; do DF-search until the first leaf
• The first MIN node narrows to [-∞, 3] and finally to [3, 3]; the root becomes [3, +∞]
• The second MIN node's first leaf gives [-∞, 2]; this node is worse for MAX, so its remaining successors are pruned
• The third MIN node narrows from [-∞, 14] to [-∞, 5] to [2, 2], while the root's range goes [3, 14], then [3, 5]
• Finally the root settles at [3, 3]

Properties of α-β

• Pruning does not affect final result

• Good move ordering improves effectiveness of pruning

• With "perfect ordering," time complexity = O(bm/2) doubles depth of search

Why is it called α-β?

• α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for max

• If v is worse than α, max will avoid it ⇒ prune that branch

• Define β similarly for min


The α-β pruning algorithm

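The pruning algorithm itself appears only as a figure in the original slides. A Python sketch in the same style as the minimax sketch above (assumed interface, not from the slide):

def alpha_beta_decision(state, game, player):
    """Minimax with alpha-beta pruning: branches that cannot affect the decision are skipped."""

    def max_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s, player)
        v = float('-inf')
        for _, s2 in game.successors(s):
            v = max(v, min_value(s2, alpha, beta))
            if v >= beta:                 # MIN above would never let play reach here
                return v
            alpha = max(alpha, v)
        return v

    def min_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s, player)
        v = float('inf')
        for _, s2 in game.successors(s):
            v = min(v, max_value(s2, alpha, beta))
            if v <= alpha:                # MAX above already has a better alternative
                return v
            beta = min(beta, v)
        return v

    return max(game.successors(state),
               key=lambda ms: min_value(ms[1], float('-inf'), float('inf')))[0]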

Resource limits

• In reality, imperfect and real-time decisions are required
  – Suppose we have 100 secs and explore 10^4 nodes/sec
    ⇒ 10^6 nodes per move

• Standard approach:
  – Cutoff test
    • e.g., a depth limit
  – Evaluation function
    • = estimated desirability of the position

Evaluation functions

• For chess, typically linear weighted sum of features

Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

• e.g., w1 = 9 for queen, w2 = 5 for rook, … wn = 1 for pawn

f1(s) = (number of white queens) – (number of black queens), etc.
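A toy version of this weighted sum, using the material weights quoted above; the dictionary-based position encoding is purely an assumption for illustration.

# Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s) as a weighted material count.
# A "position" is assumed here to be a dict of piece counts,
# e.g. {'white_queens': 1, 'black_queens': 1, 'white_pawns': 8, ...}.

WEIGHTS = {'queens': 9, 'rooks': 5, 'pawns': 1}     # material weights from the slide

def evaluate(position):
    total = 0
    for piece, w in WEIGHTS.items():
        # fi(s) = (number of white pieces of this kind) - (number of black pieces)
        f = position.get('white_' + piece, 0) - position.get('black_' + piece, 0)
        total += w * f
    return total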

Cutting off search

MinimaxCutoff is identical to MinimaxValue except:
1. Terminal-Test is replaced by Cutoff-Test
2. Utility is replaced by Eval

Does it work in practice?
  b^m = 10^6, b = 35 ⇒ m ≈ 4 (see the quick check below)

• 4-ply lookahead is a hopeless chess player!
  – 4-ply ≈ human novice
  – 8-ply ≈ typical PC, human master
  – 12-ply ≈ Deep Blue, Kasparov
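A quick check of the m ≈ 4 figure (assuming a uniform branching factor of 35 and a budget of 10^6 nodes):

import math

# b**m ≈ 10^6 with b = 35  =>  m = log(10^6) / log(35)
m = math.log(1e6) / math.log(35)
print(round(m, 2))    # ≈ 3.89, i.e. roughly 4-ply lookahead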


Games that include chance

• Backgammon: move all one's pieces off the board
• Branches leading from each chance node denote the possible dice rolls

– Each branch is labeled with the roll and its probability

[Figure: a game tree with chance nodes for the dice rolls]

Games that include chance

• Rolls [1,1] and [6,6] each have probability 1/36; all other rolls have probability 1/18
• Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16)
• Cannot calculate a definite minimax value, only an expected value

Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                                 if n is a terminal state
  max over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)   if n is a MAX node
  min over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)   if n is a MIN node
  Σ over s in successors(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)   if n is a chance node

These equations can be backed-up recursively all the way to the root of the game tree.
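A sketch of how these equations translate into code. The node_type query and the probability-carrying successor list are assumptions made for this example; they are not part of the game interface used earlier.

def expectiminimax(state, game, player):
    """Expected minimax value of a state in a game with chance nodes.

    The game is assumed to classify non-terminal states via node_type(state)
    ('max', 'min' or 'chance') and to return successors(state) as
    (move_or_outcome, probability, next_state) triples, with probability 1
    for the successors of MAX and MIN nodes.
    """
    if game.terminal_test(state):
        return game.utility(state, player)
    values = [(p, expectiminimax(s2, game, player))
              for _, p, s2 in game.successors(state)]
    kind = game.node_type(state)
    if kind == 'max':
        return max(v for _, v in values)
    if kind == 'min':
        return min(v for _, v in values)
    return sum(p * v for p, v in values)      # chance node: probability-weighted average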

Position evaluation with chance nodes

• Left: A1 is best
• Right: A2 is best
• The outcome of the evaluation function (and hence the agent's behavior) may change when values are scaled differently
• Behavior is preserved only by a positive linear transformation of EVAL