Lecture 3 - Decision Making
DESCRIPTION
This is the third of an eight-lecture series that I presented at the University of Strathclyde in 2011/2012 as part of the final-year AI course. This lecture moves beyond the game-theoretic definition of a game, and demonstrates how algorithms can be used not only to find a single good choice, but a sequence of choices that will eventually reach a winning state.
TRANSCRIPT
Making Decisions in Games
1
Theory of Real Games
•We’ve been talking about “games” as single
instances of choice - heads/tails, odds/evens etc.
•We’ve talked about how we can repeat the game
(iterating) and interesting things happen.
• Are most games the same choice repeatedly?
2
Real Games
• At a much less abstract level, a game is not one
choice repeated.
• A sequence of different choices.
• Delayed reward
3
Delayed Reward
• Last week we could see the payoffs for each choice
pair in the games.
• Does a single move in chess have a “reward”?
• The reward is whether the game is won or lost -
the combined result of the choice sequence
4
Evaluating Delayed Rewards
•We need to evaluate what the expected payoff of a
given choice is.
• Typically we can only do this at the end of the game.
• How can we decide what to do now if we won’t
know if it was a good decision until later?
5
Chess
•Opening move is one choice.
•Opponent makes their move.
• You reply.
• Note that your 2nd move is a totally different
theoretical “game” to the first move.
6
Chess
• Initially there are 20 opening moves
• Your opponent has 20 responding moves
• 2 moves in, the size of the potential statespace is
400 states.
• The game gets more complicated later:
‣ Average number of moves per turn: 35
‣ Average game length: 80
• State space size (Shannon's number): 35^80 ≈ 10^123 - HUGE
7
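As a quick sanity check, the game-tree estimate is just the average branching factor raised to the average game length:

```python
import math

branching = 35   # average legal moves per turn (from the slide)
plies = 80       # average game length (from the slide)

game_tree = branching ** plies
print(f"game tree ~ 10^{int(math.log10(game_tree))}")   # game tree ~ 10^123
```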
Search
• This state space is way too big for an exhaustive
search approach like minimax
• Any brute force approach is not going to work
•We need some mechanism to guide the search
towards areas of the game tree that are useful
8
Heuristics
• A heuristic is formally a “strategy using readily
accessible, though loosely applicable, information to
control problem solving in human beings and
machines”
• Less formally, it’s a guess-timate of the value of a
state, typically based on the distance to the goal
(planning) or likelihood of winning (games)
9
Using Heuristics
• Heuristics guide search across spaces that are too
complex to fully enumerate.
• Estimate potential of the next set of states using the
heuristic and go with the best looking one.
• Can be combined with a search strategy like Best
First Search or Enforced Hill Climbing
10
Heuristic Example - A*
• A* search for path planning is a great example of
heuristics in use.
• In a world of tiles, find an optimal path from A to B.
• A* uses two metrics:
‣ Concrete measure of the work done to reach a location (g)
‣ Estimate of the work to get from that location to the goal (h)
• The search strategy always expands the location that
minimises (g + h)
11
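The tile-world search just described can be sketched in Python; a minimal 4-connected grid version using the Manhattan distance as h (the grid, start, and goal below are invented for illustration):

```python
import heapq

def a_star(grid, start, goal):
    """A* over a 4-connected tile grid; grid[y][x] == 1 marks a wall.

    g = concrete cost of the path from start, h = Manhattan-distance
    estimate to goal. Always expands the open tile with smallest g + h.
    """
    def h(pos):
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    open_heap = [(h(start), 0, start, [start])]   # (g + h, g, position, path)
    best_g = {start: 0}
    while open_heap:
        f, g, pos, path = heapq.heappop(open_heap)
        if pos == goal:
            return path
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dx, pos[1] + dy)
            x, y = nxt
            if 0 <= y < len(grid) and 0 <= x < len(grid[0]) and grid[y][x] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None   # goal unreachable

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],   # two wall tiles to route around
        [0, 0, 0, 0]]
path = a_star(grid, (0, 2), (3, 0))
print(len(path) - 1)   # 5 steps: the optimal route around the wall
```

Note how the heap ordering does the work: a tile on a dead-end route keeps a high g + h, so the search naturally backtracks to more promising frontier tiles, exactly as in the worked example.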
Heuristic Example - A*
[Slides 12-37: a step-by-step worked example on a tile grid. Starting from A, A* repeatedly expands the frontier tile with the lowest g + h, labelling each candidate tile with its cost pair (1 + 7, 2 + 6, 2 + 8, ..., 7 + 1), routing around obstacles until it reaches B.]
Heuristics
• Heuristics can guide our search
• Help us understand what states are bringing us
closer to our goals
• Allow us to backtrack when a promising route
becomes problematic
• Do they work well for games?
38
The Maths of Choice
• Common (basic) combinatorics problem:
‣ How many k-element subsets can I make from a set of n
elements?
• Less formally:
‣ How many different ways can I pick k things from n things?
39
Choice
• We can refer to this as “Choosing”
• “I have 5 things, I choose 2”
• We can write it as : 5 C 2
40
Binomials
• Mathematically, n C k is the binomial coefficient
• This can be written as
‣ n! / ( k! (n − k)! )
41
Permutations
• The choose operator tells you how many sets there are with
unique elements.
• What if the order that the elements are in matters?
• For this we use Permutation
‣ n P k
• Equivalent to :
‣ n! / (n - k)!
42
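Python's math module (3.8+) has both operators built in, so the relationship between choosing and permuting is easy to check:

```python
import math

# "I have 5 things, I choose 2" -> 10 distinct pairs
print(math.comb(5, 2))    # 10

# If order matters, use permutations: n! / (n - k)!
print(math.perm(5, 2))    # 20

# The two differ by exactly the k! orderings of each chosen set
assert math.perm(5, 2) == math.comb(5, 2) * math.factorial(2)
```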
Poker
• Card game.
• Typically involves gambling.
• “Poker” is technically an entire set of different games that
share similar structure.
• For the purposes of this lecture, Poker refers specifically to
“Limit Texas Hold ‘Em”
43
Texas Hold ‘Em
• Variant of poker created in the early 1900s
• Typically 2-10 player games
• Popular recently - Poker on TV and online is typically Texas
Hold ‘Em
• Aim is to make the best five-card hand possible using any
combination of two private “hole” cards and five public
“community” cards
44
Phases of the Game
• The game is broken into four phases.
• Initial or “Pre-flop” - Hole cards are dealt and a round of
betting occurs.
• Flop - The first three community cards are dealt, another
round of betting.
• Turn - A fourth community card is dealt, and a round of
betting
• River - Final community card dealt, final betting
45
Some Terminology
• Raise - Increase the bet amount
• Fold - Give up on this game, losing any money already bet
• Call - Put in an amount of money to equal what others are
wagering
• Blinds - An initial mandatory wager by two players, Small and
Big. Responsibility for the blinds rotates each game.
46
Poker in Research
• Poker has been a major research area for AI for many years.
• Characteristics in common with many real world problems
‣ Hidden information
‣ Bluffing
‣ Loss minimisation
47
Poker at SAIG
• Major research area for us for many years
• Under my supervision for the last 2 years as
honours projects and Summer internships.
• Much of what you’re going to hear about this week
is based on current research happening right now at
SAIG
48
Strathclyde Poker Research Environment
• SPREE was developed to overcome two challenges we face:
‣ Training data sets obtained from online casinos contain
imperfect information, which leads to poor machine learning
‣ Every research project wasted significant time re-implementing
a framework for Poker
• SPREE is an open source client/server implementation
in Java, with an AI-based client and a GUI client.
• http://sourceforge.net/projects/spree-poker
49
Limit or No Limit?
• Two types of game - Limit and No Limit
• No Limit - Classical movie Poker.
‣ Raises can be any amount
‣ Any number of raises
• Limit - Common rule set
‣ Raises are a single fixed amount
‣ Limited number of raises allowed per round
50
Limit or No Limit?
• Focus on Limit
• Significantly reduces complexity of the problem.
• Also means we can focus on the game, rather than the
psychological aspects.
51
Poker State Space
• At each point, each player has typically 3 options
‣ Raise, Call, Fold
• We can approximate the size of the search space after k
decision points as 3^k
• We can also determine lower and upper bounds for k, since
in Limit there are a fixed number of raises.
52
Dealing Cards
• For a game of N players, 2N + 5 cards are required.
• There are 52 C (2N + 5) different sets of cards that could be
dealt.
• But who gets which card is important, so we need to use
Permutation not Choose
• 52 P (2N + 5)
‣ For a standard 10 player game: ≈ 7.4 × 10^39
53
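The deal count is easy to reproduce with math.perm (a quick sketch):

```python
import math

N = 10                      # players
cards_needed = 2 * N + 5    # hole cards for everyone plus five community cards

# Order matters (who gets which card), so permute rather than choose
deals = math.perm(52, cards_needed)
print(f"{deals:.2e}")       # ~7.41e+39 ordered deals
```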
Length of a Poker Game (Lower)
• In the shortest game possible, all players fold.
• The last player (who put in the Big Blind) wins by default
• N − 1 choices to reach this point
• 2N cards are required
• 3^(N−1) × 52 P 2N
• For a standard 10 player game:
‣ 19,683 × 3 × 10^32 ≈ 6 × 10^36
54
Length of a Poker Game (Upper)
• In the longest game possible:
• All players initially call; the final player to call instead raises.
• 4N − 4 turns per round, 4 rounds = 16N − 16 turns total
• 2N + 5 cards required
• 3^(16N−16) × 52 P (2N + 5)
• Again for a 10 player game:
‣ 5 × 10^68 × 7.4 × 10^39 ≈ 3.7 × 10^108
55
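Both bounds can be reproduced in a few lines, following the slides' counting argument:

```python
import math

def poker_bounds(N):
    """Lower/upper bounds on the Limit Hold 'Em state space for N players,
    per the slides: branching of 3 per decision, times the ordered deals."""
    lower = 3 ** (N - 1) * math.perm(52, 2 * N)             # everyone folds
    upper = 3 ** (16 * N - 16) * math.perm(52, 2 * N + 5)   # maximal betting
    return lower, upper

lo, hi = poker_bounds(10)
print(f"lower ~ 10^{int(math.log10(lo))}, upper ~ 10^{int(math.log10(hi))}")
# lower ~ 10^36, upper ~ 10^108
```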
Total State Space Size
• The total state space is smaller than Shannon’s number
• Still completely unwieldy for any kind of exhaustive search
• Note that we’ve considered the lower and upper bounds of
the state space.
• Actual values will typically fall somewhere between.
• Also note that the upper bound hinges on the restrictions
imposed by Limit, so we don’t need to consider any state
complexity that variable raise sizes would introduce.
56
Abstraction
• There are some things we can do to trim this down (a bit)
• Firstly, we can simplify our view of the starting position
• We don’t need to consider every possible card that could be
dealt:
‣ Cards that help us change the situation
‣ Cards that don’t help us can be grouped together
57
Starting Hands
• There are 52 C 2 = 1,326 potential opening hands
• But we can reduce this:
‣ Suit doesn’t matter except for matching
‣ We can reduce a hand to its two ranks plus “suited” or “unsuited”
‣ 2c, 7d is equivalent to 2d, 7c or 2s, 7h
• This gives the total number of abstract hands as:
‣ 13 (pairs) + 13 C 2 (suited) + 13 C 2 (unsuited) = 169
• We’ll see tomorrow that there are more abstractions.
58
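The 169 figure is easy to verify by enumerating the abstract hand classes directly:

```python
from itertools import combinations

ranks = "23456789TJQKA"

# Abstract a concrete two-card hand to (high rank, low rank, suited?)
abstract = set()
for r1, r2 in combinations(ranks, 2):      # two different ranks
    abstract.add((r2, r1, True))           # suited version
    abstract.add((r2, r1, False))          # unsuited version
for r in ranks:                            # pairs can never be suited
    abstract.add((r, r, False))

print(len(abstract))   # 169 = 13 pairs + 78 suited + 78 unsuited
```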
Heuristics for Poker
• “Every hand’s a winner and every hand’s a loser”
• Heuristics for Poker are tricky because of this.
• Analysis is largely based on your own hand - if my hand at a
point is such-and-such a type or better, it is worth playing
• Kind of naive
59
“Expert” Poker Systems
• You can make a somewhat capable agent by combining a
bunch of these naive heuristics.
• It’s known which of the starting hands are strong and which
are weak.
• You can make a guess as to what you should do based on
your hand strength.
‣ This is not massively informed
• A basic but functional approach that attempts to lift out
general rules that will lead to good results.
60
Evaluating Delayed Reward in Poker
• I’ve mentioned delayed reward a few times
• How does this fit into Poker?
• We know that the strength of our hand alone won’t
decide the game.
• We know that opponents can bluff about their hand
strength.
• Need to find out “what happens if” for possible
actions
61
Monte Carlo Tree Search
• Initially used, without being formally defined, by Buffon and Fermi
(among others)
• Developed at Los Alamos by our Game Theory friend John
von Neumann
• For a large enough sample size, random sampling can often
take the place of exhaustive enumeration
62
Samples and Probes
• When we say a “random sample” we want to sample
the potential outcomes
‣ And find the potential rewards
• The leaf nodes of the game tree have the final value
of the game.
• By randomly walking from the current node to leaf
nodes, we can build up a picture of where our
actions might lead us.
63
Exploration vs Exploitation
• We can sample at random, and we'll get coverage in all areas
• Some areas are more promising than others
• We want to "exploit" these areas and inspect them closely
‣ Ensure that they are as good as they look
• At the same time, we want to keep "exploring" in case there
are better areas in the game tree.
• Balancing these two contradictory goals falls to the UCT
heuristic.
64
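UCT's selection rule is the UCB1 formula: exploit a child's average reward, but add an exploration bonus that shrinks the more often that child has been visited. A sketch:

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of a child node: average reward plus exploration bonus.

    child_value is the total reward bubbled up through this child;
    c trades off exploration against exploitation.
    """
    if child_visits == 0:
        return float("inf")        # always try unvisited children first
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

# A better-rewarded child wins while visit counts are comparable...
print(uct_score(8, 10, 20) > uct_score(2, 10, 20))      # True
# ...but a rarely visited child is eventually preferred for exploration.
print(uct_score(2, 2, 1000) > uct_score(300, 500, 1000))  # True
```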
Reward Evaluation
• We can use the Monte Carlo samples to simulate down to
the end of the game.
• Establish whether we win or lose (and how much).
• Bubble this value back up the tree.
• Build a picture of the amount we can expect to win based on
the actions we are considering this turn.
65
Caveat Emptor
•What we’ve seen today is just ONE approach to
tackling Poker.
• It’s an open challenge in AI to find a good solution
• The techniques used are important
• More important is the reasoning for using these
approaches.
• AI as a toolkit, not a definitive solution.
66
Sampling the St Petersburg Paradox
67
[Bar chart: payouts 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 against the number of sampled games ending at each; counts roughly halve from 2,147,483,647 (about half of the samples) down to 1,698,228.]
Sampling the St Petersburg Paradox
68
• If we repeatedly play out the St Petersburg game we
see that it behaves much as we expect
• Half the games end immediately, a quarter after 1
turn and so on.
• After 4,000,000,000 samples, the average is only £14.50
•Where the Expected Value metric didn't inform our
decision making, we can use sampling to see what
actually happens!
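The experiment is easy to reproduce at a smaller scale; a sketch, assuming the formulation implied by the chart, where the pot starts at £1 and doubles on each consecutive head:

```python
import random

def st_petersburg():
    """One play: the pot starts at £1 and doubles on each consecutive
    head; the game ends (and pays out) on the first tail."""
    pot = 1
    while random.random() < 0.5:
        pot *= 2
    return pot

random.seed(1)
n = 1_000_000
avg = sum(st_petersburg() for _ in range(n)) / n
print(f"average payout over {n:,} games: £{avg:.2f}")
```

Despite the infinite expected value, the sampled average stays small and grows only logarithmically with the number of games, which is exactly why sampling informs the decision where the Expected Value metric did not.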
Summary
• Understanding real games
• Delayed reward systems
• Poker
• Monte Carlo with UCT (in brief)
69
Next Lecture
• More on Monte Carlo
• Describing a player mathematically
• Categorising players into types
• Using this classification for better decisions
70