Lecture 3 - Decision Making
DESCRIPTION
This is the third of an eight-lecture series that I presented at the University of Strathclyde in 2011/2012 as part of the final-year AI course. This lecture moves beyond the game-theoretic definition of a game, and demonstrates how algorithms can be used not only to find a single good choice, but a sequence of choices that will eventually reach a winning state.
TRANSCRIPT
Making Decisions in Games
1
Theory of Real Games
•We’ve been talking about “games” as single
instances of choice - heads/tails, odds/evens etc.
•We’ve talked about how we can repeat the game
(iterating) and interesting things happen.
• Are most games the same choice repeatedly?
2
Real Games
• At a much less abstract level, a game is not one
choice repeated.
• A sequence of different choices.
• Delayed reward
3
Delayed Reward
• Last week we could see the payoffs for each choice
pair in the games.
• Does a single move in chess have a “reward”?
• The reward is whether the game is won or lost -
the combined result of the choice sequence
4
Evaluating Delayed Rewards
•We need to evaluate what the expected payoff of a
given choice is.
• Typically we can only do this at the end of the game.
• How can we decide what to do now if we won’t
know if it was a good decision until later?
5
Chess
•Opening move is one choice.
•Opponent makes their move.
• You reply.
• Note that your 2nd move is a totally different
theoretical “game” to the first move.
6
Chess
• Initially there are 20 opening moves
• Your opponent has 20 responding moves
• 2 moves in, the size of the potential statespace is
400 states.
• The game gets more complicated later:
‣ Average number of moves per turn: 35
‣ Average game length: 80
• State space size (Shannon's number): 35^80 ≈ 10^123 - HUGE
7
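As a quick sanity check, the game-tree estimate is just the average branching factor raised to the average game length:

```python
import math

branching = 35   # average legal moves per turn (from the slide)
plies = 80       # average game length (from the slide)

game_tree = branching ** plies
print(f"game tree ~ 10^{int(math.log10(game_tree))}")   # game tree ~ 10^123
```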
Search
• This state space is way too big for an exhaustive
search approach like minimax
• Any brute force approach is not going to work
•We need some mechanism to guide the search
towards areas of the game tree that are useful
8
Heuristics
• A heuristic is formally a “strategy using readily
accessible, though loosely applicable, information to
control problem solving in human beings and
machines”
• Less formally, it’s a guess-timate of the value of a
state, typically based on the distance to the goal
(planning) or likelihood of winning (games)
9
Using Heuristics
• Heuristics guide search across spaces that are too
complex to fully enumerate.
• Estimate potential of the next set of states using the
heuristic and go with the best looking one.
• Can be combined with a search strategy like Best
First Search or Enforced Hill Climbing
10
Heuristic Example - A*
• A* search for path planning is a great example of
heuristics in use.
• In a world of tiles, find an optimal path from A to B.
• A* uses two metrics:
‣ Concrete measure of the work done to reach a location (g)
‣ Estimate of the work to get from that location to the goal (h)
• The search strategy always expands the location that
minimises (g + h)
11
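The tile-world search just described can be sketched in Python; a minimal 4-connected grid version using the Manhattan distance as h (the grid, start, and goal below are invented for illustration):

```python
import heapq

def a_star(grid, start, goal):
    """A* over a 4-connected tile grid; grid[y][x] == 1 marks a wall.

    g = concrete cost of the path from start, h = Manhattan-distance
    estimate to goal. Always expands the open tile with smallest g + h.
    """
    def h(pos):
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    open_heap = [(h(start), 0, start, [start])]   # (g + h, g, position, path)
    best_g = {start: 0}
    while open_heap:
        f, g, pos, path = heapq.heappop(open_heap)
        if pos == goal:
            return path
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dx, pos[1] + dy)
            x, y = nxt
            if 0 <= y < len(grid) and 0 <= x < len(grid[0]) and grid[y][x] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None   # goal unreachable

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],   # two wall tiles to route around
        [0, 0, 0, 0]]
path = a_star(grid, (0, 2), (3, 0))
print(len(path) - 1)   # 5 steps: the optimal route around the wall
```

Note how the heap ordering does the work: a tile on a dead-end route keeps a high g + h, so the search naturally backtracks to more promising frontier tiles, exactly as in the worked example.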
Heuristic Example - A*
[Slides 12-37: a step-by-step worked example on a tile grid. Starting from A, A* repeatedly expands the frontier tile with the lowest g + h, labelling each candidate tile with its cost pair (1 + 7, 2 + 6, 2 + 8, ..., 7 + 1), routing around obstacles until it reaches B.]
Heuristics
• Heuristics can guide our search
• Help us understand what states are bringing us
closer to our goals
• Allow us to backtrack when a promising route
becomes problematic
• Do they work well for games?
38
The Maths of Choice
• Common (basic) combinatorics problem:
‣ How many k-element subsets can I make from a set of n
elements?
• Less formally:
‣ How many different ways can I pick k things from n things?
39
Choice
• We can refer to this as “Choosing”
• “I have 5 things, I choose 2”
• We can write it as : 5 C 2
40
Binomials
• Mathematically, n C k is the binomial coefficient
• This can be written as
‣ n! / ( k! (n − k)! )
41
Permutations
• The choose operator tells you how many sets there are with
unique elements.
• What if the order that the elements are in matters?
• For this we use Permutation
‣ n P k
• Equivalent to :
‣ n! / (n - k)!
42
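Python's math module (3.8+) has both operators built in, so the relationship between choosing and permuting is easy to check:

```python
import math

# "I have 5 things, I choose 2" -> 10 distinct pairs
print(math.comb(5, 2))    # 10

# If order matters, use permutations: n! / (n - k)!
print(math.perm(5, 2))    # 20

# The two differ by exactly the k! orderings of each chosen set
assert math.perm(5, 2) == math.comb(5, 2) * math.factorial(2)
```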
Poker
• Card game.
• Typically involves gambling.
• “Poker” is technically an entire set of different games that
share similar structure.
• For the purposes of this lecture, Poker refers specifically to
“Limit Texas Hold ‘Em”
43
Texas Hold ‘Em
• Variant of poker created in the early 1900s
• Typically 2-10 player games
• Popular recently - Poker on TV and online is typically Texas
Hold ‘Em
• Aim is to make the best five-card hand possible using any
combination of two private “hole” cards and five public
“community” cards
44
Phases of the Game
• The game is broken into four phases.
• Initial or “Pre-flop” - Hole cards are dealt and a round of
betting occurs.
• Flop - The first three community cards are dealt, another
round of betting.
• Turn - A fourth community card is dealt, and a round of
betting
• River - Final community card dealt, final betting
45
Some Terminology
• Raise - Increase the bet amount
• Fold - Give up on this game, losing any money already bet
• Call - Put in an amount of money to equal what others are
wagering
• Blinds - An initial mandatory wager by two players, Small and
Big. Responsibility for the blinds rotates each game.
46
Poker in Research
• Poker has been a major research area for AI for many years.
• Characteristics in common with many real world problems
‣ Hidden information
‣ Bluffing
‣ Loss minimisation
47
Poker at SAIG
• Major research area for us for many years
• Under my supervision for the last 2 years as
honours projects and Summer internships.
• Much of what you’re going to hear about this week
is based on current research happening right now at
SAIG
48
Strathclyde Poker Research Environment
• SPREE was developed to overcome two challenges we face:
‣ Training data sets obtained from online casinos contain
imperfect information, which leads to poor machine learning
‣ Every research project wasted significant time re-implementing
a framework for Poker
• SPREE is an open source client/server implementation
in Java, with an AI-based client and a GUI client.
• http://sourceforge.net/projects/spree-poker
49
Limit or No Limit?
• Two types of game - Limit and No Limit
• No Limit - Classical movie Poker.
‣ Raises can be any amount
‣ Any number of raises
• Limit - Common rule set
‣ Raises are a single fixed amount
‣ Limited number of raises allowed per round
50
Limit or No Limit?
• Focus on Limit
• Significantly reduces complexity of the problem.
• Also means we can focus on the game, rather than the
psychological aspects.
51
Poker State Space
• At each point, each player has typically 3 options
‣ Raise, Call, Fold
• We can approximate the size of the search space after k
decision points as 3^k
• We can also determine lower and upper bounds for k, since
in Limit there are a fixed number of raises.
52
Dealing Cards
• For a game of N players, 2N + 5 cards are required.
• There are 52 C (2N + 5) different sets of cards that could be
dealt.
• But who gets which card is important, so we need to use
Permutation not Choose
• 52 P (2N + 5)
‣ For a standard 10 player game: ≈ 7.4 × 10^39
53
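The deal count is easy to reproduce with math.perm (a quick sketch):

```python
import math

N = 10                      # players
cards_needed = 2 * N + 5    # hole cards for everyone plus five community cards

# Order matters (who gets which card), so permute rather than choose
deals = math.perm(52, cards_needed)
print(f"{deals:.2e}")       # ~7.41e+39 ordered deals
```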
Length of a Poker Game (Lower)
• In the shortest game possible, all players fold.
• The last player (who put in the Big Blind) wins by default
• N − 1 choices to reach this point
• 2N cards are required
• 3^(N−1) × 52 P 2N
• For a standard 10 player game:
‣ 19,683 × 3 × 10^32 ≈ 6 × 10^36
54
Length of a Poker Game (Upper)
• In the longest game possible:
• All players initially call; the final player to call instead raises.
• 4N − 4 turns per round, 4 rounds = 16N − 16 turns total
• 2N + 5 cards required
• 3^(16N−16) × 52 P (2N + 5)
• Again for a 10 player game:
‣ 5 × 10^68 × 7.4 × 10^39 ≈ 3.7 × 10^108
55
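Both bounds can be reproduced in a few lines, following the slides' counting argument:

```python
import math

def poker_bounds(N):
    """Lower/upper bounds on the Limit Hold 'Em state space for N players,
    per the slides: branching of 3 per decision, times the ordered deals."""
    lower = 3 ** (N - 1) * math.perm(52, 2 * N)             # everyone folds
    upper = 3 ** (16 * N - 16) * math.perm(52, 2 * N + 5)   # maximal betting
    return lower, upper

lo, hi = poker_bounds(10)
print(f"lower ~ 10^{int(math.log10(lo))}, upper ~ 10^{int(math.log10(hi))}")
# lower ~ 10^36, upper ~ 10^108
```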
Total State Space Size
• The total state space is smaller than Shannon’s number
• Still completely unwieldy for any kind of exhaustive search
• Note that we’ve considered the lower and upper bounds of
the state space.
• Actual values will typically fall somewhere between.
• Also note that the upper bound hinges on the restrictions
imposed by Limit, so we don’t need to consider any state
complexity that variable raise sizes would introduce.
56
Abstraction
• There are some things we can do to trim this down (a bit)
• Firstly, we can simplify our view of the starting position
• We don’t need to consider every possible card that could be
dealt:
‣ Cards that help us change the situation
‣ Cards that don’t help us can be grouped together
57
Starting Hands
• There are 52 C 2 = 1,326 potential opening hands
• But we can reduce this:
‣ Suit doesn’t matter except for matching
‣ We can reduce a hand to its two ranks plus “suited” or “unsuited”
‣ 2c, 7d is equivalent to 2d, 7c or 2s, 7h
• This gives the total number of abstract hands as:
‣ 13 (pairs) + 13 C 2 (suited) + 13 C 2 (unsuited) = 169
• We’ll see tomorrow that there are more abstractions.
58
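The 169 figure is easy to verify by enumerating the abstract hand classes directly:

```python
from itertools import combinations

ranks = "23456789TJQKA"

# Abstract a concrete two-card hand to (high rank, low rank, suited?)
abstract = set()
for r1, r2 in combinations(ranks, 2):      # two different ranks
    abstract.add((r2, r1, True))           # suited version
    abstract.add((r2, r1, False))          # unsuited version
for r in ranks:                            # pairs can never be suited
    abstract.add((r, r, False))

print(len(abstract))   # 169 = 13 pairs + 78 suited + 78 unsuited
```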
Heuristics for Poker
• “Every hand’s a winner and every hand’s a loser”
• Heuristics for Poker are tricky because of this.
• Analysis is largely based on your own hand - if my hand at a
point is such-and-such a type or better, it is worth playing
• Kind of naive
59
“Expert” Poker Systems
• You can make a somewhat capable agent by combining a
bunch of these naive heuristics.
• It’s known which of the starting hands are strong and which
are weak.
• You can make a guess as to what you should do based on
your hand strength.
‣ This is not massively informed
• A basic but functional approach that attempts to lift out
general rules that will lead to good results.
60
Evaluating Delayed Reward in Poker
• I’ve mentioned delayed reward a few times
• How does this fit into Poker?
• We know that the strength of our hand alone won’t
decide the game.
• We know that opponents can bluff about their hand
strength.
• Need to find out “what happens if” for possible
actions
61
Monte Carlo Tree Search
• Initially used, without being formally defined, by Buffon and Fermi
(among others)
• Developed at Los Alamos by our Game Theory friend John
von Neumann
• For a large enough sample size, random sampling can often
take the place of exhaustive enumeration
62
Samples and Probes
• When we say a “random sample” we want to sample
the potential outcomes
‣ And find the potential rewards
• The leaf nodes of the game tree have the final value
of the game.
• By randomly walking from the current node to leaf
nodes, we can build up a picture of where our
actions might lead us.
63
Exploration vs Exploitation
• We can sample at random, and we'll get coverage in all areas
• Some areas are more promising than others
• We want to "exploit" these areas and inspect them closely
‣ Ensure that they are as good as they look
• At the same time, we want to keep "exploring" in case there
are better areas in the game tree.
• Balancing these two contradictory goals falls to the UCT
heuristic.
64
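UCT's selection rule is the UCB1 formula: exploit a child's average reward, but add an exploration bonus that shrinks the more often that child has been visited. A sketch:

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of a child node: average reward plus exploration bonus.

    child_value is the total reward bubbled up through this child;
    c trades off exploration against exploitation.
    """
    if child_visits == 0:
        return float("inf")        # always try unvisited children first
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

# A better-rewarded child wins while visit counts are comparable...
print(uct_score(8, 10, 20) > uct_score(2, 10, 20))      # True
# ...but a rarely visited child is eventually preferred for exploration.
print(uct_score(2, 2, 1000) > uct_score(300, 500, 1000))  # True
```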
Reward Evaluation
• We can use the Monte Carlo samples to simulate down to
the end of the game.
• Establish whether we win or lose (and how much).
• Bubble this value back up the tree.
• Build a picture of the amount we can expect to win based on
the actions we are considering this turn.
65
Caveat Emptor
•What we’ve seen today is just ONE approach to
tackling Poker.
• It’s an open challenge in AI to find a good solution
• The techniques used are important
• More important is the reasoning for using these
approaches.
• AI as a toolkit, not a definitive solution.
66
Sampling the St Petersburg Paradox
67
[Bar chart: payouts 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 against the number of sampled games ending at each; counts roughly halve from 2,147,483,647 (about half of the samples) down to 1,698,228.]
Sampling the St Petersburg Paradox
68
• If we repeatedly play out the St Petersburg game we
see that it behaves much as we expect
• Half the games end immediately, a quarter after 1
turn and so on.
• After 4,000,000,000 samples, the average is only £14.50
•Where the Expected Value metric didn't inform our
decision making, we can use sampling to see what
actually happens!
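The experiment is easy to reproduce at a smaller scale; a sketch, assuming the formulation implied by the chart, where the pot starts at £1 and doubles on each consecutive head:

```python
import random

def st_petersburg():
    """One play: the pot starts at £1 and doubles on each consecutive
    head; the game ends (and pays out) on the first tail."""
    pot = 1
    while random.random() < 0.5:
        pot *= 2
    return pot

random.seed(1)
n = 1_000_000
avg = sum(st_petersburg() for _ in range(n)) / n
print(f"average payout over {n:,} games: £{avg:.2f}")
```

Despite the infinite expected value, the sampled average stays small and grows only logarithmically with the number of games, which is exactly why sampling informs the decision where the Expected Value metric did not.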
Summary
• Understanding real games
• Delayed reward systems
• Poker
• Monte Carlo with UCT (in brief)
69
Next Lecture
• More on Monte Carlo
• Describing a player mathematically
• Categorising players into types
• Using this classification for better decisions
70