
The Psychological Basis of Rationality:

Examples from Games and Paradoxes

Richard M. Shiffrin

• Claim:

• Rationality is a cognitive, not an axiomatic, concept, and is defined both individually and socially, in the context of particular problems and decisions

What is rational in the end is determined by a sufficiently large consensus of thinkers judged to be sufficiently clear reasoners

Backward Induction

• Example: Game of Nim. 20 marbles on table. Players A (first) and B take turns taking one or two marbles. Player taking last marble(s) wins.

How should A play? Hard to reason— E.g. A takes 1, B takes 2, A takes 2, etc. => Many possibilities.

Use backward induction:

• Start at end: Player facing 1 or 2 wins. But player facing 3 loses. So A should try to see that B faces 3. But the same applies if B faces 6, because whatever B does, A can give B 3. Similarly for 9, and all multiples of 3. Hence if A can give B a multiple of 3, A will win. Hence A takes 2 at start, giving B 18. Each time A gives B the next lower multiple of 3, until A wins.

• This is an example of backward induction.
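The backward induction above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original talk: the function names `losing_positions` and `best_first_move` are hypothetical, and the rules assumed are the ones stated above (take one or two marbles, taking the last marble wins).

```python
def losing_positions(max_marbles, max_take=2):
    """Return the marble counts at which the player about to move loses."""
    # wins[n] is True if the player facing n marbles can force a win.
    # Facing 0 marbles means the opponent just took the last one, so wins[0] is False.
    wins = [False] * (max_marbles + 1)
    for n in range(1, max_marbles + 1):
        # A position is winning if some legal move leaves the opponent losing.
        wins[n] = any(
            not wins[n - take]
            for take in range(1, max_take + 1)
            if take <= n
        )
    return {n for n in range(1, max_marbles + 1) if not wins[n]}


def best_first_move(marbles, max_take=2):
    """Return a number of marbles to take that leaves a losing count, else None."""
    losing = losing_positions(marbles, max_take) | {0}
    for take in range(1, max_take + 1):
        if take <= marbles and (marbles - take) in losing:
            return take
    return None


print(sorted(losing_positions(20)))  # the multiples of 3 up to 18
print(best_first_move(20))           # A takes 2, leaving B with 18
```

Working upward from the smallest counts is the same reasoning as "start at the end": each count is classified using only counts that have already been settled.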

But backward induction can fail: the Centipede Game

• Two players take turns, either ‘stopping’ or ‘playing’. ‘STOP’ ends the game for both players. ‘PLAY’ causes the player to lose $1 (to the bank), the other player to get $10 (from the bank), and the turn moves to the other player. There are 20 turns, if no one stops first.

• Goal is to maximize personal profit, not beat the other player.

• Seems nice– both players gain a lot by playing many turns.

• BUT, assume purely ‘rational’ players. The player with the last turn will surely STOP, because PLAY loses $1, and the game ends. Knowing this, the player on trial 19 will STOP, because playing loses $1, with no gain following. Knowing this, the player on trial 18 will STOP, etc. Hence by backward induction, the player with the first move will STOP, and end the game with both players getting nothing.
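The traditional argument can be made mechanical. The sketch below is an illustration only (the names `solve_centipede` and `always_play_payoffs` are hypothetical); it assumes the rules stated above: 20 turns, PLAY costs the mover $1 and gives the other player $10.

```python
def solve_centipede(turns=20, cost=1, gift=10):
    """Backward induction: return the action chosen at each turn, first turn first."""
    # Future payoff, counted from the next turn onward, to the player who
    # moves next and to the player who does not. After the final turn
    # nothing remains to be won or lost.
    next_mover, next_other = 0, 0
    actions = []
    for _ in range(turns):  # walk from the last turn back to the first
        # If the current mover plays, she pays `cost` and becomes the
        # non-mover at the next turn.
        play_value = -cost + next_other
        if play_value > 0:  # STOP is worth 0 from this point on
            actions.append("PLAY")
            next_mover, next_other = play_value, gift + next_mover
        else:
            actions.append("STOP")
            next_mover, next_other = 0, 0
    actions.reverse()
    return actions


def always_play_payoffs(turns=20, cost=1, gift=10):
    """Total payoff to (first mover, second mover) if nobody ever stops."""
    first_moves = (turns + 1) // 2
    second_moves = turns // 2
    return (second_moves * gift - first_moves * cost,
            first_moves * gift - second_moves * cost)


print(solve_centipede())      # STOP at every turn
print(always_play_payoffs())  # (90, 90)
```

The contrast between the two outputs is the puzzle: the induction prescribes STOP everywhere, while two players who simply keep playing each end with $90.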

• This may seem as irrational to you as it does to me. Both players have a lot to gain by playing as many trials as possible, so how can it be rational to not play at all? Even if one of the players should stop near the end, both would have received much money by then.

• Of course there are problems with the reasoning. The player on trial n STOPS because that player is sure the other will STOP on trial n+1. But the only way the game could have reached trial n is if the other player PLAYED on the preceding trial. Hence the player cannot be sure the other player will STOP on the next trial.

• This helps justify PLAY on early trials, but doesn’t give a way to play when the end approaches.

• So consider a two-trial game. A gets the first turn, B the second. If B gets a turn, the game then ends.

A moves:

– STOP: A and B get $0

– PLAY: B moves:

   – STOP: A loses $1, B gains $10

   – PLAY: A gains $9, B gains $9

B will STOP at turn 2, to keep all $10. This will cause A to lose $1, so A will STOP at turn 1, and A and B get $0. Is this rational? Note that both can get $9 if both play.

• Game theory, economic theory, subjective expected utility theory, and many more, all say A should STOP. Is this rational?

• I argue that there is no axiomatic definition of rationality, that one must decide what is rational in the context of a given game.

• When two rational players decide what to do, they must not only decide what to do, but what is ‘rational’ in that game setting.

• Here I suggest it is rational for both players to play. Why? There are only three strategies to consider: {STOP}, {PLAY, STOP}, and {PLAY, PLAY}. But {PLAY, STOP} cannot be the rational strategy, because then A would not PLAY.

• This leaves only {STOP} and {PLAY, PLAY}. {STOP} gives both players $0; {PLAY, PLAY} gives both players $9. Clearly the second is the rational choice.

• Some people can’t accept this. They argue that B, at turn 2, will ‘defect’ and stop. But if this is rational, then A knows it, and A will not play at turn 1. Hence B should PLAY at turn two, if both players are rational.

• I know some of you will not like this reasoning, because ‘defection’ seems very seductive. But we are discussing rational decision making, not emotional feelings.

• The problem with traditional reasoning is the failure to take into account that the two decisions are correlated. Imagine that the correlation were perfect, in the sense that Player 2 plays whenever Player 1 plays (of course if Player 1 stops, Player 2’s decision is irrelevant). In this case everyone would play when Player 1.
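One way to see how much the correlation matters is to write A's expected payoff from PLAY as a function of the probability that B also plays. This is a small illustrative sketch using the two-trial payoffs above; the name `payoff_to_a` and the parameter `q_b_plays` are hypothetical.

```python
from fractions import Fraction


def payoff_to_a(q_b_plays, both_play=9, b_stops=-1):
    """Expected payoff to A from PLAY when B plays with probability q_b_plays."""
    return q_b_plays * both_play + (1 - q_b_plays) * b_stops


# STOP is worth 0 to A, so PLAY is better whenever the value below is positive.
for q in (Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(1)):
    print(q, payoff_to_a(q))
```

Under these payoffs A gains by playing as soon as B's probability of playing exceeds 1/10. Traditional reasoning fixes that probability at 0; perfect correlation fixes it at 1.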

• But why should one assume correlation? I will provide a few lines of argument.

• Suppose the players must lay out their irrevocable strategies (not known to the other player) before knowing who will be the first player. The first player will be determined by a later coin flip.

• A could choose to defect on trial 2, if A is the second player, and stop on trial 1, if the first player. Is this rational? If so, it is likely both players will choose this, and both will get $0.

• Note that this reasoning violates an ‘axiom’ of game theory, that one plays on the last trial whatever is best at that point, regardless of how that point is reached. I say this is wrong: In a one trial game, B would of course STOP, to save $1. In a two trial game at the same point B PLAYS, because this allows B to get $9 rather than $0.

• Why does the strategy change? Because what B does affects how A plays. A knows B’s ‘rational’ decision before the first play, just as does B.

• Perhaps you remain unconvinced, so let me give what could be a scenario that is more personally convincing.

• Suppose there are N players, each playing one trial of the centipede game against him or herself. They do this under the drug Midazolam that leaves reasoning intact but prevents memory of what decision had been made at the first decision point. Each player is told to adopt a Player 1 strategy that will maximize the number of times they get a Player 1 payoff better than the other people when they are Player 1. Similarly each player adopts a Player 2 strategy to maximize the number of times they get a Player 2 payoff that is better than the other people when they are Player 2.

• IMPORTANT: When you are Player 2, you must decide what you will do before you know whether you will get the opportunity to play. You must make a conditional decision: If I get to play, what will I do? It could well be that you would decide not to play when Player 1, so you might never get a chance to make a Player 2 decision.

• So what strategies do you choose? When Player 1, you can get -1, 0, or 9. If you play when Player 1, and also cooperate when Player 2, then you will get 9 and at least tie for best.

• However, if you think you will decide to defect when Player 2 you would be best off not playing at all (-1 would tie for worst).

• When Player 2 you can get 0, 9, or 10. If you get a chance to play you can cooperate and get 9 or defect and get 10. If you cooperate you will lose to those who play at step 1 and defect at step 2.

• But which other players would do that? Players who think defection is rational will not play at step 1. Thus how can you lose by cooperating?

• You can reason this before playing as Player 1, and hence can confidently play, and be reasonably certain you will later cooperate.

• What is it about playing one’s self that makes cooperation at step 2 seem rational? The key is the correlation between the decisions made at both steps. You are able to assume that whatever you decide before play 1 about what you will later decide as Player 2 will in fact come to pass.

• The assumption of rational players makes this correlation even stronger—if a set of decisions is rational all rational players will adopt them.

• It is of course just this correlation between decisions of multiple players that is ignored when arriving at Nash equilibria, and other seemingly irrational decisions.

• Thus when we say all players are assumed rational (not very satisfactory if we have not defined rationality), we are more importantly saying that the decisions of the players are positively correlated.

• There are many reasons why decisions could and should be correlated: social norms, playing one’s self, group consensus, reliance on experts, and so on.

• Interestingly, if the expert community defines rationality in a way that makes defection rational, then assuming the opponents to be rational would lead to defection, not cooperation. Perhaps this occurred in the early days of game theory.

Another lesson to take home:

• Rationality is ‘locally defined’ and ‘context bound’

• Further, there is room for disagreement:

– Colman in his recent BBS article discusses the Centipede game and concludes there is no resolution and no rational way to play.

– Others would not like the resolution I suggested. Still others argue an initial STOP is the rational play in the centipede game.

Newcomb’s Paradox

• (My version). A is given two envelopes and told they contain the same amount of money, either (1,1) or (1000,1000). A’s goal is to maximize gain, and A can choose either: ONE envelope, or BOTH envelopes.

• Before the game a psychologist predicts whether A will choose ONE or BOTH. If the prediction is ONE, 1000 is placed in both. If the prediction is BOTH, 1 is placed in both.

• Strangely the prediction is quite accurate. How is this possible? A actually plays the game twice, but is given the drug Midazolam, which leaves reasoning intact, but prevents learning. Once an interruption occurs, what has occurred before is completely forgotten (dentists use this stuff).

• Whatever A chooses the first time, the psychologist predicts A will do again. A is told about Midazolam, about both trials, and is told that exactly the same information is provided on both trials.

• Trial 1: Either {1,1} or {1000,1000}

-- A chooses BOTH or ONE

• A forgets

• Trial 2: Either {1,1} or {1000,1000}

-- Should A choose BOTH or ONE?

• P tells A both times that much testing shows that the prediction is 95% accurate in the general population: When people choose BOTH, 95% of the time the envelopes contain 1; when people choose ONE, 95% of the time, the envelopes contain 1000 – i.e. people tend to make the same decision in the same situation.

• A complains that A doesn’t know if this is the first or second test; if it is the first, he should choose ONE so there will be 1000 in the envelopes on test two.

• However, P says the goal is to maximize gain only on the present trial, however many trials may have preceded it.

What should A choose?

• Argument 1: Whatever is in the envelopes, it is too late to do anything about it, so it only makes sense to choose BOTH and double the gain.

• Argument 2: A should choose ONE, because A tends 95% of the time to make the same decision. Choosing ONE makes it likely that 1000 is in both envelopes.

• Rejoinder: How can what one chooses now affect what is in the envelopes? They have already been filled!

• Rejoinder: That is a good way to get $2, and lose $1000. Obviously the ONE choosers tend to get $1000.
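The arithmetic behind Argument 2 and its rejoinder can be spelled out. The sketch below is illustrative only: `expected_gain` is a hypothetical name, and it conditions the envelope contents on A's own choice using the 95% figure quoted above, which is exactly the step that defenders of Argument 1 reject.

```python
def expected_gain(choice, accuracy=0.95, low=1, high=1000):
    """Expected gain, conditioning the envelope contents on A's own choice."""
    if choice == "ONE":
        # A correct prediction of ONE means each envelope holds `high`.
        return accuracy * high + (1 - accuracy) * low
    if choice == "BOTH":
        # A correct prediction of BOTH means each envelope holds `low`;
        # A collects two envelopes either way.
        return 2 * (accuracy * low + (1 - accuracy) * high)
    raise ValueError("choice must be 'ONE' or 'BOTH'")


print(expected_gain("ONE"))   # about 950
print(expected_gain("BOTH"))  # about 102
```

The code does not settle the dispute. It only shows what each side is claiming: on this reading ONE is worth far more, while Argument 1 denies that the conditional probabilities are the right ones to use.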

• The best human minds cannot decide what is right. That is, people are quite often sure they are right, but they disagree. Roughly 1/3 of the people say BOTH, 1/3 say ONE, and 1/3 say it cannot be decided. There are many published articles defending each alternative.

• What does this say about ‘rationality’?

• Rationality is what people say it is, but which people? And what proportion of them?

• I believe the rational decision is ONE, but how does one convince those in the other two camps that this is right?

• Because rationality is a ‘cognitive’ phenomenon, we can try ‘reframing’ the problem to convince others. Of course the recipient of the reframed problem must also be convinced that the new problem retains the critical elements of the original.

• I’ll try this out with a few examples.

• Scenario 1:

• B watches A decide. Whatever A gets, B gets also. B knows people fall into two classes, those choosing ONE and those choosing BOTH. B would very much want A to be someone who chose ONE last time. Therefore B wants A to choose ONE this time, i.e. to be a ONE chooser type.

• Suppose A chooses, but before the envelopes are open, B can sell the profit. If A’s choice was ONE, B expects 1000, so would sell for something like 900. If A’s choice was BOTH, B expects 2, so would sell for something like 5.

• How is this thinking different for A, the chooser? After a choice, but before opening the envelopes, how much would A sell the (unknown) profit for? The sales prices would be similar to those demanded by B. E.g. following a BOTH choice by A, A knows herself to be a BOTH chooser, and expects to get $2. Following a ONE choice by A, A knows herself to be a ONE chooser, and expects $1000.

• Why should A wait to notice this? Why not choose ONE and ‘produce’ $1000?

• Scenario 2:

• A and B are together, each discussing the choice they are about to make for their own sets of envelopes. They know they were together on trial 1 also. They argue: A says BOTH, B says ONE; neither can convince the other.

• Just before the choice B pulls a knife and forces A to choose ONE, saying “I won’t let your foolishness cost you money”.

• What does A now expect to get? If on trial 1, B had also forced A at knifepoint to choose ONE, then A expects to get 1000.

• Why should A need a knife to choose ONE? A can choose ONE freely, with the same result.

• Perhaps these ‘reframings’ make you reconsider your answer. But what is wrong with the argument that one’s choice now can’t change what is in the envelopes? This seems like ‘backward causality’.

• This is a confusion of causality and correlation. The two choices are correlated, but neither causes the other. Rather, both are caused by the ‘thinking processes’ used by A to decide. Given no memory, these thinking processes tend to repeat most of the time.

• Scenario 3: At the time A makes her choice, the envelopes are empty. A is told the envelopes will be filled on the basis of the next choice A will make (under midazolam, without memory, with identical instructions). A is told to maximize present gain. She chooses, forgets, makes a subsequent choice, and later gets the payoff determined by the subsequent choice. Should she choose ONE or BOTH?

• A good case can be made that the forward and backward versions are identical in structure. Yet most people believe that backward causality doesn’t apply in the forward case, and most people faced with the forward case see good reason to choose ONE. They reason: “If I choose ONE now, I will likely do so again, because the choice situation is identical”.

• Why is anything different in the backward case?

Scenario 4: Non-intuitive Information

• Suppose A is having trouble deciding, and is told: “If you like, you can open the envelopes, look at the contents, and then decide. I gave you this same option the previous time you participated.”

• Should A look?

• Strangely, A should decline the offer. Why?

• If A looks, A will certainly choose BOTH, whatever is found. A does not have to look at the contents to know this. Hence a decision to look is just a decision to choose BOTH.

• If A now chooses to look, A would have likely chosen to look the previous time, making an outcome of $2 very likely.

• In general, the more A knows about the likely contents of the envelopes, the more A would want to choose BOTH.

Scenario 5: Height

• The psychologist does not use a previous trial to fill the envelopes. Instead, he tells A that much research has shown that height is a reliable predictor of choice, enabling the psychologist to predict choice correctly 52% of the time (for reasons unknown).

• The envelopes have been filled on the basis of A’s height.

• A does not know whether being tall or short causes $1000 to be in the envelopes.

• Whatever size A is, if A chooses ONE, it makes that height more likely to be the height associated with $1000, and conversely for a choice of BOTH.

• Hence A should choose ONE, if the payoff difference is large enough.

• In the earlier versions, A’s decision processes were the causative agent producing the contents of the envelopes on both occasions. In this version, A’s height is the causative agent.

• Scenario 6: Symmetry of Decisions

• Some people are bothered by the Midazolam scenarios because it seems there is an infinite regress of past (or future) decision situations required.

• So let there be exactly two trials under Midazolam, call them A and B, with the payoff for trial A determined by the choice on trial B, and vice versa, these facts known at both trials.

• There are more reframings of this sort, but this gives you the idea. I have found that even the most fervent BOTH choosers either change their position, or at least express uncertainty, when given these and other problem reframings.

• Like the centipede game, the resolution I argue for is based on the correlation between the two different decisions, a third factor being the cause that produces the correlation.

• Regardless of the ‘answer’, we see once again that rationality is what people say it is. There is no guarantee people will agree, producing a disturbing situation.

• Connecting Newcomb’s, Backward Induction, and Prisoner’s Dilemma

• A ‘prisoner’s dilemma’ has two players, A, B, each of whom has a decision option E that will guarantee them a better outcome, regardless of what the other player does. However, if both players choose this dominating strategy, they each will get a poor outcome:

                 B chooses D     B chooses E
A chooses D      (5, 5)          (-10, 8)
A chooses E      (8, -10)        (-5, -5)

(Payoffs are listed as A, B.)

For both A and B, 8 is better than 5 and –5 is better than –10, so both prefer E and both get –5. But if both choose D, both get +5!
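The dominance claim can be checked directly from the payoff matrix. This is a minimal sketch; the dictionary `PAYOFFS` and the helper names are hypothetical, and the cell assignment assumes the matrix reads as described (both D gives 5 each, both E gives -5 each, a lone E gets 8 while the lone D gets -10).

```python
# (A's choice, B's choice) -> (A's payoff, B's payoff)
PAYOFFS = {
    ("D", "D"): (5, 5),
    ("D", "E"): (-10, 8),
    ("E", "D"): (8, -10),
    ("E", "E"): (-5, -5),
}


def e_dominates_for_a():
    """True if E gives A more than D does, whatever B chooses."""
    return all(PAYOFFS[("E", b)][0] > PAYOFFS[("D", b)][0] for b in "DE")


def e_dominates_for_b():
    """True if E gives B more than D does, whatever A chooses."""
    return all(PAYOFFS[(a, "E")][1] > PAYOFFS[(a, "D")][1] for a in "DE")


print(e_dominates_for_a(), e_dominates_for_b())  # True True
print(PAYOFFS[("E", "E")], PAYOFFS[("D", "D")])  # (-5, -5) versus (5, 5)
```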

• The centipede game argument would imply that two known rational players would both choose D, the ‘cooperate strategy’.

• But what would a ‘real’ player do? Doug Hofstadter once ran a test in Scientific American– he thought his bright friends would cooperate; however, they defected.

• But what would a ‘real’ player do playing against him- or herself?

• Have a player make a choice under midazolam, ‘twice’. Knowing that the opponent is oneself, would a player cooperate? Would it matter if the choice is the first or second one made, as long as there is no memory?

• The choices are symmetric so the choice order seems irrelevant, but suppose your current choice is second. Whatever you chose the other time, you know you will do better by defecting this time. Should you cooperate nonetheless? This is Newcomb’s in another guise. You can make yourself a ‘cooperative’ person.

• In all these cases, what is rational is not defined absolutely, by rule, or by axiomatic system. What is a rational decision is a cognitive process, context dependent, and subject to some sort of general agreement by thoughtful humans.

The Exchange Paradox

• (In economic circles, known as Nalebuff’s Paradox, 1988, and related to Siegel’s Paradox in foreign exchange, 1972)

• One envelope has 10 times the money of another (i.e. M and 10M).

• The player (P) chooses an envelope and chooses to keep the contents (X), or irrevocably exchange for the other.

{$D $10D}

{$10D $D}

The Strategy

• P reasons the other envelope has half a chance of having 10X, and half a chance of having (1/10)X. The expected value for exchanging is then:

• E(V) = (1/2)(1/10)X + (1/2)(10)X = 5.05X

• This is larger than X, so P exchanges.

The Puzzle

• This reasoning applies regardless of X.

• Hence P should always exchange.

• But if P always exchanges, why even look at the contents of the first envelope? Why not save a step and just take the second? But then the same reasoning says one should switch back.

• More critically, since P chooses randomly, and always exchanges, symmetry requires that the amounts rejected have the same probability distribution as those accepted.

The Paradox

• How can P gain by exchanging, and yet not gain at all?

Resolution Number One

• The savvy among you notice that it matters how the envelopes are filled. One needs to know what are the amounts X and what probabilities they have. The problem doesn’t say. It might be that once we know exactly how the envelopes are filled, the paradox will disappear.

• Do you think this is the case?

An Algorithm for Filling

• We fill as follows: Flip a coin until the first heads appears, on the n-th flip. Then put 10**(n-1) and 10**n in the two envelopes.

• E.g. with prob ½ a heads comes up on flip 1, so we put 1 and 10 in the envelopes. With probability ¼ a heads comes up first on the second flip, so we put in 10 and 100. Etc.

Paradox Not Solved

• If we observe X > 1, P([1/10]X, X) = 2P(X,10X)

• So P(other envelope has [1/10]X) = 2/3 and P(other envelope has 10X) = 1/3.

• Hence E(V) = (2/3)(1/10)X + (1/3)(10X) = (102/30)X = 3.4X > X, so exchange.

• If X = 1, exchanging gains 9, for sure.

• So, always exchange.

But

• By symmetry, distribution of amounts rejected and accepted have to be identical (same reasoning as before). How can we gain and not gain simultaneously?

• This game is easy to program on your PC. One can verify that exchanging produces 3.4X, and also that the amounts rejected and accepted have the same distribution.
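A minimal simulation along these lines is sketched below. It is an illustration rather than the author's program: the function names, the seed and the trial count are arbitrary choices, and the symmetry check looks only at the amount 10 as a representative case.

```python
import random


def draw_pair(rng):
    """Flip until the first heads (on flip n); return (10**(n-1), 10**n)."""
    n = 1
    while rng.random() < 0.5:  # tails: flip again
        n += 1
    return 10 ** (n - 1), 10 ** n


def simulate(trials=200_000, seed=0):
    rng = random.Random(seed)
    ratio_sum, ratio_count = 0.0, 0
    got_10, rejected_10 = 0, 0
    for _ in range(trials):
        small, large = draw_pair(rng)
        # P picks one of the two envelopes at random.
        held, other = (small, large) if rng.random() < 0.5 else (large, small)
        if held > 1:
            ratio_sum += other / held  # payoff from exchanging, in units of X
            ratio_count += 1
        # P always exchanges: `other` is accepted and `held` is rejected.
        got_10 += other == 10
        rejected_10 += held == 10
    return ratio_sum / ratio_count, got_10 / trials, rejected_10 / trials


mean_ratio, got, rejected = simulate()
print(mean_ratio)     # close to 3.4 when X > 1
print(got, rejected)  # both close to 0.375
```

Both halves of the puzzle show up in the same run: the conditional payoff from exchanging averages about 3.4X, yet a given amount is accepted about as often as it is rejected.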

Paradox Resolution Number 2

The even savvier among you will notice that this game has infinite expectation: the gains keep going up by a factor of 10, but the probabilities keep going down by a factor of ½.

E(G) = (.5)**1 × 10**1 + (.5)**2 × 10**2 + (.5)**3 × 10**3 + … = ∞

Everyone knows we can’t compare gains when both lines of play have infinite expectation. One infinity can’t be larger than the other (e.g. another example some of you may know is called the St. Petersburg Paradox).

Perhaps this explains the paradox: Perhaps such a paradox couldn’t appear in a finite game.

• Most writers think this, and stop here.

Can we make the game finite?

• Make the game finite by terminating the coin flips with a heads if a very large number Z of consecutive tails occurs.

• A possible strategy would then be: exchange always, except when X = 10**Z, in which case STAY, because this is the largest possible.

• (This is still paradoxical, but I’ll return later to this point).

• But this loses some paradoxical essence, because P does not ALWAYS exchange.

Always Exchange in a Finite Game

• Ask someone to choose a largest limit N, vastly smaller than Z, but still vast. This is easy to do if Z is something like 10**100000.

• If N coin flips come up tails, then one stops anyway, and pretends the last flip was a heads.

• One way to get the number N:

• Ask a friend to fill a sheet of paper with digits. Permute these, and the result is N.

• There are only so many digits that can fit on a sheet of paper, so N is obviously finite, though unknown.

• For any number X observed, P has an infinitesimally small chance, c, of guessing correctly that this is the largest possible.

• I.e. How likely is it that X is the number N, rather than any number smaller than N?

• For exchanging, E(V|X) = (1-c)(3.4X) + (c)(.1X) ≈ 3.4X > X

Unfortunately, this means the paradox returns in a finite game: One should always exchange. But how can this be sensible?

Paradox Made Worse?

• Instead of P choosing an envelope at random, someone examines the two and hands P the larger with probability .8, and the smaller with probability .2.

• P(other is larger) is: (.5)Q(.2)/[(.5)Q(.2)+Q(.8)] = 1/9, where Q is the probability that the envelopes were filled with ([1/10]X, X).

• E(V) = (8/9)(1/10)X + (1/9)10X = 1.2X > X

• So always exchange.

This is bad news

• If P always exchanges, then P will exchange the larger for the smaller .8 of the time. Further, the distribution of rejected numbers strictly dominates those accepted.

• If one exchanges one gets, and rejects:

Amount       P(Gets)              P(Rejects)           P(G):P(R)
1            .4                   .1                   4:1
...
10**n        (½)**n × .6          (½)**n × .9          6:9
...
10**(N-1)    (½)**(N-1)           (½)**(N-1)           1:1
10**N        (½)**(N-1) × .2      (½)**(N-1) × .8      2:8

• Exchanging gets 1 more often, and every other outcome less often!

• Another way to look at things: Suppose you are handed the .8 and .2 envelopes, but don’t open the .8 one. You clearly would want the .8 envelope, since it is 4 times more likely to have the larger amount. However, once you open it, you seem to want to exchange for the .2, regardless of what you find.

• This probably makes you uneasy.

• If you see X, the contents of the .8 envelope, you want to exchange.

• If you see X*, the contents of the .2 envelope, you also want to exchange, only you expect to gain more than in the first case.

• Given this, why exchange a .8?

• This might make you feel uneasy.

What a good empiricist would do

• P, a scientist, carries out a test: P programs this game on his PC, always exchanges, and tables the outcomes for many thousands of trials.

• Now P will learn if P is winning or losing!

• (No more worries about poor reasoning abilities or bad mathematical derivations).

P gets an answer or two

• For X =1, P gains 9

• For all other X, P gains ~.2X (the larger the number of simulation trials, the closer is the convergence to .2X).

• Case closed.

• But..

Another answer

• P tables the results accepted and rejected:

Amount       P(Accepted)          P(Rejected)
1            .4                   .1
10**n        (½)**n × .6          (½)**n × .9
10**(N-1)    (½)**(N-1)           (½)**(N-1)
10**N        (½)**(N-1) × .2      (½)**(N-1) × .8

• Oops: P is rejecting larger amounts.

• If one exchanges one gets (G), and rejects (R):

Amount       P(G):P(R)
1            4:1
...
10**n        6:9
...
10**(N-1)    1:1
10**N        2:8

The empirical ratios get closer to the ones listed above, the more simulations are run.

MOST IMPORTANT: THE NUMBERS COME FROM THE SAME TABLE SHOWING GAINS FOR EXCHANGING
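P's experiment can be sketched as follows. This is an illustration, not the author's program: the names are hypothetical, and the cap of 6 flips, the seed and the trial count are arbitrary choices made so that the run is short. A single tally produces both the gains per observed X and the accepted and rejected counts.

```python
import random
from collections import Counter, defaultdict


def draw_pair(rng, cap):
    """Flip until heads, treating flip number `cap` as heads regardless."""
    n = 1
    while n < cap and rng.random() < 0.5:  # tails: flip again
        n += 1
    return 10 ** (n - 1), 10 ** n


def simulate(trials=200_000, cap=6, p_larger=0.8, seed=0):
    rng = random.Random(seed)
    gets, rejects = Counter(), Counter()
    ratio_sum, ratio_n = defaultdict(float), Counter()
    for _ in range(trials):
        small, large = draw_pair(rng, cap)
        # P is handed the larger envelope with probability p_larger.
        held, other = (large, small) if rng.random() < p_larger else (small, large)
        # P always exchanges: `other` is accepted, `held` is rejected.
        gets[other] += 1
        rejects[held] += 1
        ratio_sum[held] += other / held
        ratio_n[held] += 1
    mean_ratio = {x: ratio_sum[x] / ratio_n[x] for x in ratio_n}
    return mean_ratio, gets, rejects


mean_ratio, gets, rejects = simulate()
print(mean_ratio[1], mean_ratio[10], mean_ratio[100])  # 10.0, then about 1.2 twice
print(gets[1] / 200_000, rejects[1] / 200_000)         # about .4 and .1
print(sum(x * c for x, c in gets.items()),
      sum(x * c for x, c in rejects.items()))          # rejected total is larger
```

In this sketch the figure of about 1.2X (a gain of about .2X) holds for observed amounts away from the cap; the two largest amounts behave differently because the final pair absorbs the leftover probability.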

Failure of empirical testing

• So, is P winning or losing?

• It is tempting to think the second test is better: more money is piling up in the rejected envelopes. P might therefore decide never to exchange.

• But, surely P would exchange for X = 1!

• So P should decide to exchange for X = 1 only? But exchanging on 1 and 10 would be even better!

Egads!

• So exchanging on all X’s up to K is dominated by the strategy of exchanging on all X’s up to K+1!

• So P should always exchange!

• But this loses money!

• Help!!

Paradox Resolved

• Both answers are correct. The problem is that the goal is ambiguous. Maximizing expected gain has several interpretations:

1) Expected gain on the present trial, given X [E|X]

2) Expected gain for playing the game, given one uses a strategy S, where S specifies the X at which one stops exchanging and starts STAYING [EG|X]

• A few remarks:

• As noted E|X is higher for exchanging, for every X.

• But EG|X for the strategy ‘always exchange’ is exactly the same as for the strategy ‘never exchange’. How is this? The amount lost when one exchanges the highest possible number balances the gains for all smaller numbers to produce equality.

• But of course some strategies produce higher EG|X than others. E.g. EG|1 is clearly not as good as EG|10: It is obviously better to exchange a 1 for the certain gain of 9.

• Most people have a strong intuition that the expected value of this game will be maximized if one makes the decision for each observed X that maximizes expected gain for that trial.

• That this is not the case is seen in this paradox, but is hard to believe.
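The difference between the two quantities can be computed exactly for a small finite game. The sketch below is an illustration under stated assumptions: P picks an envelope at random, the coin flips are capped at 6 (an arbitrary small stand-in for N), and the names `pair_probs` and `expected_game_value` are hypothetical. A strategy is described by the exponent `stay_from`: P exchanges when X is below 10**stay_from and stays otherwise.

```python
from fractions import Fraction


def pair_probs(cap):
    """Probability of each pair (10**(n-1), 10**n) when flip `cap` counts as heads."""
    probs = {n: Fraction(1, 2 ** n) for n in range(1, cap)}
    probs[cap] = Fraction(1, 2 ** (cap - 1))  # the last pair absorbs the remainder
    return probs


def expected_game_value(stay_from, cap):
    """Expected payoff when P exchanges only if X < 10**stay_from."""
    total = Fraction(0)
    for n, p in pair_probs(cap).items():
        small, large = 10 ** (n - 1), 10 ** n
        # P holds either envelope with probability 1/2.
        for held, other in ((small, large), (large, small)):
            exchanges = held < 10 ** stay_from
            total += p * Fraction(1, 2) * (other if exchanges else held)
    return total


cap = 6
for stay_from in range(cap + 2):
    # stay_from = 0 is 'never exchange'; stay_from = cap + 1 is 'always exchange'.
    print(stay_from, float(expected_game_value(stay_from, cap)))
```

In this small case the game value rises with every additional X on which P exchanges, up to exchanging on everything except the largest possible amount, and then falls back to exactly the 'never exchange' value once the largest amount is exchanged too.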

• Perhaps the case is clearer in a simplified example.

Simplified example 1

• Consider a version of the St. Petersburg Paradox. One starts with $1, and flips a coin. HEADS causes the current total to triple. TAILS causes all the money to be lost, and the game ends.

• One can play as long as one wants.

• The goal is to ‘maximize expected gain’.

• There is one exception to the above.

• Exception: The game has an upper limit, N, unknown but chosen to be vastly smaller than a very large number U (such as 10**100). If the current total = N, and the decision is made to PLAY, then regardless of the coin flip all money is lost and the game ends.

• The probability of guessing that the current total is N is infinitesimally small, so the expected value for playing stays about 1.5X.

• Hence the player will always play, for every X. But this strategy maximizes only the conditional expected gain on the current trial. It minimizes expected gain for the game, because it guarantees with probability 1.0 that all money will be lost (eventually N will be reached, if a TAILS does not end the game first).

• Clearly the decision to play for each X gives a positive expected gain for each X, but the strategy to play for all X produces a total game return of 0. It is surprisingly easy to confuse these different expectation quantities.
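The two expectation quantities in this simplified game can be written down exactly. This is a minimal sketch: `game_value` is a hypothetical name, and the upper limit is represented by the number of consecutive HEADS needed to reach it (10 here, an arbitrary small value chosen for illustration).

```python
from fractions import Fraction


def game_value(planned_rounds, limit_rounds):
    """Expected final total when one plans to play `planned_rounds` times, then stop.

    The total starts at 1 and triples on each HEADS, so it reaches the upper
    limit after `limit_rounds` consecutive HEADS. Playing at the limit loses
    everything whatever the coin shows.
    """
    if planned_rounds > limit_rounds:
        # Either a TAILS comes first or the limit is reached and played on.
        return Fraction(0)
    # Survive every planned flip with probability (1/2)**k and hold 3**k.
    return Fraction(3, 2) ** planned_rounds


limit = 10
for k in (0, 1, 5, 10, 11):
    print(k, float(game_value(k, limit)))
```

Each extra planned round multiplies the game value by 1.5, which is the per-trial argument for playing. The plan that never stops is the one plan worth nothing.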

– Note: Choosing a strategy to maximize expected return for the game is a very complex matter.

• The exchange paradox presents a similar confusion.

• Perhaps partly due to such a confusion, some people have concluded one should always exchange, others that one should not, others that it does not matter, and yet others that no rational strategy exists.

• Thus the exchange paradox reveals yet another facet of the general claim that rationality is a cognitive process, subject to reinterpretation and reanalysis in the context of a given problem.

• Given this, it is not too surprising that persuasive arguments for the rationality of one or another decision are heavily weighted by problem framing, and by examples.

• The finite exchange paradox also reveals a difference that a number of researchers have noted between ‘uncertainty’ and ‘vagueness’. The highest number N is vague. There is no Bayesian prior we can place on N that makes ‘sense’. If we specify a prior it is easy to calculate a number Y at which one would not exchange.

• But having done so, if Y were reached, one would not believe that stopping was the best strategy.

• Joe Halpern (for example) has argued that for vaguely specified situations, one might have a set of priors that are possible, without probabilities one can assign to them. Then one can use some minimax type strategy (e.g. protect against the worst outcome) to choose which prior to assume.

• This would not work here (stopping at 1 is not a good idea), but it seems to be the case that vague problems produce vague optimization.

The ‘surprise exam’

• On day one of a twenty lecture course, the teacher tells the class that there will be one surprise exam, but that the class members will not be able to predict with certainty the occurrence of the exam on the morning before it will occur.

• Backward Induction leads to a contradiction: The class reasons that the exam cannot occur on the last day, because the exam occurrence could be predicted. If so, then it cannot occur on the second to last day, for the same reason. Backward Induction continues until every day is ruled out. Ruling out every day seems to imply that the teacher has lied.

• But the exam does occur on, say, lecture seven, and indeed the class is ‘surprised’.

• Indeed, ruling out every day seems to guarantee ‘surprise’. But is this because the teacher might have lied?

• With a little thought, both teacher and students can see that the statement is true, as long as some exam day ‘in the middle’ is chosen.

• However not everyone agrees. Russ Lyons (math here at IU) looks at the case of one….

• “Do you agree that the number of days is irrelevant? If so, let's take just one day. The setter says "I will give you an exam today but you won't know whether I will do that". If this is false, he lies. If this is true, then it must be because he might not give an exam today (so he might be lying), and the truth is unknown (since it deals with the future) or because the student is confused/stupid/doesn't speak English/etc. If he does give the exam, then it becomes true, otherwise false. Of course, an alternative is that the statement is neither true nor false. That's probably the best; it's like saying "let S be the set of all sets". That's meaningless as there is no such S.”

• But is it the case that ‘truth’ is unknown?

• ‘Truth’, like ‘rationality’ may be a cognitive construct, and socially reified.

• When the number of lectures is very large (say 10**100 if you like), and the instructor says the surprise exam will be on a day vastly short of the end of the course, then it seems clear to both instructor and student that the statement is ‘true’, and must be so.

• But for one lecture, there is a contradiction, and perhaps no truth value.

• How about two lectures? Three? Four?

• At what point does the truth value change from ‘uncertain’ to ‘true’?

• Lyons would say the truth value is always uncertain. I am less dogmatic. I see truth ultimately being decided by the real universe we live in. In this universe, for large N, the statement seems true by all sensible measures.

• But at what N the transition occurs, I’m unsure.

‘Sleeping Beauty Paradox’

• (Seems to have grown from an earlier set of paradoxes introduced by Piccione and Rubinstein in 1995: On the interpretation of decision problems with imperfect recall).

• One version: A coin is flipped. HEADS means Sleeping Beauty (SB) is awakened on Monday and asked to estimate the probability that HEADS was flipped. TAILS means SB is awakened on both Monday and Tuesday, without memory of any other awakenings, and asked the same query both days.

• What should SB answer?

• Opinions seem sharply split with vociferous defenders of both 1/2 and 1/3.

• As with Newcomb’s, many people find the question ridiculous, but they disagree on the answer. Other people are unsure.

• More generally, the ‘thirders’ believe that if TAILS produces N awakenings, the answer is 1/(N+1) for P(HEADS).

• In a slight variant, HEADS causes SB to awaken in Room A once, and be queried. TAILS causes SB to be awakened in Room B twice, without memory, and be queried each time.

• The ‘thirders’ believe that the probability of Room A is 1/3.

• I think there is a ‘surreal’ aspect to this.

• Suppose SB is told there will be a prize of $1,000,000 if a tails occurs and she is in Room B, awarded if SB requests it when awoken. SB may request whatever number of awakenings in room B she desires, up to say, 100.

• SB thinks: This is terrific; I can increase my odds of getting the million to 100/101 by requesting 100 wake ups.

• If this seems strange, perhaps the thirders can argue that there is a 50% chance of the million, but once an awakening occurs the conditional probability of a tails is then 100/101 (sic). If so, then SB ought to be willing to turn down an offer of say $900,000 for her potential winnings.

• Dave Chalmers gives the following as his argument for 1/3: The query is either Mon or Tue. Given Monday, P(H|M) = .5. Given Tuesday, P(H|Tue) = 0. P(H) = P(M)P(H|M) + P(Tue)P(H|Tue) = P(M)P(H|M) + 0 = (1/2)P(M).

• Since P(M) is between 0 and 1, P(H) must be less than .5.

• What do you think of this?

• Of course the condition changes from the beginning to the end of this argument. At the outset Dave says P(H|M) = ½ because there is an equal probability of a query on Mon following a Heads and a Tails (Heads always produces a Monday query, and Tails always produces a Monday query). But this implies P(M|Heads) = P(M|Tails) = 1.0. Thus P(M) = 1.0. Thus the last equation from Chalmers implies P(H) = ½.

• Of course Chalmers argues P(M) is less than 1.0, because he considers that SB might awaken Tuesday.

• This means we are selecting from events on the Tails side. For this condition, the probability of Monday given Tails is 1/2 (whereas P(M|Heads) stays at 1.0). This means that P(H|M) = 2/3, not 1/2.

• So far, Chalmers and Terry Horgan, among others, steadfastly believe that the answer is 1/3 (and have written journal articles saying so). What then is the rational answer?
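The two answers can be traced to two ways of counting, and the conditional probabilities in the reply to Chalmers can be checked by direct arithmetic. The following is only a sketch, not anyone's published method: the function names are hypothetical, and the values P(M|Heads) = 1 and P(M|Tails) = 1/2 are the ones assumed above for the case in which Tuesday awakenings are counted.

```python
from fractions import Fraction

def heads_per_experiment():
    """P(HEADS) when each run of the experiment counts once (fair coin)."""
    return Fraction(1, 2)

def heads_per_awakening(n_tails_awakenings):
    """Long-run fraction of awakenings that follow HEADS.

    HEADS yields 1 awakening, TAILS yields n_tails_awakenings; each branch
    is weighted by its 1/2 coin probability.
    """
    heads_weight = Fraction(1, 2) * 1
    tails_weight = Fraction(1, 2) * n_tails_awakenings
    return heads_weight / (heads_weight + tails_weight)

def monday_check():
    """Bayes' rule with P(M|Heads) = 1 and P(M|Tails) = 1/2 (assumed)."""
    p_heads = Fraction(1, 2)
    p_m_given_heads = Fraction(1)
    p_m_given_tails = Fraction(1, 2)
    # Total probability that the query is a Monday query.
    p_m = p_m_given_heads * p_heads + p_m_given_tails * (1 - p_heads)
    p_heads_given_m = p_m_given_heads * p_heads / p_m
    # Decomposition from the argument: P(H) = P(M) P(H|M) + P(Tue) * 0.
    p_heads_total = p_m * p_heads_given_m
    return p_m, p_heads_given_m, p_heads_total

if __name__ == "__main__":
    print(heads_per_experiment(), heads_per_awakening(2), heads_per_awakening(100))
    print(monday_check())
```

Under these assumptions P(M) = 3/4 and P(H|M) = 2/3, so the decomposition gives back P(H) = 1/2, while the per-awakening count gives 1/(N+1). The sketch does not settle which of the two quantities SB is being asked for.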

The Absent-Minded Driver

• Perhaps Sleeping Beauty is not such a paradox (social consensus notwithstanding), but Piccione and Rubinstein raise a more interesting issue.

• A driver is in a bar about to set off for home. There are two exits and then a long drive to a distant city. The driver lives at exit 2. Exit 1 is dangerous. If he passes 2, he must take a motel for the night in the distant city. However he is forgetful and when reaching any exit does not remember whether he has passed any exits already.

• His payoff for taking exit 1 is 0, for exit 2 is 4, and for going past 2, is 1. What should he plan to do? If he cannot choose a probabilistic plan, he should decide to always go, getting 1 rather than 0. Suppose he so decides. He now finds himself at an exit. Knowing he decided to keep driving, he guesses there is a probability of ½ that this is exit 2. He therefore changes his mind and decides to exit (for an expected gain of 0/2 + 4/2 = 2).

• Of course, this is circular. Once he decides to change his mind, he knows he would have changed his mind at exit 1, so this must be exit 1, so he should not exit.

• The issue is less silly when he can decide to exit with some probability, p. A simple calculation shows p = 1/3 maximizes expected gain, if applied consistently. E(G) = (1/3)0 + (2/3)(1/3)4 + (2/3)(2/3)1 = 4/3.

• However, having chosen this strategy, he now finds himself at an exit. He now forms an estimate, e, of the probability that this is exit 1, knowing his strategy. He then adjusts his strategy. This leads to various forms of circular adjustments and possible convergence on some strategy.

• Strangely, a good case can be made for deciding at an exit to exit with p = 5/9, even though if used consistently this strategy gives a lower E(G) than 1/3. The local decision is sensible even though the global outcome is not.

• The original article and replies and counter-replies are worth a look.
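The driver's numbers can be reproduced with a short calculation. This is a sketch under one reading of the 'local' decision, not necessarily the argument the original article makes: it assumes that at an exit the driver holds estimate e that this is exit 1, expects to use the same exit probability q at any later exit, and picks the q that maximizes his expected gain from that point on. All function names are hypothetical.

```python
from fractions import Fraction

def planned_gain(p):
    """Expected gain when the driver exits with probability p at every exit.

    Payoffs from the text: exit 1 -> 0, exit 2 -> 4, past exit 2 -> 1.
    """
    return p * 0 + (1 - p) * p * 4 + (1 - p) * (1 - p) * 1

def belief_exit_1(p):
    """Estimate e that the current exit is exit 1, given exit probability p.

    Exit 1 is always reached; exit 2 is reached with probability 1 - p.
    """
    return 1 / (1 + (1 - p))

def local_best_response(e):
    """Exit probability q maximizing e*(1-q)*(1+3q) + (1-e)*(1+3q).

    (Assumed objective.) The derivative is 3 - e - 6*e*q, which is zero
    at q = (3 - e) / (6e).
    """
    return (3 - e) / (6 * e)

def adjust(p, rounds):
    """Repeat: form the estimate e from p, then replace p by the local best q."""
    for _ in range(rounds):
        p = local_best_response(belief_exit_1(p))
    return p

if __name__ == "__main__":
    # Planning stage: search a grid of exit probabilities k/90.
    grid = [Fraction(k, 90) for k in range(91)]
    best = max(grid, key=planned_gain)
    print(best, planned_gain(best))
    # At an exit: successive adjustments starting from the planned p = 1/3.
    print([str(adjust(Fraction(1, 3), n)) for n in range(4)])
    print(float(adjust(Fraction(1, 3), 40)), planned_gain(Fraction(5, 9)))
```

On this reading the adjustments starting from p = 1/3 run 1/3, 2/3, 1/2, 7/12 and settle at 5/9, which is the only value that reproduces itself. Used at both exits, 5/9 yields an expected gain of 32/27, below the planned 4/3.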
