
Best Experienced Payoff Dynamics and Cooperation in the Centipede Game∗

William H. Sandholm†, Segismundo S. Izquierdo‡, and Luis R. Izquierdo§

December 4, 2018

Abstract

We study population game dynamics under which each revising agent randomly selects a set of strategies according to a given test-set rule, plays each strategy in this set a fixed number of times, with each play of each strategy being against a newly drawn opponent, and chooses the strategy whose total payoff was highest, breaking ties according to a given tie-breaking rule. In the Centipede game, these best experienced payoff dynamics lead to cooperative play. Play at the almost globally stable state is concentrated on the last few nodes of the game, with the proportions of agents playing each strategy being dependent on the specification of the dynamics, but largely independent of the length of the game. Since best experienced payoff dynamics are defined by random sampling procedures, they are represented by systems of polynomial differential equations with rational coefficients, allowing us to establish key properties of the dynamics using tools from computational algebra.

∗We thank Dan Friedman, Drew Fudenberg, Ken Judd, Panayotis Mertikopoulos, Erik Mohlin, Ignacio Monzón, Arthur Robson, Marzena Rostek, Ariel Rubinstein, Larry Samuelson, Ryoji Sawa, Andy Skrzypacz, Lones Smith, Mark Voorneveld, Marek Weretka, and especially Antonio Penta for helpful discussions and comments. Financial support from the U.S. National Science Foundation (SES-1458992 and SES-1728853), the U.S. Army Research Office (W911NF-17-1-0134 MSN201957), project ECO2017-83147-C2-2-P (MINECO/AEI/FEDER, UE), and the Spanish Ministerio de Educación, Cultura, y Deporte (PRX15/00362 and PRX16/00048) is gratefully acknowledged.

†Department of Economics, University of Wisconsin, 1180 Observatory Drive, Madison, WI 53706, USA. e-mail: [email protected]; website: www.ssc.wisc.edu/~whs.

‡BioEcoUva Research Institute on Bioeconomy. Department of Industrial Organization, Universidad de Valladolid, Paseo del Cauce 59, 47011 Valladolid, Spain. e-mail: [email protected]; website: www.segis.izqui.org.

§Department of Civil Engineering, Universidad de Burgos, Edificio la Milanera, Calle de Villadiego, 09001 Burgos, Spain. e-mail: [email protected]; website: www.luis.izqui.org.


1. Introduction

The discrepancy between the conclusions of backward induction reasoning and observed behavior in certain canonical extensive form games is a basic puzzle of game theory. The Centipede game (Rosenthal (1981)), the finitely repeated Prisoner’s Dilemma, and related examples can be viewed as models of relationships in which each participant has repeated opportunities to take costly actions that benefit his partner, and in which there is a commonly known date at which the interaction will end. Experimental and anecdotal evidence suggests that cooperative behavior may persist until close to the exogenous terminal date (McKelvey and Palfrey (1992)). But the logic of backward induction leads to the conclusion that there will be no cooperation at all.

Work on epistemic foundations provides room for wariness about unflinching appeals to backward induction. To support this prediction, one must assume that there is always common belief that all players will act as payoff maximizers at all points in the future, even when many rounds of previous choices argue against such beliefs.1 Thus the simplicity of backward induction belies the strength of the assumptions needed to justify it, and this strength may help explain why backward induction does not yield descriptively accurate predictions in some classes of games.2

This paper studies a dynamic model of behavior in games that maintains the assumption that agents respond optimally to the information they possess. But rather than imposing strong assumptions about agents’ knowledge of opponents’ intentions, we suppose instead that agents’ information comes from direct but incomplete experience with playing the strategies available to them. As with earlier work of Osborne and Rubinstein (1998) and Sethi (2000), our model is best viewed not as one that incorporates irrational choices, but rather as one of rational choice under particular restrictions on what agents know.

Following the standard approach of evolutionary game theory, we suppose that two populations of agents are recurrently randomly matched to play a two-player game. This framework accords with some experimental protocols, and can be understood more broadly as a model of the formation of social norms (Young (1998)). At random times, each agent receives opportunities to switch strategies. At these moments the agent decides whether to continue to play his current strategy or to switch to an alternative in a test set, which is determined using a prespecified stochastic rule τ. The agent then plays

1For formal analyses, see Binmore (1987), Reny (1992), Stalnaker (1996), Ben-Porath (1997), Halpern (2001), and Perea (2014).

2As an alternative, one could apply Nash equilibrium, which also predicts noncooperative behavior in the games mentioned above, but doing so replaces assumptions about future rationality with the assumption of equilibrium knowledge, which may not be particularly more appealing—see Dekel and Gul (1997).


each strategy he is considering against κ opponents drawn at random from the opposing population, with each play of each strategy being against a newly drawn opponent. He then switches to the strategy that achieved the highest total payoff, breaking ties according to a prespecified rule β.3

Standard results imply that when the populations are large, the agents’ aggregate behavior evolves in an essentially deterministic fashion, obeying a differential equation that describes the expected motion of the stochastic process described above (Benaïm and Weibull (2003)). The differential equations generated by the protocol above, and our object of study here, we call best experienced payoff dynamics, or BEP dynamics for short.

Our model builds on earlier work on games played by “procedurally rational players”. In particular, when the test-set rule τ specifies that agents always test all strategies, and when the tie-breaking rule β is uniform randomization, then the rest points of our process (with κ = k) are the S(k) equilibria of Osborne and Rubinstein (1998). The corresponding dynamics were studied by Sethi (2000). The analyses in these papers differ substantially from ours, as we explain carefully below.

Our analysis of best experienced payoff dynamics in the Centipede game uses techniques from dynamical systems theory. What is more novel is our reliance on algorithms from computational algebra and perturbation bounds from linear algebra, which allow us to solve exactly for the rest points of our differential equations and to perform rigorous stability analyses. We complement this approach with numerical analyses of cases in which analytical results cannot be obtained.

Our initial results focus on dynamics under which each tested strategy is tested exactly once (κ = 1), so that agents’ choices only depend on ordinal properties of payoffs. In Centipede games, under BEP dynamics with tie-breaking rules that do not abandon optimal strategies, or that break ties in favor of less cooperative strategies, the backward induction state—the state at which all agents in both populations stop at their first opportunity—is a rest point. However, we prove that this rest point is always repelling: the appearance of agents in either population who cooperate to any degree is self-reinforcing, and eventually causes the backward induction solution to break down completely.

For all choices of test-set and tie-breaking rules we consider, the dynamics have exactly one other rest point.4 Except in games with few decision nodes, the form of this rest point is essentially independent of the length of the game. In all cases, the rest point has virtually

3Tie-breaking rules are important in the context of extensive form games, where different strategies that agree on the path of play earn the same payoffs.

4While traditional equilibrium notions in economics require stasis of choice, interior rest points of population dynamics represent situations in which individuals’ choices fluctuate even as the expected change in aggregate behavior is null—see Section 2.2.


all players choosing to continue until the last few nodes of the game, with the proportions playing each strategy depending on the test-set and tie-breaking rules. Moreover, this rest point is dynamically stable, attracting solutions from all initial conditions other than the backward induction state. Thus under a range of specifications, if agents make choices based on experienced payoffs, testing each strategy in their test sets once and choosing the one that performed best, then play converges to a stable rest point that exhibits high levels of cooperation.

To explain why, we first observe that cooperative strategies are most disadvantaged when they are most rare—specifically, in the vicinity of the backward induction state. Near this state, the most cooperative agents would obtain higher expected payoffs by stopping earlier. However, when an agent considers switching strategies, he tests each strategy in his test set against new, independently drawn opponents. He may thus test a cooperative strategy against a cooperative opponent, and less cooperative strategies against less cooperative opponents, in which case his best experienced payoff will come from the cooperative strategy. Our analysis confirms that this possibility indeed leads to instability (Sections 4 and 5).5 After this initial entry, the high payoffs generated by cooperative strategies when matched against one another spur their continued growth. This growth is only abated when virtually all agents are choosing among the most cooperative strategies. The exact nature of the final mixture depends on the specification of the dynamics, and can be understood by focusing on the effects of a small fraction of possible test results (Section 4.1.3).

Our final results consider the effects of the number of trials κ of each strategy in the test set on predictions of play. It seems clear that if the number of trials is made sufficiently large, so that the agents’ information about opponents’ behavior is quite accurate, then the population’s behavior should come to resemble a Nash equilibrium. Indeed, when agents possess exact information, so that aggregate behavior evolves according to the best response dynamic (Gilboa and Matsui (1991), Hofbauer (1995)), results of Xu (2016) imply that every solution trajectory converges to the set of Nash equilibria, all of which entail stopping at the initial node.

Our analysis shows, however, that stable cooperative behavior can persist even for substantial numbers of trials. To start, we prove that the backward induction state is unstable as long as the number of trials κ is less than the length of the game. For larger

5Specifically, linearizing any given specification of the dynamics at the backward induction state identifies a single eigenvector with a positive eigenvalue (Appendix B). This eigenvector describes the mixture of strategies in the two populations whose entry is self-reinforcing, and identifies the direction toward which all other disturbances of the backward induction state are drawn. Direct examination of the dynamics provides a straightforward explanation why the given mixture of entrants is successful (Example 4.3).


numbers of trials, the backward induction state becomes locally stable, but numerical evidence suggests that its basin of attraction is small. Examining Centipede games of length d = 4 in detail, we find that a unique, attracting interior rest point with substantial cooperation persists for moderate numbers of trials. With many trials, the attractor is always a cycle, and includes significant amounts of cooperation for numbers of trials as large as 200. We explain in Section 6 how the robustness of cooperation to fairly large numbers of trials can be explained using simple central limit theorem arguments.

Our main technical contribution lies in the use of methods from computational algebra and perturbation theorems from linear algebra to obtain analytical results about the properties of our dynamics. The starting point for this analysis, one that suggests a broader scope for our approach, is that decision procedures based on sampling from a population are described by multivariate polynomials with rational coefficients. In particular, BEP dynamics are described by systems of such equations, so finding their rest points amounts to finding the zeros of these polynomial systems. To accomplish this, we compute a Gröbner basis for the set of polynomials that defines each instance of our dynamics; this new set of polynomials has the same zeros as the original set, but its zeros can be computed by finding the roots of a single (possibly high-degree) univariate polynomial.6

Exact representations of these roots, known as algebraic numbers, can then be obtained by factoring the polynomial into irreducible components, and then using algorithms based on classical results to isolate each component’s real roots.7 These methods are explained in Section 3 and Appendix A. With these exact solutions in hand, we can rigorously assess the rest points’ local stability through a linearization analysis. In order to obviate certain intractable exact calculations, this analysis takes advantage of both an eigenvalue perturbation theorem and a bound on the condition number of a matrix that does not require the computation of its inverse (Appendix C).

The code used to obtain the exact and numerical results is available as a Mathematica notebook posted on the authors’ websites. An online appendix provides background and details about both the exact and the numerical analyses, reports certain numerical results in full detail, and presents a few proofs omitted here.

Related literature

Previous work relating backward induction and deterministic evolutionary dynamics has focused on the replicator dynamic of Taylor and Jonker (1978) and the best response

6See Buchberger (1965) and Cox et al. (2015). For applications of Gröbner bases in economics, see Kubler et al. (2014).

7See von zur Gathen and Gerhard (2013), McNamee (2007), and Akritas (2010).


dynamic of Gilboa and Matsui (1991) and Hofbauer (1995). Cressman and Schlag (1998) (see also Cressman (1996, 2003)) show that in generic perfect information games, every interior solution trajectory of the replicator dynamic converges to a Nash equilibrium. Likewise, Xu (2016) (see also Cressman (2003)) shows that in such games, every solution trajectory of the best response dynamic converges to a component of Nash equilibria. In both cases, the Nash equilibria approached need not be subgame perfect, and the Nash equilibrium components generally are not locally stable. Focusing on the Centipede game with three decision nodes, Ponti (2000) shows numerically that perturbed versions of the replicator dynamic exhibit cyclical behavior, with trajectories approaching and then moving away from the Nash component. In contrast, we show that best experienced payoff dynamics lead to a stable distribution of cooperative strategies far from the Nash component.

There are other deterministic dynamics from evolutionary game theory based on information obtained from samples.8 Oyama et al. (2015) (see also Young (1993), Sandholm (2001), Kosfeld et al. (2002), and Kreindler and Young (2013)) study sampling best response dynamics, under which revising agents observe the choices made by a random sample of opponents, and play best responses to the distribution of choices in the sample. Droste et al. (2003) show that with a sample size of 1 and uniform tie-breaking, the unique rest point of this dynamic in Centipede is interior with linearly decreasing strategy weights. Uniform tie-breaking is essential for this conclusion: if the other tie-breaking rules we consider here are used instead, the unique rest point of the resulting dynamic is the backward induction state.

While both are based on random samples, sampling best response dynamics and best experienced payoff dynamics differ in important respects. Sampling best response dynamics not only require agents to understand the game’s payoff structure, but also require them to observe the strategies that opponents play. But in extensive form games like Centipede, playing the game against an opponent need not reveal the opponent’s strategy; for instance, an agent who stops play at the initial node learns nothing at all about his opponent’s intended play. In extensive form games, a basic advantage of dynamics based on experienced payoffs is that they only rely on information about opponents’ intentions that agents can observe during play. As importantly for our results, agents modeled using best experienced payoff dynamics evaluate different strategies based on experiences with different samples of opponents. As we noted earlier, this property facilitates the establishment of cooperative behavior.

8In a related model of stochastic stability, Robson and Vega-Redondo (1996) assume that agents make choices based on outcomes from a single matching of population members.


Osborne and Rubinstein’s (1998) notion of S(k) equilibrium corresponds to the rest points of the BEP dynamic under which agents test all strategies, subject each to k trials, and break ties via uniform randomization.9 While most of their analysis focuses on simultaneous move games, they show that in Centipede games, the probability with which player 1 stops immediately in any S(1) equilibrium must vanish as the length of the game grows large. As we will soon see (Observation 2.1), this conclusion may fail if uniform tie-breaking is not assumed, with the backward induction state being an equilibrium. Nevertheless, more detailed analyses below will show that this equilibrium state is unstable under BEP dynamics.

Building on Osborne and Rubinstein (1998), Sethi (2000) introduces BEP dynamics under which all strategies are tested and ties are broken uniformly.10 He shows that both dominant strategy equilibria and strict equilibria can be unstable under these dynamics, while dominated strategies can be played in stable equilibria. The latter fact is a basic component of our analysis of cooperative behavior. Berkemer (2008) considers the local stability of the unique rationalizable strategy profile in the traveler’s dilemma of Basu (1994) under Sethi’s (2000) dynamics, obtaining a sufficient condition for the instability of the rationalizable state. He shows numerically that the stable S(1) equilibrium becomes independent of the number of strategies in the game, and provides evidence from agent-based simulations that larger numbers of trials during testing can lead to cyclical behavior. Here we introduce a general model of multiple-sample dynamics, and we develop techniques for analyzing these dynamics that not only allow us to make tight predictions in the Centipede game, but also provide analytical machinery for understanding other dynamic models.

Earlier efforts to explain cooperative behavior in Centipede and related games have followed a different approach, applying equilibrium analyses to augmented versions of the game. The best known example of this approach is the work of Kreps et al. (1982). These authors modify the finitely repeated Prisoner’s Dilemma by assuming that one player attaches some probability to his opponent having a fixed preference for cooperative play. They show that in all sequential equilibria of long enough versions of the resulting Bayesian game, both players act cooperatively for a large number of initial rounds.11 To

9For extensions of S(k) equilibrium to more complex testing procedures, see Rustichini (2003).

10Cárdenas et al. (2015) and Mantilla et al. (2018) use these dynamics to explain stable non-Nash behavior in public good games.

11McKelvey and Palfrey (1992) show that this analysis extends to the Centipede game. A different augmentation is considered by Jehiel (2005), who assumes that agents bundle decision nodes from contiguous stages into analogy classes, and view the choices at all nodes in a class interchangeably. Alternatively, following Radner (1980), one can consider versions of Centipede in which the stakes of each move are small, and analyze these games using ε-equilibrium; see Friedman and Oprea (2012) for a discussion. But as Binmore (1998) observes, the existence of non-Nash ε-equilibrium depends on the relative sizes of the stakes and of ε, and the backward induction solution always persists as a Nash equilibrium, and hence as an ε-equilibrium.


justify this approach, one must assume that the augmentation of the original game is commonly understood by the players, that the players act in accordance with a rather complicated equilibrium construction, and that the equilibrium knowledge assumptions required to justify sequential equilibrium apply. In contrast, our model makes no changes to the original game other than placing it in a population setting, and it is built upon the assumption that agents’ choices are optimal given their experiences during play.

2. Best experienced payoff dynamics in the Centipede game

2.1 Normal form games and population games

A two-player normal form game G = {(S1, S2), (A, B)} is defined by pairs of strategy sets Sp = {1, . . . , sp} and payoff matrices A, B ∈ R^{sp×sq}, p, q ∈ {1, 2}, p ≠ q. A_{ij} and B_{ij} represent the two players’ payoffs when strategy profile (i, j) ∈ S1 × S2 is played. When considering extensive form games, our analysis focuses on the reduced normal form, whose strategies specify an agent’s “plan of action” for the game, but not his choices at decision nodes that are ruled out by his own previous choices.

In our population model, members of two unit-mass populations are matched to play a two-player game. A population state for population 1 is an element of X = {x ∈ R^{s1}_+ : ∑_{i∈S1} x_i = 1}, where x_i is the fraction of population 1 players choosing strategy i. Likewise, Y = {y ∈ R^{s2}_+ : ∑_{i∈S2} y_i = 1} is the set of population states for population 2. Thus x and y are formally equivalent to mixed strategies for players 1 and 2, and elements of the set Ξ = X × Y are formally equivalent to mixed strategy profiles. In a slight abuse of terminology, we also refer to elements of Ξ as population states.

2.2 Revision protocols and evolutionary dynamics

To define evolutionary game dynamics, we follow the standard approach of specifying microfoundations in terms of revision protocols.12 We suppose that at all times t ∈ [0,∞), each agent has a strategy he uses when matched to play game G. The empirical distributions of these strategies are described by the population state ξ(t) = (x(t), y(t)).

Agents occasionally receive opportunities to switch strategies according to independent rate 1 Poisson processes. An agent who receives an opportunity considers switching


12See Björnerstedt and Weibull (1996), Weibull (1995), Sandholm (2010a,b, 2015), and Izquierdo et al. (2018).


to a new strategy, making his decision by applying a revision protocol. Formally, a revision protocol for population 1 is described by a map (A, y) ↦ σ1(A, y) ∈ X^{s1} that assigns own payoff matrices and opposing population states to matrices of conditional switch probabilities, where σ1_{ij}(A, y) is the probability that an agent playing strategy i ∈ S1 who receives a revision opportunity switches to strategy j ∈ S1. Likewise, a revision protocol for population 2 is described by a map (B, x) ↦ σ2(B, x) ∈ Y^{s2} with an analogous interpretation.

If population sizes are large but finite, then a game and a pair of revision protocols defines a Markov process on the set Ξ of population states. During each short interval of time, a large number of agents in each population receive opportunities to switch strategies, applying the protocols σp to decide which strategy to choose next. But since the fraction of each population receiving revision opportunities during the interval is small, so is the change in the state over the interval. Intuition from the law of large numbers then suggests that over this interval, and over concatenations of such intervals, the state should evolve in an almost deterministic fashion, following the trajectory defined by the expected motion of the process. This claim is made rigorous by Benaïm and Weibull (2003), who show that over finite time spans, the stochastic evolution of play is very likely to hew closely to the solution of a differential equation. This differential equation, called the mean dynamic, describes the expected motion of the populations from each state:

(1)   ẋ_i = ∑_{j∈S1} x_j σ1_{ji}(A, y) − x_i   for all i ∈ S1,

      ẏ_i = ∑_{j∈S2} y_j σ2_{ji}(B, x) − y_i   for all i ∈ S2.

Equation (1) is easy to interpret. Since revision opportunities are assigned to agents randomly, there is an outflow from each strategy i proportional to its current level of use. To generate inflow into i, an agent playing some strategy j must receive a revision opportunity, and applying his revision protocol must lead him to play strategy i.

Outside of monomorphic (i.e. pure) cases, the rest points of the dynamic (1) should not be understood as equilibria in the traditional game-theoretic sense. Rather, they represent situations in which agents perpetually switch among strategies, but with the expected change in the use of each strategy equaling zero.13 At states that are locally stable under the dynamic (1), fluctuations in any direction are generally undone by the action of (1) itself. Contrariwise, fluctuations away from unstable equilibria are reinforced, so we should not expect such states to be observed.

13Thus in the finite-population version of the model, variations in the use of each strategy would be observed. For a formal analysis, see Sandholm (2003).
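Equation (1) translates directly into code. The sketch below is our own illustration (not the authors’ Mathematica notebook), and it assumes the conditional switch probability matrices have already been evaluated at the current state; the names `mean_dynamic` and `euler_step` are ours.

```python
import numpy as np

def mean_dynamic(x, y, sigma1, sigma2):
    """Right-hand side of the mean dynamic (1).

    sigma1[j, i] is the probability that a revising population 1 agent
    playing j switches to i, already evaluated at (A, y); sigma2 is the
    analogous matrix for population 2, evaluated at (B, x).
    """
    # Inflow into strategy i is sum_j x_j * sigma1[j, i]; outflow is x_i.
    xdot = sigma1.T @ x - x
    ydot = sigma2.T @ y - y
    # Since each row of sigma sums to one, xdot and ydot each sum to zero,
    # so population mass is conserved.
    return xdot, ydot

def euler_step(x, y, sigma1, sigma2, dt=0.01):
    # One explicit Euler step along the mean dynamic.
    xdot, ydot = mean_dynamic(x, y, sigma1, sigma2)
    return x + dt * xdot, y + dt * ydot
```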


2.3 Best experienced payoff protocols and dynamics

We now introduce the class of revision protocols and dynamics that we study in this paper. A best experienced payoff protocol is defined by a triple (τ, κ, β) consisting of a test-set rule τ, a number of trials κ, and a tie-breaking rule β. The triple (τ, κ, β) defines a revision protocol in the following way. When an agent currently using strategy i ∈ Sp receives an opportunity to switch strategies, he draws a set of strategies Rp ⊆ Sp to test according to the distribution τp on the collection of subsets of Sp with at least two elements. He then plays each strategy in Rp in κ random matches against members of the opposing population. He thus engages in #Rp × κ random matches in total, facing distinct sets of opponents when testing different strategies. The agent then selects the strategy in Rp that earned him the highest total payoff, breaking ties according to rule βp.

Our analysis here focuses on two test-set rules: test-all, τall, under which a revising agent tests all of his strategies; and test-two, τtwo, under which he tests two strategies: his current one, and another one chosen uniformly at random from those that remain. We consider three tie-breaking rules: min-if-tie, βmin, which chooses the lowest-numbered optimal strategy; stick/min-if-tie, βstick, which chooses the agent’s current strategy if it is optimal, and chooses the lowest-numbered optimal strategy otherwise; and uniform-if-tie, βunif, which randomizes uniformly among the optimal strategies. In normal form games, tie-breaking rules only matter in nongeneric cases. But in extensive form games like Centipede, different strategies often earn the same payoffs, giving tie-breaking rules greater importance.
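As a concrete sketch of this procedure, the snippet below (our illustration; the `payoff` function and all names are hypothetical) implements a single revision under test-all or test-two with κ trials and βmin tie-breaking.

```python
import random

def draw_opponent(opp_state, rng=random):
    # Sample one opponent strategy (0-indexed) from the opposing population.
    return rng.choices(range(len(opp_state)), weights=opp_state)[0]

def bep_revision(current, n_strategies, payoff, opp_state,
                 kappa=1, test_two=False, rng=random):
    """One BEP(tau, kappa, beta_min) revision for an agent playing `current`.

    payoff(i, j) is the agent's payoff from strategy i against an opponent
    playing j; opp_state holds the opposing population's strategy shares.
    """
    if test_two:   # tau_two: the current strategy plus one other at random
        other = rng.choice([s for s in range(n_strategies) if s != current])
        test_set = [current, other]
    else:          # tau_all: every strategy
        test_set = list(range(n_strategies))
    # Each tested strategy is played kappa times, each time against a
    # freshly drawn opponent.
    totals = {i: sum(payoff(i, draw_opponent(opp_state, rng))
                     for _ in range(kappa))
              for i in test_set}
    best = max(totals.values())
    # beta_min: break ties in favor of the lowest-numbered optimal strategy.
    return min(i for i in test_set if totals[i] == best)
```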

The procedure defined by the triple (τ, κ, β) defines a revision protocol σp for each population p. Inserting these revision protocols into equation (1) defines a best experienced payoff dynamic. For instance, if the test-set rule is τall and if the tie-breaking rule is βunif, then the rest points of the dynamic (1) are the S(k) equilibria of Osborne and Rubinstein (1998) (with k = κ), and the dynamic itself is the one studied by Sethi (2000). Other choices of τ and β lead to novel population dynamics.14

Choice probabilities under best experienced payoff dynamics depend only on the payoffs strategies earn during testing; they do not require agents to track the choices made by their opponents. This property makes the dynamics appealing as a simple model of play for extensive form games. Typically, a single play of an extensive form game does not reveal the strategy chosen by one’s opponent, but only the portion of that strategy required to determine the path of play. Consequently, it is not straightforward to specify how agents should use their experience of play to assess opponents’ choices of strategies. Because they focus on the performances of own strategies, best experienced

14We give a brief formal account of best experienced payoff protocols and dynamics in Online Appendix IX, and provide a complete treatment in Sandholm et al. (2018).


[Figure 1 here: the Centipede game tree of length d = 8. Decision nodes alternate between players 1 and 2; stopping at nodes 1 through 8 yields payoffs (0,0), (−1,3), (2,2), (1,5), (4,4), (3,7), (6,6), and (5,9) respectively, and (8,8) results if both players always continue.]

Figure 1: The Centipede game of length d = 8.

payoff dynamics avoid such ambiguities.

2.4 The Centipede game

Centipede (Rosenthal (1981)) is a two-player extensive form game with d ≥ 2 decision nodes (Figure 1). Each node presents two actions, stop and continue. The nodes are arranged linearly, with the first one assigned to player 1 and subsequent ones assigned in an alternating fashion. A player who stops ends the game. A player who continues suffers a cost of 1 but confers a benefit of 3 on his opponent, and sends the game to the next decision node if one exists.

In a Centipede game of length d, players 1 and 2 have d1 = ⌊(d+1)/2⌋ and d2 = ⌊d/2⌋ decision nodes, respectively. Thus player p has sp = dp + 1 strategies, where strategy i < sp is the plan to continue at his first i − 1 decision nodes and to stop at his ith decision node, and strategy sp is the plan to continue at all dp of his decision nodes. Of course, the portion of a player’s plan that is actually carried out depends on the plan of his opponent. The payoff matrices (A, B) of Centipede’s reduced normal form can be expressed concisely as

(2)   (A_{ij}, B_{ij}) = (2i − 2, 2i − 2)   if i ≤ j,
                         (2j − 3, 2j + 1)   if j < i.
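As a check on (2), the following sketch (ours; the function name is hypothetical) builds the reduced normal form matrices; for d = 8 it reproduces the payoffs shown in Figure 1.

```python
def centipede_payoffs(d):
    """Payoff matrices (A, B) of the reduced normal form of Centipede of
    length d, per equation (2), with 1-indexed strategies i and j."""
    s1 = (d + 1) // 2 + 1   # player 1's strategies: d1 + 1
    s2 = d // 2 + 1         # player 2's strategies: d2 + 1
    A = [[2*i - 2 if i <= j else 2*j - 3 for j in range(1, s2 + 1)]
         for i in range(1, s1 + 1)]
    B = [[2*i - 2 if i <= j else 2*j + 1 for j in range(1, s2 + 1)]
         for i in range(1, s1 + 1)]
    return A, B

A, B = centipede_payoffs(8)
print(A[0][0], B[0][0])   # 0 0   (player 1 stops at the first node)
print(A[1][0], B[1][0])   # -1 3  (player 2 stops at the second node)
print(A[4][4], B[4][4])   # 8 8   (both players always continue)
```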

It will sometimes be convenient to number strategies starting from the end of the game. To do so, we write [k] ≡ sp − k for k ∈ {0, . . . , dp}, so that [0] denotes continuing at all nodes, and [k] with k ≥ 1 denotes stopping at player p’s kth-to-last node.

We noted above that best experienced payoff dynamics with κ = 1 only depend on ordinal properties of payoffs. In this case, what matters in (2) is that a player is better off continuing at a given decision node if and only if his opponent will continue at the subsequent decision node. If the cost of continuing is 1, this property holds as long as the benefit obtained when one’s opponent continues exceeds 2. This ordering of payoffs


also holds for typical specifications in which total payoffs grow exponentially over time.15

When there are multiple trials of each tested strategy (κ > 1), cardinal properties of payoffs matter; in this case, Rosenthal’s (1981) specification (2) keeps the potential benefits from continuing relatively modest.

The backward induction solution to Centipede has both players stop at each of their decision nodes. We will thus call the population state ξ† = (x†, y†) ∈ Ξ with x†_1 = y†_1 = 1 the (reduced) backward induction state. It is well known that all Nash equilibria of Centipede have player 1 stop at his initial node. This makes player 2 indifferent among all of her strategies, so Nash equilibrium requires that she choose a mixed strategy that makes stopping immediately optimal for player 1.

Of course, these predictions require assumptions about what the players know. In the traditional justification of Nash equilibrium, players are assumed to correctly anticipate opponents’ play. Likewise, traditional justifications of the backward induction solution require agents to maintain common belief in rational future play, even if behavior contradicting this belief has been observed in the past.

2.5 Best experienced payoff dynamics for the Centipede game

We now introduce best experienced payoff dynamics in the Centipede game, stating and interpreting formulas for two BEP(τ, 1, β) dynamics.16 The BEP(τall, 1, βmin) dynamic for Centipede is given by17

(3a)   ẋ_i = (∑_{k=i}^{s2} y_k) (∑_{m=1}^{i} y_m)^{s1−i} + ∑_{k=2}^{i−1} y_k (∑_{ℓ=1}^{k−1} y_ℓ)^{i−k} (∑_{m=1}^{k} y_m)^{s1−i} − x_i,

(3b)   ẏ_j = (∑_{k=2}^{s1} x_k) (x_1 + x_2)^{s2−1} + (x_1)^{s2} − y_1   if j = 1,

       ẏ_j = (∑_{k=j+1}^{s1} x_k) (∑_{m=1}^{j+1} x_m)^{s2−j} + ∑_{k=2}^{j} x_k (∑_{ℓ=1}^{k−1} x_ℓ)^{j−k+1} (∑_{m=1}^{k} x_m)^{s2−j} − y_j   otherwise.

Under test-all with min-if-tie, the choice made by a revising agent does not depend

15Suppose initial total payoffs are positive, that total payoffs are multiplied by b > 1 in each period, and that the player who was not the last to continue receives fraction p ∈ (1/2, 1) of this total. Then continuing is not dominant when p/(1−p) > b, and the ordinal structure described above is obtained when p/(1−p) < b³.

16Explicit formulas for the remaining BEP(τ, 1, β) dynamics for Centipede with τ ∈ {τall, τtwo} and β ∈ {βmin, βstick, βunif} are presented in Online Appendix XI, and formulas for BEP(τ, κ, βmin) dynamics are given in Appendix E.

17We follow the convention that a sum whose lower limit exceeds its upper limit evaluates to 0.


on his original strategy. The first two terms of (3a) describe the two types of matchings that lead a revising agent in the role of player 1 to choose strategy i. First, it could be that when the agent tests i, his opponent plays i or higher (so that the agent is the one to stop the game), and that when the agent tests higher strategies, his opponents play strategies i or lower. In this case, only strategy i yields the agent his highest payoff. Second, it could be that when the agent tests i, his opponent plays strategy k < i; when he tests strategies between k and i − 1, his opponents play strategies less than k; and when he tests strategies above i, his opponents play strategies less than or equal to k. In this case, strategy i is the lowest strategy that achieves the optimal payoff, and so is chosen by the revising agent under the min-if-tie rule. Similar logic, and accounting for the fact that player 2’s jth node is followed by player 1’s (j + 1)st node, leads to equation (3b).

The BEP(τtwo, 1, βstick) dynamic for Centipede is written as

(4a)   ẋ_i = 1/(s1 − 1) ∑_{h≠i} [ ( ∑_{k=h+1|i}^{s2} ∑_{ℓ=1}^{s2|i} y_k y_ℓ + ∑_{k=2}^{h|i−1} ∑_{ℓ=1}^{k−1} y_k y_ℓ ) (x_i + x_h) + ∑_{k=1}^{h−1|i−1} (y_k)² x_i ] − x_i,

(4b)   ẏ_j = 1/(s2 − 1) ∑_{h≠j} [ ( ∑_{k=h+2|j+1}^{s1} ∑_{ℓ=1}^{s1|j+1} x_k x_ℓ + ∑_{k=2}^{h+1|j} ∑_{ℓ=1}^{k−1} x_k x_ℓ ) (y_j + y_h) + ∑_{k=1}^{h|j} (x_k)² y_j ] − y_j.

In the summations in (4a), the notation L−|L+ should be read as “if h < i use L− as the limit; if h > i use L+ as the limit”; likewise for those in (4b), but with h now compared to j.18

Each term in the brackets in (4a) represents a comparison between the performances of strategies i and h. Terms that include x_i + x_h represent cases in which the realized payoff to i is larger than that to h, so that it does not matter whether the revising agent is an i player who tests h or vice versa. The terms with x_i alone represent cases of payoff ties, which arise when i and h are both played against opponents choosing the same strategy k < i ∧ h that stops before either i or h; in this case, the stick-if-tie rule says that the agent should play i only if he had previously been doing so.19

We conclude this section with a simple observation about the backward induction

18We could replace | by ∧ (the min operator) in all cases other than the two that include s2 or s1.

19To understand the functional form of (4a), consider a revising agent with test set {i, h}. If i > h, the initial double sum represents matchings in which i is played against an opponent choosing a strategy above h; if i < h, it represents matchings in which i is played against an opponent choosing strategy i or higher, while h is played against an opponent choosing strategy i or lower. The second double sum represents matchings in which i is played against an opponent choosing strategy h ∧ (i − 1) or lower, while h is played against an opponent choosing a still lower strategy. In all of these cases, i yields a larger payoff than h, so the revising agent selects i regardless of what he was initially playing. The final sum represents matchings in which i and h are both played against opponents who choose the same strategy k < h ∧ i, leading the agent to stick with his original strategy i.


solution of Centipede under best experienced payoff dynamics.

Observation 2.1. Under any BEP(τ, κ, β) dynamic for which β is the min-if-tie rule βmin or the stick/min-if-tie rule βstick, the backward induction state ξ† is a rest point.

Osborne and Rubinstein (1998) show that if all strategies are tested once and ties are broken uniformly, then in a long Centipede game, stationarity requires that play is almost never stopped at the initial node. Observation 2.1 shows that this conclusion depends on the assumption that ties are broken uniformly. If instead ties are broken in favor of an agent’s current strategy or the lowest-numbered strategy, then the backward induction state is a rest point. Even so, more refined analyses in Sections 4 and 5 will explain why the backward induction state is not a compelling prediction of play even under these tie-breaking rules.

3. Exact solutions of systems of polynomial equations

An important first step in analyzing a system of differential equations is to identify its rest points. From this point of view, a key feature of best experienced payoff dynamics is that they are defined by systems of multivariate polynomial equations with rational coefficients. The zeros of these systems in the state space Ξ are the rest points of the dynamics. This fact allows us to use methods from computational algebra to obtain exact representations of the rest points. This is important not only for its own sake, but also because it enables us to perform rigorous stability analyses of these rest points using suitable approximations from matrix algebra (see Section 4.1 and Appendix C).

Starting from the collection of multivariate polynomials that defines a BEP dynamic, we construct a new collection of polynomials that has the same set of zeros, but whose solutions are easier to obtain. The new collection, called a Gröbner basis, is the polynomial analogue of the result of Gaussian elimination in linear systems. The zeros of the Gröbner basis can be found by solving for the roots of a single univariate polynomial, and then substituting backward to turn the next elements of the basis into univariate polynomials.
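As a toy illustration of this step, consider the rest point equations of the length-2 dynamic (5) from Example 4.1 below. The sketch uses sympy's groebner routine; the paper's own computations are done in Mathematica, so this is only an assumed, minimal reimplementation of the idea.

```python
import sympy as sp

x2, y2 = sp.symbols('x2 y2')
# Rest point equations of the length-2 dynamic (5), with x1 = 1 - x2:
polys = [y2 - x2, x2*(1 - x2) - y2]

# A lexicographic Groebner basis "triangularizes" the system.
G = sp.groebner(polys, x2, y2, order='lex')
print(list(G))   # expected: [x2 - y2, y2**2]

# The last basis element is univariate: y2**2 = 0 forces y2 = 0, and
# back-substitution into x2 - y2 = 0 then gives x2 = 0, the backward
# induction state -- the unique rest point found in Example 4.1.
```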

It remains to find the roots of these univariate polynomials. Here there are well-known limits to what can be accomplished: Abel’s theorem states that there is no solution in radicals to general univariate polynomial equations of degree five or higher. Nevertheless, tools from computational algebra allow us to represent such solutions exactly. To start, one uses a polynomial factorization algorithm to represent the original polynomial as a product of irreducible polynomials. The roots of these irreducible polynomials, which are algebraic numbers, are labeled in a way that allows us to distinguish them from one


another, and to perform arithmetic operations in an efficient manner, even though the exact decimal values of these numbers are inaccessible. We employ such a representation whenever we use the symbol √2, which labels the positive solution to x² − 2 = 0, and which obeys some basic operations (e.g., that (√2)² = 2), even as any decimal value we write for this symbol is only an approximation.

Appendix A provides detailed explanations of Gröbner bases and of the representation of solutions to univariate polynomials as algebraic numbers, as well as examples of their application to BEP dynamics. Computational limitations of this approach are discussed in Section 5.2; see especially Table 6. Although the sections to follow do not require it, some readers may find it useful to go through Appendix A before proceeding.
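This representation of algebraic numbers is also available in standard computer algebra systems. Here is a small sketch (ours, using sympy rather than the paper's Mathematica code) of exact arithmetic with an indexed root of a quintic that, by Abel's theorem, has no expression in radicals:

```python
import sympy as sp

x = sp.symbols('x')
# The unique real root of an irreducible quintic, stored as an indexed
# algebraic number rather than as a decimal or a radical expression.
r = sp.rootof(x**5 - x - 1, 0)
print(r)            # CRootOf(x**5 - x - 1, 0)
print(r.evalf(20))  # approximately 1.16730397826...

# Exact arithmetic is still possible: the minimal polynomial of
# r**5 - r is x - 1, certifying that r**5 - r = 1 holds exactly.
print(sp.minimal_polynomial(r**5 - r, x))
```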

4. Analysis of test-all, min-if-tie dynamics

In this section and Section 5.1, we analyze BEP(τ, 1, βmin) dynamics in Centipede games. Since tie-breaking rule βmin selects the optimal strategy that stops soonest, it is the tie-breaking rule that is most favorable toward backward induction. Before proceeding, we review some standard definitions and results from dynamical systems theory, and follow this with a simple example.

Consider a C1 differential equation ξ̇ = V(ξ) defined on Ξ whose forward solutions {ξ(t)}_{t≥0} do not leave Ξ. State ξ∗ is a rest point if V(ξ∗) = 0, so that the unique solution starting from ξ∗ is stationary. Rest point ξ∗ is Lyapunov stable if for every neighborhood O ⊂ Ξ of ξ∗, there exists a neighborhood O′ ⊂ Ξ of ξ∗ such that every forward solution that starts in O′ is contained in O. If ξ∗ is not Lyapunov stable it is unstable, and it is repelling if there is a neighborhood O ⊂ Ξ of ξ∗ such that solutions from all initial conditions in O \ {ξ∗} leave O.

Rest point ξ∗ is attracting if there is a neighborhood O ⊂ Ξ of ξ∗ such that all solutions that start in O converge to ξ∗. A state that is Lyapunov stable and attracting is asymptotically stable. In this case, the maximal (relatively) open set of states from which solutions converge to ξ∗ is called the basin of ξ∗. If the basin of ξ∗ contains int(Ξ), we call ξ∗ almost globally asymptotically stable; if it is Ξ itself, we call ξ∗ globally asymptotically stable.

The C1 function L : O → R_+ is a strict Lyapunov function for rest point ξ∗ ∈ O if L⁻¹(0) = {ξ∗}, and if its time derivative L̇(ξ) ≡ ∇L(ξ)′V(ξ) is negative on O \ {ξ∗}. Standard results imply that if such a function exists, then ξ∗ is asymptotically stable.20 If L is a strict Lyapunov function for ξ∗ with domain O = Ξ \ {ξ†} and ξ† is repelling, then ξ∗ is almost globally asymptotically stable; if the domain is Ξ, then ξ∗ is globally asymptotically stable.

20See, e.g., Sandholm (2010b, Appendix 7.B).


Example 4.1. As a preliminary, we consider BEP(τ, 1, βmin) dynamics for the Centipede game of length 2. Since each player has two strategies, all test-set rules τ have revising agents test both of them. Focusing on the fractions of agents choosing to continue, we can express the dynamics as

(5)   ẋ_2 = y_2 − x_2,
      ẏ_2 = x_2 x_1 − y_2.

By way of interpretation, a revising agent in population 1 chooses to continue if his opponent when he tests continue also continues. A revising agent in population 2 chooses to continue if her opponent continues when she tests continue, and her opponent stops when she tests stop.21

Writing 1 − x_2 for x_1 in (5) and then solving for the zeros, we find that the unique rest point of (5) is the backward induction state: x†_2 = y†_2 = 0. Moreover, defining the function L : [0, 1]² → R_+ by L(x_2, y_2) = ½((x_2)² + (y_2)²), we see that L⁻¹(0) = {ξ†} and that

L̇(x_2, y_2) = x_2 ẋ_2 + y_2 ẏ_2 = x_2 y_2 − (x_2)² + y_2 x_2 − y_2 (x_2)² − (y_2)² = −(x_2 − y_2)² − y_2 (x_2)²,

which is nonpositive on [0, 1]² and equals zero only at the backward induction state. Since L is a strict Lyapunov function for ξ† on Ξ, state ξ† is globally asymptotically stable. ♦
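A quick numerical complement to this argument (our sketch, assuming scipy is available): integrating (5) from an interior initial condition and evaluating L along the solution shows the Lyapunov function decaying monotonically toward zero.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, state):
    # The length-2 dynamic (5), after substituting x1 = 1 - x2.
    x2, y2 = state
    return [y2 - x2, x2 * (1.0 - x2) - y2]

ts = [0.0, 1.0, 5.0, 20.0, 100.0]
sol = solve_ivp(rhs, (0.0, 100.0), [0.9, 0.9], t_eval=ts, rtol=1e-10)
L = 0.5 * (sol.y[0]**2 + sol.y[1]**2)
print(L)   # strictly decreasing toward 0, as the Lyapunov argument predicts
```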

In light of this example, our analyses of dynamics using the min-if-tie rule βmin will focus on Centipede games of lengths d ≥ 3. In the remainder of this section we suppose that τ is the test-all rule τall. Section 5.1 considers the test-set rule τtwo, and Section 5.2 considers the tie-breaking rules βstick and βunif (cf. Section 2.3).

4.1 Rest points and local stability

4.1.1 Analytical results

As we know from Observation 2.1, the backward induction state ξ† of the Centipede game is a rest point of the BEP(τall, 1, βmin) dynamic. Our first result shows that this rest point is always repelling.

Proposition 4.2. In Centipede games of lengths d ≥ 3, the backward induction state ξ† is repelling under the BEP(τall, 1, βmin) dynamic.

The proof of Proposition 4.2, which is presented in Appendix B, is based on a somewhat nonstandard linearization argument. While we are directly concerned with the behavior

21Compare the discussion after equation (3) and Example 4.3 below.


of the BEP dynamics on the state space Ξ, it is useful to view equation (1) as defining dynamics throughout the affine hull aff(Ξ) = {(x, y) ∈ R^{s1+s2} : ∑_{i∈S1} x_i = ∑_{j∈S2} y_j = 1}, which is then invariant under (1). Vectors of motion through aff(Ξ) are elements of the tangent space TΞ = {(z1, z2) ∈ R^{s1+s2} : ∑_{i∈S1} z1_i = ∑_{j∈S2} z2_j = 0}. Note that TΞ is a subspace of R^{s1+s2}, and that aff(Ξ) is obtained from TΞ via translation: aff(Ξ) = TΞ + ξ†.

A standard linearization argument is enough to prove that ξ† is unstable. Let the vector field V : aff(Ξ) → TΞ be defined by the right-hand side of (1). To start the proof, we obtain an expression for the derivative matrix DV(ξ†) that holds for any game length d. We then derive formulas for the d linearly independent eigenvectors of DV(ξ†) in the subspace TΞ and for their corresponding eigenvalues. We find that d − 1 of the eigenvalues are negative, and one is positive. The existence of the latter implies that ξ† is unstable.

To prove that ξ† is repelling, we show that the hyperplane through ξ† defined by the span of the set of d − 1 eigenvectors with negative eigenvalues supports the convex state space Ξ at state ξ†. Results from dynamical systems theory—specifically, the Hartman-Grobman and stable manifold theorems (Perko (2001, Sec. 2.7–2.8))—then imply that in some neighborhood O ⊂ aff(Ξ) of ξ†, the set of initial conditions from which solutions converge to ξ† is disjoint from Ξ \ {ξ†}, and that solutions from the remaining initial conditions eventually move away from ξ†.

The following example provides intuition for the instability of the backward induction state; the logic is similar in longer games and for other specifications of the dynamics.

Example 4.3. In a Centipede game of length d = 4, writing out display (3) shows that the BEP(τall, 1, βmin) dynamic is described by

(6)   ẋ_1 = (y_1)² − x_1,                    ẏ_1 = (x_2 + x_3)(x_1 + x_2)² + (x_1)³ − y_1,
      ẋ_2 = (y_2 + y_3)(y_1 + y_2) − x_2,    ẏ_2 = x_3 + x_2 x_1 (x_1 + x_2) − y_2,
      ẋ_3 = y_3 + y_2 y_1 − x_3,             ẏ_3 = x_2 (x_1)² + x_3 (x_1 + x_2) − y_3.

The linearization of this system at (x†, y†) has the positive eigenvalue 1 corresponding to eigenvector (z1, z2) = ((−2, 1, 1), (−2, 1, 1)) (see equation (21)). Thus at state (x, y) = ((1 − 2ε, ε, ε), (1 − 2ε, ε, ε)) with ε > 0 small, we have (ẋ, ẏ) ≈ ((−2ε, ε, ε), (−2ε, ε, ε)).

To understand why the addition of agents in both populations playing the cooperative strategies 2 and 3 is self-reinforcing, we build on the discussion following equation (3). Consider, for instance, component y_3, which represents the fraction of agents in population 2 who continue at both decision nodes. The last expression in (6) says that a revising population 2 agent switches to strategy 3 if (i) when testing strategy 3 she meets an opponent playing strategy 2, and when testing strategies 1 and 2 she meets opponents playing strategy 1, or (ii) when testing strategy 3 she meets an opponent playing strategy 3, and when testing strategy 2 she meets an opponent playing strategy 1 or 2. These events have total probability ε(1 − ε) + ε(1 − 2ε)² ≈ 2ε. Since there are y_3 = ε agents currently playing strategy 3, outflow from this strategy occurs at rate ε. Combining the inflow and outflow terms shows that ẏ_3 ≈ 2ε − ε = ε. Analogous arguments explain the changes in the values of the other components of the state. ♦
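The eigenvalue claim in Example 4.3 can be checked directly. The sketch below (ours, using numpy) assembles the Jacobian of (6) at ξ†, verifies the unstable direction, and lists the spectrum of the restriction to the tangent space TΞ; the basis matrix P is our own construction.

```python
import numpy as np

# Jacobian of (6) at x = y = (1, 0, 0): the diagonal blocks are -I, and
# the off-diagonal blocks below are obtained by differentiating (6).
A = np.array([[2., 0., 0.],   # d(xdot)/dy
              [0., 1., 1.],
              [0., 1., 1.]])
B = np.array([[3., 1., 1.],   # d(ydot)/dx
              [0., 1., 1.],
              [0., 1., 1.]])
J = np.block([[-np.eye(3), A], [B, -np.eye(3)]])

z = np.array([-2., 1., 1., -2., 1., 1.])   # the claimed unstable direction
print(np.allclose(J @ z, z))               # True: eigenvalue 1

# Restrict J to the tangent space (block components summing to zero),
# spanned by the columns of P, and inspect the restricted spectrum.
b = np.array([[1., 0.], [-1., 1.], [0., -1.]])
P = np.block([[b, np.zeros((3, 2))], [np.zeros((3, 2)), b]])
M = np.linalg.lstsq(P, J @ P, rcond=None)[0]
print(np.sort(np.linalg.eigvals(M).real))  # about [-3, -1, -1, 1]: one
# positive eigenvalue and d - 1 = 3 negative ones, as described above.
```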

It may seem surprising that the play of a weakly dominated strategy—continuing by the last mover at the last decision node—is positively reinforced at an interior population state. This is possible because revising agents test each of their strategies against newly drawn opponents: as just described, a revising population 2 agent will choose to continue at both of her decision nodes if her opponent’s strategy when she tests strategy 3 is more cooperative than her opponents’ strategies when she tests her own less cooperative strategies.

If population 2 agents understand that they are playing Centipede, one might appeal to the principle of admissibility to argue that they should not choose to continue at their last decision node. One can capture this principle in our context by modeling play in the length d Centipede game using the dynamics we have defined for the length d − 1 game.22

Since the backward induction state is unstable, we next look for attractors to which the dynamics may converge.

Proposition 4.4. For Centipede games of lengths 3 ≤ d ≤ 6, the BEP(τall, 1, βmin) dynamic has exactly two rest points, ξ† and ξ∗ ∈ int(Ξ). The rest point ξ∗, whose exact components are known, is asymptotically stable.

Our proof that the dynamics considered in the proposition have exactly two rest points uses the Gröbner basis and algebraic number algorithms described in Section 3. Exact solutions can only be obtained for games of length at most 6 because of the computational demands of computing the Gröbner bases. One indication of these demands is that when d = 6, the univariate polynomial from the Gröbner basis is of degree 221.23

Table 1 reports the approximate values of the interior rest points ξ∗, referring to strategies using the last-to-first notation [k] introduced in Section 2.4. Evidently, the masses on each strategy are nearly identical for games of lengths 4, 5, and 6, with nearly all of the weight in both populations being placed on continuing to the end, stopping at the last node, or stopping at the penultimate node.

22Because of the logical complications inherent in the foundations for iterated admissibility (Samuelson (1992), Brandenburger et al. (2008)), we hesitate to consider iterative truncation of our game dynamics.

23Under the test-two rule τtwo, games of lengths 7 and 8 can also be handled analytically. See Table 6 for a summary.


                      population p                                 population q
           [3]          [2]       [1]       [0]        [3]       [2]       [1]       [0]
d = 3                            .618034   .381966              .381966   .381966   .236068
d = 4                 .113625    .501712   .384663              .337084   .419741   .243175
d = 5                 .113493    .501849   .384658    .001462   .335672   .419706   .243160
d = 6   3.12 × 10⁻⁹   .113493    .501849   .384658    .001462   .335672   .419706   .243160

Table 1: “Exact” interior rest points ξ∗ of the BEP(τall, 1, βmin) dynamic. p denotes the owner of the penultimate decision node, q the owner of the last decision node.

d = 3   −1 ± .3820            −1
d = 4   −1.1411 ± .3277 i     −.8589 ± .3277 i
d = 5   −1.1355 ± .3284 i     −.8645 ± .3284 i     −1
d = 6   −1.1355 ± .3284 i     −.8645 ± .3284 i     −1 ± 9.74 × 10⁻⁵ i

Table 2: Eigenvalues of the derivative matrices DV(ξ∗) of the BEP(τall, 1, βmin) dynamic.

In principle, it is possible to prove the local stability of the rest points ξ∗ using linearization. But since the components of ξ∗ are algebraic numbers, computing the eigenvalues of DV(ξ∗) requires finding the exact roots of a polynomial with algebraic coefficients, a computationally intensive problem. Fortunately, we can prove local stability without doing so. Instead, we compute the eigenvalues of the matrix DV(ξ), where ξ is a rational point that is very close to ξ∗, showing that these eigenvalues all have negative real part. Proposition C.1 in Appendix C establishes an upper bound on the distances between the eigenvalues of DV(ξ) and DV(ξ∗). Importantly, this bound can be evaluated without having to compute the roots of a polynomial with algebraic coefficients or to invert a matrix with algebraic components, as both of these operations quickly become computationally infeasible. Combining these steps allows us to conclude that the eigenvalues of DV(ξ∗) also have negative real part. For a detailed presentation of this argument, see Appendix C.
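The flavor of this computation can be reproduced numerically (our sketch, with sympy and numpy standing in for the paper's exact machinery): differentiate (6), evaluate at a rational point close to ξ∗ (here the rounded d = 4 values from Table 1), restrict to the tangent space as in the previous sketch, and inspect the real parts.

```python
import numpy as np
import sympy as sp

v = sp.symbols('x1 x2 x3 y1 y2 y3')
x1, x2, x3, y1, y2, y3 = v
V = sp.Matrix([y1**2 - x1,
               (y2 + y3)*(y1 + y2) - x2,
               y3 + y2*y1 - x3,
               (x2 + x3)*(x1 + x2)**2 + x1**3 - y1,
               x3 + x2*x1*(x1 + x2) - y2,
               x2*x1**2 + x3*(x1 + x2) - y3])
J = V.jacobian(v)

# A rational point very close to the d = 4 interior rest point (Table 1).
vals = [sp.Rational(n, 10**6) for n in
        (113625, 501712, 384663, 337084, 419741, 243175)]
Jn = np.array(J.subs(dict(zip(v, vals))), dtype=float)

# Restriction to the tangent space, as in the previous sketch.
b = np.array([[1., 0.], [-1., 1.], [0., -1.]])
P = np.block([[b, np.zeros((3, 2))], [np.zeros((3, 2)), b]])
M = np.linalg.lstsq(P, Jn @ P, rcond=None)[0]
print(np.linalg.eigvals(M))  # real parts all negative; compare Table 2, d = 4
```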

The approximate eigenvalues of DV(ξ∗) are reported in Table 2. Note that the eigenvalues for games of length 5 and 6 are nearly identical, with the replacement of an eigenvalue of −1 by a pair of complex eigenvalues that are very close to −1.

4.1.2 Numerical results

Because exact methods only allow us to determine the rest points of the BEP(τall, 1, βmin) dynamic in Centipede games of lengths d ≤ 6, we use numerical methods to study games of lengths 7 through 20. We find that in all cases there are two rest points, the backward induction state ξ†, and an interior rest point ξ∗. As Figure 2 illustrates, the form of the interior rest point follows the pattern from Table 1: regardless of the length of the game, nearly all of the mass is placed on each population's three most cooperative strategies, and the weights on these strategies are essentially independent of the length of the game. Precise numerical estimates of these rest points are provided in Online Appendix V, and numerical estimates of the eigenvalues of the derivative matrices DV(ξ∗) are presented in Online Appendix VI. The latter are essentially identical to those presented in Table 2 for d = 6, with the addition of an eigenvalue of ≈ −1 for each additional decision node.

[Figure 2: The stable rest point of Centipede under the BEP(τall, 1, βmin) dynamic for game lengths d = 3, . . . , 10 and d = 20. Stacked bars, from the bottom to the top, represent weights on strategy [0] (continue at all decision nodes), [1] (stop at the last node), [2] (stop at the second-to-last node), etc. The dashed line separates exact (d ≤ 6) and numerical (d ≥ 7) results. Panels: (i) penultimate mover; (ii) last mover.]

These numerical results strongly suggest that the conclusions about rest points established analytically for games of lengths d ≤ 6 continue to hold for longer games: there are always exactly two rest points, the backward induction state ξ†, and a stable interior rest point ξ∗ whose form barely varies with the length of the game.
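For the longer games, the interior rest points themselves can be located with a standard root finder. A minimal sketch, assuming a function V that evaluates the BEP vector field with the simplex constraints already substituted in (V is hypothetical here, standing in for the polynomial right-hand sides of the dynamics):

import numpy as np
from scipy.optimize import fsolve

def find_rest_point(V, xi0):
    # Solve V(xi) = 0 from an interior initial guess xi0.
    xi_star, info, ok, msg = fsolve(V, np.asarray(xi0), full_output=True)
    return xi_star if ok == 1 else None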

4.1.3 Intuition

We now explain why the stable rest point ξ∗ = (x∗, y∗) of the BEP(τall, 1, βmin) dynamic concentrates its mass on each population's three most cooperative strategies, and consider the effects of other test-set rules on which strategies will be played.

Revising agents are least likely to choose cooperative strategies when few opponents are already cooperating. Thus cooperative strategies should have the most difficulty gaining ground near the backward induction state ξ†. But our arguments above (cf. Example 4.3) explain why a small invasion of cooperative strategies at ξ† is self-reinforcing. Once such strategies have established a toehold, it is easy for them to become more prevalent: when a player tests a cooperative strategy against a cooperative opponent, this strategy is likely to have performed best.

To understand why each player's three most cooperative strategies predominate, we examine the test results that lead to each strategy being chosen. We assume for concreteness that d is even, so that we can call agents in the player 1 and player 2 roles "first movers" and "last movers" respectively.

The first two terms of equation (3b) give the total probability of test results that would lead a last mover to play strategy j. The terms differ in how play ended when this agent tested strategy j: in the first term, this test ended with the agent stopping the game at his jth decision node; in the second term, the opponent stopped the game at an earlier node.

Now suppose that nearly all first movers play one of their three most cooperative strategies. Then the choice probabilities of last movers are determined by the terms from (3b) whose multiplicands all contain at least one of x[0], x[1], and x[2]. Such terms only exist for the last mover's four most cooperative strategies:

(7)   Pr(j = [0]) ≈ x[0](x[d−1] + · · · + x[1]) + x[1](x[d−1] + · · · + x[2])²,

      Pr(j = [1]) ≈ x[0] + x[1](x[d−1] + · · · + x[2])(x[d−1] + · · · + x[1]),

      Pr(j = [2]) ≈ (x[1] + x[0])(x[d−1] + · · · + x[1])²,

(8)   Pr(j = [3]) ≈ (x[2] + x[1] + x[0])(x[d−1] + · · · + x[2])³.

When a last mover tests strategy [0], his opponent necessarily stops before him, so the probability that he chooses [0] comes from the second term in (3b). Examining this term reveals that there are two likely ways in which strategy [0] can have the optimal test result: if it is tested against a first mover playing [0] and strategy [1] is not, or if it is tested against a first mover playing strategy [1] and strategies [1] and [2] are tested against first movers playing less cooperative strategies. When population 1's state is x∗, then the contributions of x∗[0], x∗[1], and x∗[2] determine y∗[0] precisely:

x∗[0](x∗[2] + x∗[1]) + x∗[1](x∗[2])² ≈ (.384658)(.615342) + (.501849)(.113493)² ≈ .243160 ≈ y∗[0].

There are two likely ways in which a last mover's strategy [1] can obtain the optimal test result, one for each term from (3b): if it is tested against [0], so that it ends the game with the maximal payoff, or if it is tested against [1] (and so does not end the game), strategy [2] is tested against a less cooperative strategy, and strategy [0] is not tested against the most cooperative strategy. At x∗ the probability of a last mover choosing [1] is


x∗[0] + x∗[1]x∗[2](x∗[2] + x∗[1]) ≈ .384658 + (.501849)(.113493)(.615342) ≈ .419706 ≈ y∗[1].

The remaining last-mover strategies are only likely to have the optimal test result when they end the game and more cooperative strategies do not earn a higher payoff. At x∗ the probabilities that strategies [2] or [3] are chosen are

(9)   (x∗[1] + x∗[0])(x∗[2] + x∗[1])² ≈ (.886507)(.615342)² ≈ .335672 ≈ y∗[2]  and

      1 · (x∗[2])³ ≈ (.113493)³ ≈ .001462 ≈ y∗[3].

Equations (7) and (9) show that even if first movers only played strategies [0] and [1], last movers would still play strategy [2] frequently, suggesting why both populations must use at least their three most cooperative strategies at the stable state.

Contrariwise, the cubic term in equation (8) impedes the sustained use of more than three strategies. Indeed, the less cooperative is strategy j, the more strategies there are that are more cooperative than j, and for each such strategy k, there are fewer test results under which k fails to outperform j. Because of these two factors, the masses on less-cooperative strategies at the stable rest point of the BEP(τall, 1, βmin) dynamic drop extremely rapidly. Under test-set rule τtwo, a less-cooperative strategy can be chosen so long as it performs as well as (at most) one more-cooperative strategy. Because of this, the masses on less-cooperative strategies at the stable rest point do not decay at as dramatic a pace. See Section 5.1 and Online Appendix V for details.

4.2 Global convergence

The facts that the vertex ξ† is repelling, the interior rest point ξ∗ is attracting, and these are the only two rest points give us a strong reason to suspect that state ξ∗ attracts all solutions of the BEP(τall, 1, βmin) dynamic other than the stationary solution at ξ†.24 In this section, we argue that this is indeed the case.

To start, we prove that excepting the rest point ξ†, the boundary of the state space is repelling.

Proposition 4.5. In Centipede games of all lengths d ≥ 3, solutions to the BEP(τall, 1, βmin) dynamic from every initial condition ξ ∈ bd(Ξ) r {ξ†} immediately enter int(Ξ).

24 For there to be other solutions that did not converge to ξ∗ without the dynamics having another rest point, the flow of the dynamics would need to have very special topological properties. For instance, in a two-dimensional setting, this could occur if ξ∗ were contained in a pair of concentric closed orbits, the inner repelling and the outer attracting.


Of course, since the rest point ξ∗ is very close to the boundary for even moderate game lengths d, the distance of motion from the boundary from some initial conditions must be very small.

The proof of Proposition 4.5, which is presented in Appendix D, starts with a simple differential inequality (Lemma D.1) that lets us obtain explicit positive lower bounds on the use of any initially unused strategy i at times t ∈ (0, T]. The bounds are given in terms of the probabilities of test results that lead i to be chosen, and thus, backing up one step, in terms of the usage levels of the opponents' strategies occurring in those tests (equation (49)). With this preliminary result in hand, we prove inward motion from ξ ≠ ξ† by constructing a sequence that contains all unused strategies, and whose kth strategy could be chosen by a revising agent after a test result that only includes strategies that were initially in use or that appeared earlier in the sequence. Variations on this argument establish inward motion for the other BEP dynamics we consider.

To argue that ξ∗ is almost globally stable we introduce the candidate Lyapunov function

(10)   L(x, y) = Σ_{i=2}^{s1} (x_i − x∗_i)² + Σ_{j=2}^{s2} (y_j − y∗_j)².

In words, L(x, y) is the squared Euclidean distance of (x, y) from (x∗, y∗) if the points in the state space Ξ are represented in R^d by omitting the first components of x and y.

The Gröbner basis techniques from Section 3 do not allow for inequality constraints—here, the nonnegativity constraints on the components of the state—and so are not suitable for establishing that L is a Lyapunov function.25 We therefore evaluate this claim numerically. For games of lengths 4 through 20, we chose one billion (10⁹) points from the state space Ξ uniformly at random, and evaluated a floating-point approximation of the time derivative L̇ at each point. In all instances, the approximate version of L̇ evaluated to a negative number. This numerical procedure covers the state space fairly thoroughly for the game lengths we consider,26 and so provides strong numerical evidence that the interior rest point ξ∗ is an almost global attractor, and also about the global structure of the dynamics.
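A minimal sketch of this Monte Carlo procedure, assuming a function dot_L that evaluates the time derivative of L along the dynamic at a state (x, y); dot_L is hypothetical and stands in for the polynomial expression obtained by differentiating (10) along solutions:

import numpy as np

rng = np.random.default_rng(0)

def random_simplex(n):
    # Uniform sampling from the n-simplex via normalized exponentials.
    e = rng.exponential(size=n)
    return e / e.sum()

def search_for_violation(dot_L, s1, s2, trials=10**6):
    for _ in range(trials):
        x, y = random_simplex(s1), random_simplex(s2)
        if dot_L(x, y) >= 0:
            return (x, y)   # a state at which L fails to decrease
    return None             # no violation found in the sample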

25 In principle, we could verify that L is a Lyapunov function using an algorithm from real algebraic geometry called cylindrical algebraic decomposition (Collins (1975)), but exact implementations of this algorithm fail to terminate once the dimension of the state space exceeds 2.

26 By a standard combinatoric formula, the number of states in a grid in Ξ = X × Y with mesh 1/m is (m+s1−1 choose m) · (m+s2−1 choose m). Applying this formula shows that for a game of length 10, 10⁹ is between the numbers of states in grids in Ξ of meshes 1/17 (since (22 choose 17)² = 693,479,556) and 1/18 (since (23 choose 18)² = 1,132,255,201). For a game of length 15, the comparable meshes are 1/10 and 1/11, and for length 20, 1/7 and 1/8.


5. Analyses of other test-set and tie-breaking rules

5.1 Test-two

We have focused on tie-breaking rule βmin because it is the most favorable to the backward induction states. On the contrary, the test-set rule τall works in favor of the emergence of cooperative behavior. A revising agent who tests all of his strategies has many opportunities to test a cooperative strategy against a cooperative opponent, and so to obtain a high payoff. By reducing the number of such chances, testing fewer strategies makes switches to cooperative strategies considerably less likely, especially starting from initial states with little cooperative play. Here we show that under the less thorough test-set rule τtwo, under which revising agents only consider their current strategy and another randomly-chosen strategy, the qualitative predictions described in the previous section go largely unchanged.

Our analytical results for the BEP(τtwo, 1, βmin) dynamic are as follows.

Proposition 5.1. Consider the BEP(τtwo, 1, βmin) dynamic in Centipede games.
(i) For games of all lengths d, the rest point ξ† is a repellor.
(ii) For games of lengths 3 ≤ d ≤ 8, the dynamic has exactly two rest points, ξ† and ξ∗ ∈ int(Ξ). The rest point ξ∗ is asymptotically stable.
(iii) For games of all lengths d, solutions from every initial condition ξ ∈ bd(Ξ) r {ξ†} immediately enter int(Ξ).

The only differences between the conclusions stated in Proposition 5.1 and those from the previous section appear in part (ii): because testing only two strategies leads the polynomials that define the dynamics to have fewer terms, we are able to analytically characterize the rest points for games of lengths up to d = 8.

The proof that ξ† is a repellor is presented in Appendix B.2. The approach is that described in Section 4.1.1, but in this case the characterization of the eigenvectors of DV(ξ†) is more difficult. The characterization of the rest points is again proved using the algebraic methods from Section 3.

Analogues of the numerical results from Section 4 also go through. As shown in Figure 3, the form of the stable rest point settles down once d = 8, with the interior rest point ξ∗ here concentrating its mass on the penultimate player's four most cooperative strategies and the last player's five most cooperative strategies (see Table 3). Online Appendices V and VI present precise values of the rest points for game lengths up to 20 and of the eigenvalues of the derivative matrices DV(ξ∗). In all cases, all eigenvalues have negative real part, providing strong numerical evidence of the local stability of ξ∗. Finally, for games of lengths 3 through 20, we computed the time derivative of the squared distance function

(11)   L(x, y) = Σ_{i=1}^{s1} (x_i − x∗_i)² + Σ_{j=1}^{s2} (y_j − y∗_j)²

at one billion randomly chosen points in Ξ, finding that it evaluates to a negative number in all cases.27 This again provides strong numerical evidence that the interior rest point ξ∗ is an almost global attractor.

[Figure 3: The stable rest point of Centipede under the BEP(τtwo, 1, βmin) dynamic for game lengths d = 3, . . . , 10 and d = 20. Stacked bars, from the bottom to the top, represent weights on strategies [0], [1], [2], [3], etc. The dashed line separates exact (d ≤ 8) and numerical (d ≥ 9) results. Panels: (i) penultimate mover; (ii) last mover.]

population p    [5]      [4]      [3]      [2]      [1]      [0]
test-all      .0000    .0000    .0000    .1135    .5018    .3847
test-two      .0002    .0040    .0514    .2545    .3817    .3083

population q    [5]      [4]      [3]      [2]      [1]      [0]
test-all      .0000    .0000    .0015    .3357    .4197    .2432
test-two      .0008    .0146    .1239    .3246    .3111    .2249

Table 3: The interior attractor of BEP(τ, 1, βmin) dynamics under test-set rules τall and τtwo (d ≥ 12).

In the Online Appendix, we consider dynamics based on a third test-set rule, test-adjacent, under which an agent's test set comprises his current strategy and a strategy numbered adjacently to his current strategy. We show that the qualitative predictions above persist under this test-set rule as well. Evidently, cooperative play will still emerge even if it may only do so one step at a time.

27 This is the only instance in which we use the Lyapunov function (11) instead of (10).

5.2 Stick/min-if-tie and uniform-if-tie

The min-if-tie rule βmin is the tie-breaking rule that is least supportive of cooperative play. In this section we briefly describe the analogues of the foregoing results when other tie-breaking rules are used. Specifically, we consider the stick/min-if-tie rule βstick, which we feel is the most natural specification, and the uniform-if-tie rule βunif, which when combined with the test-all rule τall defines the S(k) equilibrium of Osborne and Rubinstein (1998) and the corresponding dynamics of Sethi (2000).

While a variety of details of the analyses depend on the choice of tie-breaking rule, our main conclusions do not: in all cases, play under BEP dynamics in Centipede from almost all initial conditions converges to an interior rest point at which choices are concentrated on the most cooperative strategies, with weights that become essentially independent of the length of the game. Tables 4 and 5 report the weights on the most cooperative strategies at the interior rest point for each combination of test-set and tie-breaking rules. Evidently, switching the tie-breaking rule from βmin to βstick or βunif leads to a stable rest point at which the fully cooperative strategy [0] gains weight at the expense of less cooperative strategies.

Table 6 presents the maximal lengths of Centipede for which rest points can be characterized analytically, along with the degree of the leading polynomial of the Gröbner basis and the maximum numbers of digits in coefficients of polynomials from the basis. Full descriptions of the rest points and the eigenvalues of the derivative matrices DV(ξ∗) for game lengths d ∈ {2, . . . , 20} are provided in Online Appendices V and VI.28

Under uniform tie-breaking βunif, the backward induction state ξ† is not a rest point, as population 2 agents randomize among whatever strategies are in their test sets. In other respects, the properties and analysis of these dynamics are similar to those under βmin. This conclusion complements Example 11 of Osborne and Rubinstein (1998), which shows that as the number of periods d in Centipede grows large, the probability that player 1 stops the game immediately in an S(1) equilibrium (equiv., a rest point of the BEP(τall, 1, βunif) dynamic) approaches zero.

State ξ† is a rest point under the βstick rule, and the behavior of the dynamics on aff(Ξ) in the vicinity of this state complicates the analysis. For each choice of test-set rule τ and for most game lengths, there are continuous sets of rest points of the BEP(τ, 1, βstick) dynamic in aff(Ξ). These sets consist of points satisfying x1 = 1, and the only state in Ξ they contain is the backward induction state ξ†.29 The existence of these sets of rest points implies that the derivative matrices DV(ξ†) have eigenvalues equal to zero, so that the standard results from linearization theory used earlier (see Section 4.1.1) cannot be applied. To show that state ξ† is nevertheless repelling, we appeal to results from center manifold theory (Kelley (1967a,b), Perko (2001)), which describe the behavior of nonlinear dynamics near nonhyperbolic rest points. See Appendix B.3 and Online Appendix IV for details.30

28 Example 4.1 shows that in Centipede of length d = 2, the BEP(τ, 1, βmin) dynamic converges to the backward induction state ξ† from all initial conditions. Switching the tie-breaking rule to βstick or βunif introduces new positive terms to the expression for ẏ2 in (5), corresponding to cases in which population 2 agents test both strategies against opponents who stop immediately. Because of these terms, the dynamics possess an almost globally stable interior rest point. As discussed after Example 4.3, one could rule out this behavior by fiat by imposing the principle of admissibility.

                    population p                          population q
              [3]      [2]      [1]      [0]        [3]      [2]      [1]      [0]
min         .0000    .1135    .5018    .3847      .0015    .3357    .4197    .2432
stick/min   .0000    .1106    .4999    .3895      .0014    .3315    .4163    .2508
uniform     .0000    .1080    .4980    .3939      .0013    .3277    .4131    .2580

Table 4: The interior attractor of BEP(τall, 1, β) under different tie-breaking rules β (d ≥ 6). p denotes the owner of the penultimate decision node, q the owner of the last decision node.

population p    [5]      [4]      [3]      [2]      [1]      [0]
min           .0002    .0040    .0514    .2545    .3817    .3083
stick         .0001    .0033    .0436    .2256    .3707    .3566
uniform       .0001    .0034    .0439    .2261    .3690    .3575

population q    [5]      [4]      [3]      [2]      [1]      [0]
min           .0008    .0146    .1239    .3246    .3111    .2249
stick         .0007    .0123    .1066    .2969    .3149    .2685
uniform       .0007    .0124    .1071    .2960    .3124    .2713

Table 5: The interior attractor of BEP(τtwo, 1, β) under different tie-breaking rules β (d ≥ 12).

test-all    max d   degree of        # of digits in max        # of digits in max
                    leading poly.    coeff. of leading poly.   coeff. in basis
min           6        221                  94                      13278
stick         5         65                  26                        432
uniform       6        168                 106                      11214

test-two    max d   degree of        # of digits in max        # of digits in max
                    leading poly.    coeff. of leading poly.   coeff. in basis
min           8         97                 619                      54760
stick         8        128                 110                        775
uniform       8        128                  95                       1108

Table 6: The maximal length d of Centipede for which the rest points of the BEP(τ, κ, β) dynamic have been computed analytically, the degree of the leading polynomial of the Gröbner basis, and the maximum numbers of digits in coefficients of polynomials from the basis.

6. Larger numbers of trials

The analysis thus far has focused on cases in which agents test each strategy in their test sets exactly once. We now examine aggregate behavior when each strategy is subject to larger numbers of trials κ, focusing on BEP(τall, κ, βmin) dynamics.

6.1 Instability and stability of the backward induction state

Proposition 4.2 shows that the backward induction state ξ† is a repellor under the BEP(τall, κ, βmin) dynamic with κ = 1. The following proposition shows that ξ† remains unstable as long as the number of trials κ is less than the length of the game d, and then becomes stable for larger numbers of trials. The statement is complicated slightly by the dependence of the crossover point on whether d is even or odd.

Proposition 6.1. Under the BEP(τall, κ, βmin) dynamic in the Centipede game of length d, the backward induction state ξ† is unstable if κ ≤ 2⌊d/2⌋ and is asymptotically stable otherwise.

Like those of the earlier stability analyses, the proof of Proposition 6.1 (Appendix E) is based on linearization. The key observation is that linearizations of BEP dynamics around pure rest points are driven by match results in which exactly one out of all κs^p match partners plays a strategy different from the equilibrium strategy. We show that if κ ≤ d − 1 (for d even), or κ ≤ d (for d odd), there is enough sensitivity to perturbations of the state to ensure the existence of an unstable manifold through ξ†; however, unlike in the κ = 1 case, ξ† need not be a repellor. Conversely, if these inequalities are violated, strategy 1 ∈ S1 earns the highest total payoff after any matching with exactly one discrepant opponent. This insensitivity of population 1's behavior to small changes in population 2's behavior ensures local stability.

29 Examining display (4), it is easy to verify that under the BEP(τtwo, 1, βstick) dynamic, all states in {(x, y) ∈ aff(Ξ) : x = (1, 0, . . . , 0)′, y1 = 1} are rest points. Under τall, the sets of rest points in aff(Ξ) with x1 = 1 vary with the length of the game in more complicated ways.

30 Another difficulty arises because the Gröbner basis techniques from Section 3 require the set of polynomials under consideration to have a finite number of zeros. To circumvent this problem, we record the rest point ξ† and then eliminate the extraneous rest points using a standard trick, representing the nonequality x1 ≠ 1 by introducing a new variable z and a new equality (x1 − 1)z = 1 as inputs of the Gröbner basis algorithm.

In order to assess the practical relevance of the stability of the backward induction state for larger numbers of trials, we use numerical analysis to estimate the basin of attraction of ξ†, and to determine the position of the interior saddle point of the dynamics, which lies on the manifold separating the basin of ξ† from the basin of the main attractor. We focus for tractability on games of length d = 4 and numbers of trials κ ≤ 100. Details of these analyses are presented in Online Appendices VII and VIII.

We have two main findings. First, the basin of attraction is always minuscule, with volumes always smaller than 0.01% of the total volume of the state space Ξ. Second, ξ† is almost completely nonrobust to changes in behavior in population 1. Evidence for this lies in the position of the saddle points, which have more than 99.8% of population 1 agents choosing strategy 1, indicating that changes in the behavior of 0.2% of population 1 agents are enough to disrupt the stability of ξ†. Thus the exact stability analysis of the backward induction state for larger numbers of trials is undercut by a thorough numerical analysis of the dynamics in the vicinity of that state.

6.2 Persistence of the stable interior rest point

When agents test their strategies thoroughly, the distributions of opponents' choices they face when testing each strategy will come to resemble the current distribution of play in the opposing population. Since agents choose the strategy whose total payoff during testing was highest, this suggests that the rest points of the resulting dynamics should approximate Nash equilibria. Indeed, when agents possess exact information, so that play adjusts according to the exact best response dynamic (Gilboa and Matsui (1991), Hofbauer (1995)), results of Xu (2016) imply that every solution trajectory converges to the set of Nash equilibria; in Centipede, all Nash equilibria entail all population 1 agents stopping immediately.

While the intuition suggested above is correct for large enough numbers of trials, it is nevertheless the case that stable cooperative behavior can persist when the number of trials of each strategy is substantial. To illustrate this, we consider play in the Centipede game of length d = 4 under the BEP(τall, κ, βmin) dynamic. Figures 4 and 5 present the stable rest points of this dynamic for numbers of trials κ up to 50, which we computed using numerical methods. While increasing the number of trials shifts mass toward uncooperative strategies, it is clear from the figures that this shifting takes place gradually: even with rather thorough testing, significant levels of cooperation are still maintained. We note as well that the fraction of population 2 agents who play the weakly dominated strategy 3 (always continue) becomes fixed between 7% and 6.5% once κ ≥ 15, even as the fraction of population 1 agents who play strategy 3 remains far from 0 (specifically, between 28% and 18%).

While surprising at first glance, these facts can be explained by considering both the expectations and the dispersions in the payoffs obtained through repeated trials of each strategy. As an illustration, consider the stable rest point when κ = 32, namely ξ∗ = (x∗, y∗) ≈ ((.2140, .5738, .2122), (.6333, .3010, .0657)). Let Π_j be a random variable representing the payoff obtained by a population 2 agent who plays strategy j in a single random match at this state. By equation (2) (or Figure 1), the expected payoffs to this agent's three strategies are

E(Π1) = (0, 3, 3) · x∗ = 2.3580,   E(Π2) = (0, 2, 5) · x∗ = 2.2086,   E(Π3) = (0, 2, 4) · x∗ = 1.9964.

From this we anticipate that the strategy weights in population 2 satisfy y∗1 > y∗2 > y∗3. To explain why these weights take the values they do, we also need to know how dispersed the payoffs from testing each strategy are. We thus compute the variances of the single-test payoffs Π_j:

Var(Π1) = 1.5138,   Var(Π2) = 2.7223,   Var(Π3) = 1.7048.

Using these calculations and the central limit theorem, we find that the difference between the average payoffs from 32 tests of strategy 3 and 32 tests of strategy 2 is approximately normally distributed with mean E(Π3) − E(Π2) = −.2122 and standard deviation √((Var(Π3) + Var(Π2))/32) ≈ .3720. The latter statistic is commensurate with the former. Thus the weakly dominated strategy 3 yields a higher total payoff than the dominating strategy 2 with approximate probability P(Z ≥ .57) ≈ .28, and so is not a rare event. Likewise, evaluating the appropriate multivariate normal integrals shows that the probabilities of strategies 1, 2, and 3 yielding the highest total payoff are approximately .61, .32, and .07, figures which accord fairly well with the components of y∗.
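The normal approximation above can be reproduced in a few lines of Python, with the payoff vectors quoted from equation (2) in the text:

import numpy as np
from math import erf, sqrt

x_star = np.array([.2140, .5738, .2122])
payoffs = {1: np.array([0, 3, 3]),
           2: np.array([0, 2, 5]),
           3: np.array([0, 2, 4])}
mean = {j: p @ x_star for j, p in payoffs.items()}
var = {j: (p**2) @ x_star - mean[j]**2 for j, p in payoffs.items()}

mu = mean[3] - mean[2]                    # approx -.2122
sd = sqrt((var[3] + var[2]) / 32)         # approx .3720
prob = 0.5*(1 + erf(mu / (sd*sqrt(2))))   # P(strategy 3's total beats strategy 2's), approx .28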

As the number of trials κ becomes larger, greater averaging reduces the variation in each strategy's payoffs per trial. At the same time, increasing κ increases the weight x∗1 on stopping immediately at the expense of population 1's other two strategies, reducing the differences in the expected payoffs of population 2's strategies. This explains why the strategy weights in population 2 do not vary very much as κ increases, and why the weight on the weakly dominated strategy hardly varies at all.

[Figure 4: The stable interior rest point of the BEP(τall, κ, βmin) dynamic in the Centipede game of length d = 4, κ = 1, . . . , 50. Stacked bars, from the bottom to the top, represent weights on strategies [0], [1], and [2]. Panels: (i) population 1; (ii) population 2.]

[Figure 5: The stable interior rest point in Centipede of length d = 4 under BEP(τall, κ, βmin) dynamics for κ = 1, . . . , 34 trials of each tested strategy, plotted in the population 1 and population 2 simplexes (vertices x1, x2, x3 and y1, y2, y3). Lighter shading corresponds to larger numbers of trials. Dashed lines represent boundaries of best response regions.]

6.3 Convergence to cycles

Figure 4 does not record rest points for certain numbers of trials above 34. For these values of κ, the population state does not converge to a rest point. Instead, our numerical analyses indicate that for all κ with empty entries in Figure 4 and all κ between 51 and 100, the BEP(τall, κ, βmin) dynamic converges to a periodic orbit. Figure 6 presents the cycles under the BEP(τall, κ, βmin) dynamics for κ = 50, 100, and 200. In all three cases, we observe substantial levels of cooperative play in population 1 over the course of the cycle, with the fraction of the population choosing to continue at the initial node varying between .50 and .83 for κ = 50, between .28 and .70 for κ = 100, and between .16 and .45 for κ = 200. These examples illustrate that cooperative behavior can persist even when agents have substantial amounts of information about opponents' play.

[Figure 6: Stable cycles in Centipede of length d = 4 under BEP(τall, κ, βmin) dynamics for κ = 50, 100, and 200. Lighter shading represents faster motion. The small circles represent the unstable interior rest points. For κ = 50 and 100, shapes synchronize positions along the cycle. Panels: (i) κ = 50; (ii) κ = 100; (iii) κ = 200.]

From a methodological point of view, the existence of almost globally attracting limit cycles under BEP dynamics suggests that solution concepts like S(k) equilibrium and logit equilibrium that are motivated as steady states of dynamic disequilibrium processes should be applied with some caution. Existence results for such solution concepts can generally be proved by appeals to suitable fixed point theorems. But the fact that static solutions exist need not imply that any are stable, and it may happen that no static solution provides a good prediction of the behavior of the underlying dynamic process.

7. Conclusion

In this paper, we have introduced a class of game dynamics built on natural assumptions about the information agents obtain when revising, and have shown that these dynamics lead to cooperative behavior in the Centipede game. One key feature of the agents' revision process is that conditional on the current population state, the experienced payoffs to each strategy are independent of one another. This allows cooperative strategies with suboptimal expected payoffs to be played with nonnegligible probabilities, even when the testing of each strategy involves substantial numbers of trials. The use of any such strategy increases the expected payoffs of other cooperative strategies, creating a virtuous circle that sustains cooperative play.

Appendix

A. Exact solutions of systems of polynomial equations

This appendix describes the algebraic tools we use to determine the exact rest points of BEP dynamics.

A.1 Gröbner bases

Let Q[z1, . . . , zn], or Q[z] for short, denote the collection (more formally, the ring) of polynomials in the variables z1, . . . , zn with rational coefficients. Let F = {f1, . . . , fm} ⊂ Q[z] be a set of such polynomials. Let Z be a subset of Rn, and consider the problem of finding the set of points z∗ ∈ Z that are zeros of all polynomials in F.

To do so, it is convenient to first consider finding all zeros in Cn of the polynomials in F. In this case, the set of interest,

(12)   V(f1, . . . , fm) = {z∗ ∈ Cn : fj(z∗) = 0 for all 1 ≤ j ≤ m}

is called the variety (or algebraic set) generated by f1, . . . , fm. To characterize (12), it is useful to introduce the ideal generated by f1, . . . , fm:

(13)   ⟨f1, . . . , fm⟩ = {Σ_{j=1}^m h_j f_j : h_j ∈ C[z] for all 1 ≤ j ≤ m}.


Thus the ideal (13) is the set of linear combinations of the polynomials f1, . . . , fm, where the coefficients on each are themselves polynomials in C[z]. It is easy to verify that any other collection of polynomials in C[z] whose linear combinations generate the ideal (13)—that is, any other basis for the ideal—also generates the variety (12).

For our purposes, the most useful basis for the ideal (13) is the reduced lex-order Gröbner basis, which we denote by G ⊂ Q[z]. This basis, which contains no superfluous polynomials and is uniquely determined by its ideal and the ordering of the variables, has this convenient property: it consists of polynomials in zn only, polynomials in zn and zn−1 only, polynomials in zn, zn−1, and zn−2 only, and so forth. Thus if the variety (12) has cardinality |V| < ∞, then it can be computed sequentially by solving univariate polynomials and substituting backward.31

In many cases, including all that arise in this paper, the basis G is of the simple form

(14)   G = {g_n(z_n), z_{n−1} − g_{n−1}(z_n), . . . , z_1 − g_1(z_n)}

for some univariate polynomials g_n, . . . , g_1, where g_n has degree deg(g_n) = |V| and where deg(g_k) < |V| for k < n.32 In such cases, one computes the variety (12) by finding the |V| complex roots of g_n, and then substituting each into the other n − 1 polynomials to obtain the |V| elements of (12).33

A.2 Algebraic numbers

We now turn to the problem of finding the zeros of univariate polynomials. Let Q̄ ⊂ C denote the set of algebraic numbers: the complex numbers that are roots of nonzero polynomials with rational coefficients. Q̄ is a subfield of C, and this fact and the definition of algebraic numbers are summarized by saying that Q̄ is the algebraic closure of Q.34

31 The notion of Gröbner bases and the basic algorithm for computing them are due to Buchberger (1965). Cox et al. (2015) provide an excellent current account of Gröbner basis algorithms, as well as a thorough introduction to the ideas summarized above.

32 According to the shape lemma, a sufficient condition for the reduced lex-order basis to be of form (14) is that each point in (12) have a distinct z_n component and that (13) be a radical ideal, meaning that if it includes some integer power of a polynomial, then it includes the polynomial itself. See Becker et al. (1994) and Kubler et al. (2014).

33 Although we are only interested in elements of the variety (12) that lie in the state space Ξ, the solution methods described above only work if (12) has a finite number of solutions in Cn. For some specifications of the BEP dynamic, the latter property fails, but we are able to circumvent this problem by introducing additional variables and equations—see Section 5.2.

34 Like C, Q̄ is algebraically closed, in that every univariate polynomial with coefficients in Q̄ has a root in Q̄. It follows from this and the existence of lex-order Gröbner bases that when the variety (12) has a finite number of elements, the components of its elements are algebraic numbers.


Every univariate polynomial g ∈ Q[x] can be factored as a product of irreducible polynomials in Q[x], which cannot themselves be further factored into products of nonconstant elements of Q[x].35 If an irreducible polynomial h ∈ Q[x] is of degree k, it has k distinct roots a_1, . . . , a_k ∈ Q̄. The multiple of h whose leading term has coefficient 1 is called the minimal polynomial of these roots. One often works instead with the multiple of h that is primitive in Z[x], meaning that its coefficients are integers with greatest common divisor 1.

Each algebraic number is uniquely identified by its minimal polynomial h and a label that distinguishes the roots of h from one another. For instance, one can label each root a_j ∈ Q̄ with a numerical approximation that is sufficiently accurate to distinguish a_j from the other roots. In computer algebra systems, the algebraic numbers with minimal polynomial h are represented by pairs consisting of h and an integer in {1, . . . , k} which ranks the roots of h with respect to some ordering; for instance, the lowest integers are commonly assigned to the real roots of h in increasing order.36

If the Gröbner basis G is of form (14), then we need only look for the roots of the irreducible factors h of the polynomial g_n, which are the possible values of z_n ∈ Q̄; then substitution into the univariate polynomials g_{n−1}, . . . , g_1 determines the corresponding values of the other variables. The fact that these latter values are generated from a fixed algebraic number allows us to work in subfields of Q̄ in which arithmetic operations are easy to perform. If the minimal polynomial h of α ∈ Q̄ has degree deg(h), then for any polynomial f, one can find a polynomial f∗ of degree deg(f∗) < deg(h) such that f(α) = f∗(α). It follows that the values of g_{n−1}(α), . . . , g_1(α) are all elements of

Q(α) = {Σ_{k=0}^{deg(h)−1} a_k α^k : a_0, . . . , a_{deg(h)−1} ∈ Q} ⊂ Q̄,

called the field extension of Q generated by α. Straightforward arguments show that the representation of elements of Q(α) by sequences of coefficients (a_0, . . . , a_{deg(h)−1}) makes addition and multiplication in Q(α) simple to perform. For further details on algebraic numbers and field extensions, we refer the reader to Dummit and Foote (2004, Chapter 13) and Cohen (1993, Chapter 4).
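A small illustration of this reduction, using SymPy and a deliberately simple minimal polynomial; sp.rem computes the polynomial f∗ with f(α) = f∗(α) and deg(f∗) < deg(h):

import sympy as sp

a = sp.symbols('a')
h = a**2 - 2                 # minimal polynomial of alpha = sqrt(2)
f = a**5 + 3*a**3 + 1
f_star = sp.rem(f, h, a)     # 10*a + 1, so f(sqrt(2)) = 10*sqrt(2) + 1
print(f_star)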

35 "Typical" polynomials in Q[x] are irreducible: for instance, the quadratic ax² + bx + c with a, b, c ∈ Q is only reducible if √(b² − 4ac) ∈ Q. By Gauss's lemma, polynomial factorization in Q[x] is effectively equivalent to polynomial factorization in Z[x]. For an excellent presentation of polynomial factorization algorithms, see von zur Gathen and Gerhard (2013, Ch. 14–16).

36 There are exact methods based on classic theorems of Sturm and Vincent for isolating the real roots of a polynomial with rational coefficients; see McNamee (2007, Ch. 2 and 3) and Akritas (2010).


A.3 Examples

To illustrate the techniques above, we use them to compute the rest points of the BEP(τtwo, 1, βmin) dynamic in the Centipede games with d = 3 and d = 4 decision nodes. Since x ∈ X and y ∈ Y, we need not explicitly write the laws of motion for the final components of x and y, as those components can be deduced from the others and the simplex constraints.37

Example A.1. The BEP(τtwo, 1, βmin) dynamic in Centipede of length d = 3 is

(15)   ẋ1 = ½(y1(x1 + x2) + y1(x1 + x3)) − x1,
       ẋ2 = ½(y2(x1 + x2) + (y2 + (y1)²)(x2 + x3)) − x2,
       ẏ1 = ((x2 + x3)(x1 + x2) + (x1)²)(y1 + y2) − y1

(see Online Appendix XI). To find the rest points of this system, we substitute x3 = 1 − x1 − x2 and y2 = 1 − y1 in the right-hand sides of (15) to obtain a system of three equations and three unknowns. We then compute a Gröbner basis of form (14) for the right-hand sides of (15):

(16)   {3(y1)⁴ − 8(y1)³ + 13(y1)² − 12y1 + 4,  4x2 + 3(y1)³ − 5(y1)² + 6y1 − 4,  8x1 − 3(y1)³ + 2(y1)² − 9y1 + 2}.

The initial quartic in (16) has roots 1, 2/3, and (1 ± √7 i)/2. Of course, only the first two roots could be components of states in Ξ. Substituting y1 = 1 in the remaining polynomials in (16) and equating them to 0 yields x1 = 1 and x2 = 0, which with the simplex constraints gives us the backward induction state ξ†. Substituting y1 = 2/3 instead yields the interior state ξ∗ = (x∗, y∗) = ((1/2, 1/3, 1/6), (2/3, 1/3)). This is the complete set of rest points of the dynamic (15).
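The computation in Example A.1 can be reproduced with any Gröbner basis implementation; the following SymPy sketch should recover the basis (16) up to scaling:

import sympy as sp

x1, x2, y1 = sp.symbols('x1 x2 y1')
x3 = 1 - x1 - x2             # simplex constraint for population 1
y2 = 1 - y1                  # simplex constraint for population 2

f1 = sp.Rational(1, 2)*(y1*(x1 + x2) + y1*(x1 + x3)) - x1
f2 = sp.Rational(1, 2)*(y2*(x1 + x2) + (y2 + y1**2)*(x2 + x3)) - x2
f3 = ((x2 + x3)*(x1 + x2) + x1**2)*(y1 + y2) - y1

G = sp.groebner([sp.expand(f) for f in (f1, f2, f3)], x1, x2, y1, order='lex')
quartic = G.exprs[-1]          # univariate in y1, as in form (14)
print(sp.roots(quartic, y1))   # roots 1, 2/3, and (1 +/- sqrt(7)*I)/2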

Example A.2. The BEP(τtwo, 1, βmin) dynamic in Centipede of length d = 4 is

(17)   ẋ1 = ½(y1(x1 + x2) + y1(x1 + x3)) − x1,
       ẋ2 = ½((y2 + y3)(x1 + x2) + (y2 + y1(y1 + y3))(x2 + x3)) − x2,
       ẏ1 = ½((x2 + x3)(x1 + x2) + (x1)²)((y1 + y2) + (y1 + y3)) − y1,
       ẏ2 = ½((x2 + x3)(x1 + x3)(y1 + y2) + (x1 + x2(x2 + x3) + (x3)²)(y2 + y3)) − y2.

37 The Gröbner basis algorithm sometimes runs faster if all components are retained and the left-hand sides of the constraints Σ_{i=1}^{s1} x_i − 1 = 0 and Σ_{j=1}^{s2} y_j − 1 = 0 are included in the initial set of polynomials. Our Mathematica notebook includes both implementations.


We can again compute a Gröbner basis of form (14). Its univariate polynomial is

(18)   4096(y2)⁸ − 28608(y2)⁷ + 79812(y2)⁶ − 64332(y2)⁵ + 39744(y2)⁴ − 9180(y2)³ + 648(y2)² − 243y2.

This polynomial has root 0, which again generates the backward induction state ξ†. Dividing (18) by y2 yields an irreducible 7th degree polynomial. Using the algorithms mentioned above, one can show that this polynomial has one real root, which we designate by y∗2 = Root[4096α⁷ − 28608α⁶ + 79812α⁵ − 64332α⁴ + 39744α³ − 9180α² + 648α − 243, 1] ≈ .3607, and six complex roots. Substituting y∗2 into the remaining polynomials from the Gröbner basis and using the simplex constraints, we obtain an exact expression for the interior rest point ξ∗, whose components are elements of the field extension Q(y∗2); their approximate values are ξ∗ = (x∗, y∗) ≈ ((.2575, .4358, .3068), (.4095, .3607, .2298)).

B. Repulsion from the backward induction state

Letting s = s1 + s2, we denote the tangent space of the state space Ξ = X × Y by TΞ = TX × TY = {(z¹, z²)′ ∈ Rs : Σ_{i∈S1} z¹_i = 0 and Σ_{j∈S2} z²_j = 0}, and we denote the affine hull of Ξ by aff(Ξ) = TΞ + ξ†. Writing our dynamics as

(D)   ξ̇ = V(ξ),

we have V : aff(Ξ) → TΞ, and so DV(ξ)z ∈ TΞ for all ξ ∈ Ξ and z ∈ TΞ. We can thus view DV(ξ) as a linear map from TΞ to itself, and the behavior of the dynamics in the neighborhood of a rest point is determined by the eigenvalues and eigenvectors of this linear map. The latter are obtained by computing the eigenvalues and eigenvectors of the product matrix Φ DV̄(ξ) Φ, where V̄ : Rs → Rs is the natural extension of V to Rs, and Φ is the orthogonal projection of Rs onto TΞ, i.e., the block diagonal matrix with diagonal blocks I − (1/s1)11′ ∈ R^{s1×s1} and I − (1/s2)11′ ∈ R^{s2×s2}, where 1 = (1, . . . , 1)′. Since V maps Ξ into TΞ, the projection is only needed when there are eigenspaces of DV̄(ξ) that intersect both the set TΞ and its complement.

In what follows we write δ^i ∈ Rs and ε^j ∈ Rs for the standard basis vectors corresponding to strategies i ∈ S1 and j ∈ S2, respectively. We also write all expressions in terms of the numbers of decision nodes rather than the numbers of strategies, as doing so usually generates more compact expressions. To eliminate superscripts we use the notations m ≡ d1 = s1 − 1 and n ≡ d2 = s2 − 1 for the numbers of decision nodes.

For BEP dynamics with tie-breaking rule βmin, we prove that the backward induction state ξ† is a repellor using the following argument. Computing the eigenvalues and eigenvectors of DV(ξ†) as described above, we find that under each of the three test-set rules we consider, ξ† is a hyperbolic rest point, meaning that all of the eigenvalues have nonzero real part.

The linearization of the dynamic (D) at rest point ξ† is the linear differential equation

(L)   ż = DV(ξ†)z

on TΞ. The stable subspace Es ⊆ TΞ of (L) is the span of the real and imaginary parts of the eigenvectors and generalized eigenvectors of DV(ξ†) corresponding to eigenvalues with negative real part. The unstable subspace Eu ⊆ TΞ of (L) is defined analogously. The basic theory of linear differential equations implies that solutions to (L) on Es converge to the origin at an exponential rate, that solutions to (L) on Eu diverge from the origin at an exponential rate, and that the remaining solutions approach Eu and then diverge from the origin at an exponential rate.

Let As = Es + ξ† and Au = Eu + ξ† denote the affine spaces that are parallel to Es and Eu and that pass through ξ†. In Sections B.1 and B.2, we prove that under the BEP(τall, 1, βmin) and BEP(τtwo, 1, βmin) dynamics, the dimensions of Es and Eu are d − 1 and 1, and that As is a supporting hyperplane to Ξ at ξ†.

Combining these facts with fundamental results from dynamical systems theory lets us complete the proof that ξ† is a repellor under each dynamic. By the Hartman-Grobman theorem (Perko (2001, Section 2.8)), there is a homeomorphism h between a neighborhood of ξ† in aff(Ξ) and a neighborhood of 0 in TΞ that maps solutions of (D) to solutions of (L). By the stable manifold theorem (Perko (2001, Section 2.7)), there is an invariant stable manifold Ms ⊂ aff(Ξ) of dimension dim(Es) = d − 1 that is tangent to As at ξ† such that solutions to (D) in Ms converge to ξ† at an exponential rate. Combining these results shows that there is a neighborhood O ⊂ aff(Ξ) of ξ† with these properties: O ∩ Ξ ∩ Ms = {ξ†}; the initial conditions in O from which solutions converge exponentially quickly to ξ† are those in O ∩ Ms; and solutions from initial conditions in (O ∩ Ξ) r {ξ†} eventually move away from ξ†. Thus the properties stated in the previous paragraph imply that state ξ† is a repellor of the dynamic (D) on Ξ.

For dynamics with tie-breaking rule βstick, the rest point ξ† is not hyperbolic, so results from center manifold theory are needed for the local stability analysis. These are explained in Sections B.3 and B.4.

B.1 Test-all, min-if-tie

Starting from formula (3), it is easy to verify that under the BEP(τall, 1, βmin) dynamic,


DV̄(ξ†) = ( −I  A ; B  −I ),

where the diagonal blocks are negative identity matrices of sizes (m + 1) × (m + 1) and (n + 1) × (n + 1), the block A ∈ R^{(m+1)×(n+1)} has first row (m + 1, 1, . . . , 1) and all remaining rows (0, 1, . . . , 1), and the block B ∈ R^{(n+1)×(m+1)} has first row (n + 1, 1, . . . , 1) and all remaining rows (0, 1, . . . , 1).

For d ≥ 3, the eigenvalues of DV(ξ†) with respect to TΞ and the bases for their eigenspaces are as follows:

(19)   −1, with eigenspace basis {δ² − δ^i : i ∈ {3, . . . , m + 1}} ∪ {ε² − ε^j : j ∈ {3, . . . , n + 1}};

(20)   −1 − √(mn), with eigenvector (√(mn), −√(n/m), . . . , −√(n/m) | −n, 1, . . . , 1)′; and

(21)   −1 + √(mn), with eigenvector (−√(mn), √(n/m), . . . , √(n/m) | −n, 1, . . . , 1)′.
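These eigenvalues can be checked numerically from the block description of DV̄(ξ†) above; a minimal sketch:

import numpy as np

def DV_bar(m, n):
    # Block matrix [[-I, A], [B, -I]] as described above.
    A = np.zeros((m + 1, n + 1)); A[:, 1:] = 1.0; A[0, 0] = m + 1
    B = np.zeros((n + 1, m + 1)); B[:, 1:] = 1.0; B[0, 0] = n + 1
    return np.block([[-np.eye(m + 1), A], [B, -np.eye(n + 1)]])

def projection(m, n):
    # Orthogonal projection of R^s onto the tangent space (block diagonal).
    P1 = np.eye(m + 1) - np.ones((m + 1, m + 1))/(m + 1)
    P2 = np.eye(n + 1) - np.ones((n + 1, n + 1))/(n + 1)
    Z = np.zeros((m + 1, n + 1))
    return np.block([[P1, Z], [Z.T, P2]])

m = n = 3   # d = 6
eigs = np.linalg.eigvals(projection(m, n) @ DV_bar(m, n) @ projection(m, n))
print(np.sort(eigs.real))
# Expected: two zeros from the projection, -1 with multiplicity m + n - 2,
# and -1 +/- sqrt(m*n).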

The eigenvectors in (19) and (20) span the stable subspace Es of the linear equation (L). The normal vector to Es is

z⊥ = (−((n+1)/(m+1))√(m/n), (n+1)/((m+1)√(mn)), . . . , (n+1)/((m+1)√(mn)) | −1, 1/n, . . . , 1/n)′.

This vector satisfies

(22)   (z⊥)′(δ^i − δ¹) = (n+1)/√(mn) > 0 for i ∈ S1 r {1} and

(23)   (z⊥)′(ε^j − ε¹) = (n+1)/n > 0 for j ∈ S2 r {1}.

The collection of vectors {δ^i − δ¹ : i ∈ S1} ∪ {ε^j − ε¹ : j ∈ S2} describes the motions along all edges of the convex set Ξ emanating from state ξ†. Thus the fact that their inner products with z⊥ are all positive implies that the translation of Es to ξ† is a hyperplane that supports Ξ at ξ†. Since the remaining eigenvalue, from (21), is positive, the arguments from the start of the section allow us to conclude that ξ† is a repellor.

Remark B.1. There is no loss in replacing the normal vector z⊥ ∈ TΞ with an auxiliary vector z⊥_aux ∈ Rs whose orthogonal projection onto TΞ is a multiple of z⊥. For instance, subtracting ((n+1)/((m+1)√(mn)))(1, . . . , 1 | 0, . . . , 0)′ + (1/n)(0, . . . , 0 | 1, . . . , 1)′ from z⊥ and multiplying the result by n/(n+1) yields the simple auxiliary vector z⊥_aux = −√(n/m) δ¹ − ε¹. Replacing z⊥ by z⊥_aux in (22) and (23) yields the inner products √(n/m) > 0 and 1 > 0, respectively. To have less cumbersome expressions we will work with auxiliary vectors in what follows.

B.2 Test-two, min-if-tie

Starting from the formula from Online Appendix ??, we find that under the BEP(τtwo, 1, βmin) dynamic, DV̄(ξ†) has the following block structure. The upper-left (m + 1) × (m + 1) block has first row (0, 1/m, . . . , 1/m); its row i, for i = 2, . . . , m + 1, has diagonal entry −(i − 1)/m, superdiagonal entry 1/m (for i ≤ m), and all other entries 0. The upper-right block has first row (2, 1, . . . , 1) and all remaining rows (0, 1/m, . . . , 1/m). The lower-left block has first row (2, 1, . . . , 1) and all remaining rows (0, 1/n, . . . , 1/n). The lower-right (n + 1) × (n + 1) block is structured like the upper-left one, with n in place of m: its first row is (0, 1/n, . . . , 1/n), and its row j, for j = 2, . . . , n + 1, has diagonal entry −(j − 1)/n and superdiagonal entry 1/n.

For d ≥ 3, the eigenvalues and eigenvectors of DV(ξ†) with respect to TΞ are

(24)   −i/m, with eigenvector δ^i − δ^{i+1} (i ∈ {2, . . . , m});

(25)   −j/n, with eigenvector ε^j − ε^{j+1} (j ∈ {2, . . . , n}, if d ≥ 4);

(26)   λ− ≡ (−(m + n) − √((m − n)² + (2mn)²))/(2mn), with eigenvector (a⁻_1, a⁻_2, . . . , a⁻_{m+1} | b⁻_1, b⁻_2, . . . , b⁻_{n+1})′; and

(27)   λ+ ≡ (−(m + n) + √((m − n)² + (2mn)²))/(2mn), with eigenvector (a⁺_1, a⁺_2, . . . , a⁺_{m+1} | b⁺_1, b⁺_2, . . . , b⁺_{n+1})′,

where, letting ⋆ represent either + or −, we have

(28)   b⋆_{n+1} = 1;   b⋆_j = ((nλ⋆ + j + 1)/(nλ⋆ + j − 1)) b⋆_{j+1} (j = n, . . . , 2);   b⋆_1 = −n(λ⋆ + 1)(λ⋆ + 1/m);

(29)   a⋆_{m+1} = (n/m)(λ⋆ + 1/m);   a⋆_i = ((mλ⋆ + i + 1)/(mλ⋆ + i − 1)) a⋆_{i+1} (i = m, . . . , 2);   a⋆_1 = −n(λ⋆ + 1).

When d is even, the eigenvalues from (24) and (25) coincide, λ− = −(m + 1)/m, and λ+ = (m − 1)/m. When d is odd, all eigenvalues are distinct, and λ− and λ+ are irrational.

(The components (28) and (29) of the eigenvectors from (26) and (27) are obtained as follows. By construction DV(ξ†) maps TΞ to itself, and by definition

(30)   (DV(ξ†) − λ⋆I)(a⋆ | b⋆)′ = 0.

Using the facts that a⋆_1 = −Σ_{i=2}^{m+1} a⋆_i and b⋆_1 = −Σ_{j=2}^{n+1} b⋆_j, we can obtain simple expressions for the first components of each block of (30), the last components of each block, and the differences between the (i + 1)st and ith components of the upper block (i ∈ {2, . . . , m}) and the (j + 1)st and jth components of the lower block (j ∈ {2, . . . , n}). Each expresses a component of b⋆ as a multiple of a component of a⋆. Setting b⋆_{n+1} = 1 and applying these expressions yields (28) and (29), along with the fact that (λ⋆ + 1/m)(λ⋆ + 1/n) = 1.)

The eigenvectors in (24)–(26) span the stable subspace Es of the linear equation (L). The normal vector to Es is the orthogonal projection onto TΞ of the auxiliary vector

z⊥_aux = −((m − n + √((m − n)² + (2mn)²))/(2mn)) δ¹ − ε¹,

which satisfies

(z⊥)′(δ^i − δ¹) = (z⊥_aux)′(δ^i − δ¹) = (m − n + √((m − n)² + (2mn)²))/(2mn) > 0 for i ∈ S1 r {1}, and
(z⊥)′(ε^j − ε¹) = (z⊥_aux)′(ε^j − ε¹) = 1 > 0 for j ∈ S2 r {1}.

Thus the translation of Es to ξ† supports Ξ at ξ†. Since the remaining eigenvalue, from (27), is positive, the arguments from the start of the section imply that ξ† is a repellor.

B.3 Test-all, stick/min-if-tie

Under BEP dynamics based on the stick/min-if-tie rule βstick, the backward induction state ξ† is not a hyperbolic rest point, and in particular has some zero eigenvalues. To show that it is nevertheless a repellor, we follow the same line of argument as above, but replace the arguments based on the Hartman-Grobman and stable manifold theorems with arguments from center manifold theory.

Consider again the dynamic (D) and its linearization (L) at ξ† from the start of this section. When ξ† is not a hyperbolic rest point of (D), we can define for (L) the stable subspace Es ⊆ TΞ, the unstable subspace Eu ⊆ TΞ, and the center subspace Ec ⊆ TΞ, where the last is the span of the real and imaginary parts of eigenvectors corresponding to eigenvalues with zero real part.

Let Acs = Ec ⊕ Es + ξ† be the affine space that is parallel to Ec ⊕ Es and that passes through ξ†. Below we show that under the BEP(τall, 1, βstick) dynamic, the subspace Ec ⊕ Es has dimension d − 1, and the affine space Acs is a supporting hyperplane to Ξ at ξ†.


Linearization is much less simple for nonhyperbolic rest points than for hyperbolic ones—see Perko (2001). However, for our purposes, it is enough that there exists a (local) center-stable manifold Mcs that is tangent to Acs, and is invariant under (D) (Kelley (1967b)). This manifold need not be unique; see Kelley (1967b, Section 4) for an example. But for any choice of center-stable manifold Mcs, there is a neighborhood O ⊂ aff(Ξ) of ξ† satisfying O ∩ Ξ ∩ Mcs = {ξ†} such that solutions to (D) from initial conditions in (O ∩ Ξ) r {ξ†} eventually move away from ξ†; see Kelley (1967a, p. 336), or see Perko (2001, Theorem 2.12.2) for a closely related and more explicitly presented result. This and the properties from the previous paragraph imply that ξ† is a repellor of the BEP(τ, 1, βstick) dynamics on Ξ.

Turning to the computation, we use the formula from Online Appendix ?? to show that under the BEP(τall, 1, βstick) dynamic,

DV̄(ξ†) = ( −I  A ; B  0 ),

where −I is the negative identity matrix of size (m + 1) × (m + 1), the lower-right (n + 1) × (n + 1) block is identically zero, and A and B are as in Section B.1: A has first row (m + 1, 1, . . . , 1) and all remaining rows (0, 1, . . . , 1), and B has first row (n + 1, 1, . . . , 1) and all remaining rows (0, 1, . . . , 1).

The eigenvalues of DV(ξ†) and bases for their eigenspaces are

(31)   0, {ε² − ε^j : j ∈ {3, . . . , n + 1}};

(32)   −1, {δ² − δ^i : i ∈ {3, . . . , m + 1}};

(33)   λ− ≡ −1/2 − √(mn + 1/4), {(−λ−, λ−/m, . . . , λ−/m | −n, 1, . . . , 1)′}; and

(34)   λ+ ≡ −1/2 + √(mn + 1/4), {(−λ+, λ+/m, . . . , λ+/m | −n, 1, . . . , 1)′}.

The eigenvectors in (31) span the center subspace Ec of the linear equation (L), while the eigenvectors in (32) and (33) span the stable subspace Es. The normal vector to the hyperplane Ec ⊕ Es is the orthogonal projection onto TΞ of the auxiliary vector

z⊥_aux = −√(n/m) δ¹ − ε¹,

which satisfies

(z⊥)′(δ^i − δ¹) = (z⊥_aux)′(δ^i − δ¹) = √(n/m) > 0 for i ∈ S1 r {1} and
(z⊥)′(ε^j − ε¹) = (z⊥_aux)′(ε^j − ε¹) = 1 > 0 for j ∈ S2 r {1}.

Since the remaining eigenvalue, from (34), is positive, the arguments above imply that ξ† is a repellor.

B.4 Test-two, stick-if-tie

Starting from formula (4), we compute that under the BEP(τtwo, 1, βstick) dynamic,

DV̄(ξ†) has the following block structure: the upper-left (m + 1) × (m + 1) block has first row (0, 1/m, . . . , 1/m) and remaining rows with diagonal entry −1/m and all other entries 0; the upper-right block has first row (2, 1, . . . , 1) and all remaining rows (0, 1/m, . . . , 1/m); the lower-left block has first row (2, 1, . . . , 1) and all remaining rows (0, 1/n, . . . , 1/n); and the lower-right (n + 1) × (n + 1) block is identically zero.

For d ≥ 3, the eigenvalues of DV(ξ†) with respect to TΞ and the bases for their eigenspaces are:

(35)   0, {ε² − ε^j : j ∈ {3, . . . , s2}} if d ≥ 4;

(36)   −1/m, {δ² − δ^i : i ∈ {3, . . . , s1}};

(37)   λ− ≡ −1/(2m) − √(1 + (1/(2m))²), {(−λ−, λ−/m, . . . , λ−/m | −1, 1/n, . . . , 1/n)′}; and

(38)   λ+ ≡ −1/(2m) + √(1 + (1/(2m))²), {(−λ+, λ+/m, . . . , λ+/m | −1, 1/n, . . . , 1/n)′}.

The eigenvectors in (35) span the center subspace Ec of the linear equation ż = DV(ξ†) z, while the eigenvectors in (36) and (37) span the stable subspace Es. The normal vector to the hyperplane Ec ⊕ Es is the orthogonal projection onto TΞ of the auxiliary vector

z⊥aux = (1/λ−) δ1 − ε1,


which satisfies

(z⊥)′(δi − δ1) = −1/λ− > 0 for i ∈ S1 ∖ {1}, and
(z⊥)′(εj − ε1) = 1 > 0 for j ∈ S2 ∖ {1}.

Since the remaining eigenvalue, from (38), is positive, the arguments used in the appendix for BEP(τall, 1, βstick) imply that ξ† is a repellor.

Under the BEP(τtwo, 1, βstick) dynamic, the affine set through ξ† defined by the eigenvectors with zero eigenvalue consists entirely of rest points; see footnote 29. This fact and Corollary 3.3 of Sijbrand (1985) imply that this affine set is the unique center manifold through ξ†.
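An analogous numerical check applies here; the sketch below is again our illustrative addition (assuming NumPy). On Rs the two transverse eigenvalues are ±2.

    import numpy as np

    def dv_tau_two(m, n):
        # Derivative matrix displayed above for BEP(tau_two, 1, beta_stick),
        # with m = s1 - 1 and n = s2 - 1.
        s1, s2 = m + 1, n + 1
        J = np.zeros((s1 + s2, s1 + s2))
        J[0, 1:s1] = 1.0 / m                   # first x-row, population 1 columns
        J[0, s1], J[0, s1 + 1:] = 2.0, 1.0     # first x-row, population 2 columns
        for i in range(1, s1):                 # remaining x-rows
            J[i, i] = -1.0 / m
            J[i, s1 + 1:] = 1.0 / m
        J[s1, 0], J[s1, 1:s1] = 2.0, 1.0       # first y-row
        J[s1 + 1:, 1:s1] = 1.0 / n             # remaining y-rows
        return J

    m, n = 4, 4                                 # so d = m + n = 8
    print(np.sort(np.linalg.eigvals(dv_tau_two(m, n)).real))
    # Expected: s2 - 2 zeros, s1 - 2 copies of -1/m, the pair below, and +-2.
    print(-1 / (2 * m) - np.sqrt(1 + (1 / (2 * m)) ** 2),
          -1 / (2 * m) + np.sqrt(1 + (1 / (2 * m)) ** 2))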

C. Local stability analysis of interior rest points

The interior rest point ξ∗ of the dynamic ẋ = V(x) is locally stable if all eigenvalues of the derivative matrix DV(ξ∗) have negative real part. Since each entry of the derivative matrix DV(ξ∗) is a polynomial with many terms that is evaluated at a state whose components are algebraic numbers, it is not feasible to compute its eigenvalues exactly. We circumvent this problem by computing the eigenvalues of the derivative matrix at a nearby rational state ξ, and making use of a bound on the distances between the eigenvalues of the two matrices. This bound is established in Proposition C.1, and a more quickly computable alternative is provided in Proposition III.4 of the online appendix.

As in Appendix B, let s = s1 + s2 = d + 2, let ξ̇ = V(ξ), V : aff(Ξ) → TΞ, denote an instance of the BEP dynamics, and let V̄ : Rs → Rs denote the natural extension of V to Rs. Observe that if DV̄(ξ) is diagonalizable, then so is DV(ξ), and all eigenvalues of the latter are eigenvalues of the former. To state the proposition, we write S = S1 ∪ S2 and omit population superscripts to define

(39) Δ = max_{i∈S} max_{k∈S} Σ_{j∈S} ∂²V̄i/∂ξj∂ξk (1, …, 1 | 1, …, 1).

Proposition C.1. Suppose that DV̄(ξ) is (complex) diagonalizable with DV̄(ξ) = Q diag(λ) Q⁻¹, and let λ∗ be an eigenvalue of DV̄(ξ∗). Then there is an eigenvalue λi of DV̄(ξ) such that

(40) |λ∗ − λi| < (2Δ / s^{s/2−1}) · ((tr(Q∗Q))^{s/2} / |det(Q)|) · Σ_{k∈S} |ξk − ξ∗k|.


The eigenvalue perturbation theorem (42) that begins the proof of the proposition bounds the distances between the eigenvalues of DV̄(ξ∗) and DV̄(ξ), but neither term on its right-hand side is feasible to compute. The second paragraph of the proof provides a bound on the condition number κ∞(Q) that does not require the computation of the inverse of the (algebraic-valued) eigenvector matrix Q. The third paragraph provides a bound on the norm of DV̄(ξ) − DV̄(ξ∗), which is needed because numerically evaluating the entries of DV̄(ξ∗) with guaranteed precision is computationally infeasible. Two further devices that we employ to improve the bound and speed its computation are described after the proof of the proposition.

Proof. For M ∈ Rs×s, let

(41) |||M|||∞ = max_{1≤i≤s} Σ_{j=1}^{s} |Mij|

denote the maximum row sum norm of M. Let κ∞(Q) = |||Q|||∞ |||Q⁻¹|||∞ be the condition number of Q with respect to norm (41). The following eigenvalue perturbation theorem (Horn and Johnson (2013, Observation 6.3.1)) follows from the Gersgorin disk theorem and the submultiplicativity of matrix norms:

(42) |λ∗ − λi| ≤ κ∞(Q) |||DV̄(ξ) − DV̄(ξ∗)|||∞.

To bound κ∞(Q), let |||M|||2 denote the spectral norm of M (i.e., the largest singular value of M), and let κ2(Q) = |||Q|||2 |||Q⁻¹|||2 be the condition number of Q with respect to this norm. Since the maximum row sum and spectral norms differ by a factor of at most √s (Horn and Johnson (2013, Problem 5.6.P23)), it follows that

(43) κ∞(Q) ≤ s κ2(Q).

Also, Guggenheimer et al. (1995) (see also Merikoski et al. (1997)) show that

(44) κ2(Q) < (2 / |det(Q)|) (tr(Q∗Q) / s)^{s/2}.
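The bounds (43) and (44) are straightforward to probe numerically. The following sketch is our illustrative addition (assuming NumPy; the matrix is a generic random test case, not one arising from the dynamics):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    _, Q = np.linalg.eig(A)            # eigenvector matrix of a generic test matrix
    s = Q.shape[0]

    kappa_inf = (np.linalg.norm(Q, np.inf)
                 * np.linalg.norm(np.linalg.inv(Q), np.inf))        # kappa_infinity(Q)
    kappa_2 = np.linalg.cond(Q, 2)                                  # kappa_2(Q)
    gej_bound = (2 / abs(np.linalg.det(Q))
                 * (np.trace(Q.conj().T @ Q).real / s) ** (s / 2))  # right side of (44)

    # (43): kappa_inf <= s * kappa_2; (44): kappa_2 < gej_bound.
    print(kappa_inf, s * kappa_2, kappa_2, gej_bound)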

To bound the final expression in (42), note that by construction, each component of the BEP dynamics ẋ = V̄(x) is the difference between a sum of monomials in the components of ξ with positive coefficients and a linear term (see Online Appendix IX). Thus the second derivatives of V̄i(ξ) are sums of monomials with positive coefficients. Since every component of every state ξ ∈ Ξ is at most 1, we therefore have


(45) max_{ξ∈Ξ} |∂²V̄i/∂ξj∂ξk (ξ)| ≤ ∂²V̄i/∂ξj∂ξk (1, …, 1 | 1, …, 1).

Thus the fundamental theorem of calculus, (45), and (39) imply that

(46) |||DV̄(ξ) − DV̄(ξ∗)|||∞ ≤ max_{i∈S} Σ_{j∈S} Σ_{k∈S} ∂²V̄i/∂ξj∂ξk (1, …, 1 | 1, …, 1) × |ξk − ξ∗k| ≤ Δ Σ_{k∈S} |ξk − ξ∗k|.

Combining inequalities (42), (43), (44), and (46) yields inequality (40). ∎

When applying Proposition C.1, one can choose Q to be any matrix of eigenvectors of DV̄(ξ). Guggenheimer et al. (1995) suggest that choosing the eigenvectors to have Euclidean norm 1 (which if done exactly makes the expression in parentheses in (44) equal 1) leads to the lowest bounds. We apply this normalization in the final step of our analysis. Finally, recalling that s = d + 2, we can speed up calculations substantially by replacing the s × s matrix DV̄(ξ∗) with a d × d matrix with the same eigenvalues as DV(ξ) : TΞ → TΞ. We accomplish this by adapting a dimension-reduction method from Sandholm (2007). See Proposition III.4 in the online appendix for the bound's final form.

To use this bound to establish the stability of the interior rest point ξ∗, we choose a rational point ξ close to ξ∗, compute the eigenvalues of the derivative matrix DV(ξ), and evaluate the bound from Proposition III.4. The eigenvalues of DV(ξ) all have negative real part so long as ξ is reasonably close to ξ∗.38

If ξ is close enough to ξ∗ that the bound is smaller than the magnitude of the real part of any eigenvalue of DV(ξ), we can conclude that the eigenvalues of DV(ξ∗) all have negative real part, and hence that ξ∗ is asymptotically stable.

Selecting state ξ involves a tradeoff: choosing ξ closer to ξ∗ reduces the bound, but doing so also leads the components of ξ to have larger numerators and denominators, which slows the computation of the bound significantly. In all cases, we are able to choose ξ satisfactorily and to conclude that ξ∗ is asymptotically stable. For further details about how the computations are implemented, see the online appendix.
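Schematically, the certification step described above can be expressed as follows. This sketch is our illustrative addition: dvbar (the derivative matrix of the extended dynamic) and Delta (the constant from (39)) are hypothetical caller-supplied inputs, NumPy's floating-point eigensolver stands in for the exact eigenvalue computations used in the actual analysis, and the dimension-reduction device of Proposition III.4 is omitted.

    import numpy as np

    def certify_stability(xi, xi_star, dvbar, Delta):
        # xi: rational state near the rest point; xi_star: the rest point itself.
        J = dvbar(xi)                            # derivative matrix at the rational state
        eigvals, Q = np.linalg.eig(J)
        Q = Q / np.linalg.norm(Q, axis=0)        # eigenvectors with Euclidean norm 1
        s = J.shape[0]
        bound = (2 * Delta / s ** (s / 2 - 1)    # the right-hand side of (40)
                 * np.trace(Q.conj().T @ Q).real ** (s / 2)
                 / abs(np.linalg.det(Q))
                 * np.sum(np.abs(np.asarray(xi, float) - np.asarray(xi_star, float))))
        margin = -max(ev.real for ev in eigvals) # distance of spectrum from imaginary axis
        # If the perturbation bound (40) is below the margin, every eigenvalue of the
        # derivative matrix at xi_star also has negative real part.
        return margin > 0 and bound < margin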

38 Among the cases in which we solve for ξ∗ exactly, the smallest magnitude of a real part of an eigenvalue for τall and τtwo occurs under the BEP(τtwo, 1, βstick) dynamic when d = 8, in which case there is an eigenvalue approximately equal to −.2781. See Online Appendix VI.


D. Inward motion from the boundary

The following differential inequality will allow us to obtain simple lower bounds on the use of initially unused strategies. In all cases in which we apply the lemma, v(0) = 0.

Lemma D.1. Let v : [0,T] → R+ satisfy v̇(t) ≥ a(t) − v(t) for some a : [0,T] → R+. Then

(47) v(t) ≥ e^{−t} ( v(0) + ∫_0^t e^s a(s) ds ) for all t ∈ [0,T].

Proof. Clearly v(t) = v(0) + ∫_0^t v̇(s) ds ≥ v(0) + ∫_0^t (a(s) − v(s)) ds. The final expression is the time t value of the solution to the differential equation v̇(s) + v(s) = a(s) with initial condition v(0). Using the integrating factor e^s to solve this equation yields the right-hand side of (47). ∎
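As an informal numerical check of (47), our added sketch below (assuming NumPy) integrates an example in which v̇ = a(t) − v and v(0) = 0, so that the bound holds with equality:

    import numpy as np

    a = lambda t: 1.0 + np.sin(t)      # any nonnegative forcing term
    T, N = 5.0, 200_000
    dt = T / N

    v, t = 0.0, 0.0
    for _ in range(N):                 # Euler steps for v' = a(t) - v
        v += dt * (a(t) - v)
        t += dt

    ts = np.arange(N + 1) * dt
    rhs = np.exp(-T) * np.sum(np.exp(ts) * a(ts)) * dt   # rectangle rule for (47)
    print(v, rhs)                      # the two values should agree to a few decimals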

For the analysis to come, it will be convenient to work with the set S = S1 ∪ S2 of all strategies from both populations, and to drop population superscripts from notation related to the state, writing for instance ξi rather than ξ^p_i.

We use Lemma D.1 to prove inward motion from the boundary under BEP dynamics in the following way. Write ξ̇i = ri(ξ) − ξi, where ri(ξ) is the polynomial appearing in the formula (1) for the BEP(τ, 1, β) dynamic. Let {ξ(t)}t≥0 be the solution to the dynamic with initial condition ξ(0). Let S0 = supp(ξ(0)) and Q = min{ξh(0) : h ∈ S0}, and, finally, let S1 = {i ∈ S ∖ S0 : ri(ξ(0)) > 0} and R = ½ min{rk(ξ(0)) : rk(ξ(0)) > 0}.

By the continuity of (1), there is a neighborhood O ⊂ Ξ of ξ(0) such that every χ ∈ O satisfies χh > Q for all h ∈ S0 and ri(χ) ≥ R for all i ∈ S1. And since (1) is smooth, there is a time T > 0 such that ξ(t) ∈ O for all t ∈ [0,T]. Thus applying Lemma D.1 shows that

(48) ξi(t) ≥ R(1 − e^{−t}) for all t ∈ [0,T] and i ∈ S1.

Now let S2 be the set of j ∉ S0 ∪ S1 for which there is a term of polynomial rj whose factors all correspond to elements of S0 or S1. If this term has a factors in S0, b factors in S1, and coefficient c, then the foregoing claims and Lemma D.1 imply that

(49) ξj(t) ≥ c Q^a e^{−t} ∫_0^t e^s (R(1 − e^{−s}))^b ds for all t ∈ [0,T].

Proceeding sequentially, we can obtain positive lower bounds on the use of any strategy for times t ∈ (0,T] by considering as-yet-unconsidered strategies k whose polynomials rk have a term whose factors all correspond to strategies for which lower bounds have


already been obtained. Below, we prove that solutions to the BEP(τall, 1, βmin) dynamic from states ξ(0) ≠ ξ† immediately enter int(Ξ) by showing that the strategies in S ∖ S0 can be considered in a sequence that satisfies the property just stated. The proofs for other BEP dynamics are described in the remarks that follow.

To proceed, we use the notations i[1] and i[2] to denote the ith strategies of players 1 and 2. We also introduce the linear order ≺ on S defined by 1[1] ≺ 1[2] ≺ 2[1] ≺ 2[2] ≺ 3[1] ≺ …, which arranges the strategies according to how early they stop play in Centipede.

Proof of Proposition 4.5. Fix an initial condition ξ(0) ≠ ξ†. We can sequentially add all strategies in S ∖ S0 in accordance with the property above as follows:

(I) First, we add the strategies {i ∈ S ∖ S0 : i ≺ max S0} in decreasing order. At the point that i has been added, i's successor h has already been added, and strategy i is the unique best response when the revising agent tests all strategies against opponents playing h. Let S_I denote the set of strategies added during this stage. The assumption that ξ(0) ≠ ξ† implies that S0 ∪ S_I contains 1[1], 1[2], and 2[1].

(II) Second, we add the strategies j ∈ S2 ∖ (S0 ∪ S_I). We can do so because j is the unique best response when it is tested against 2[1] and all other strategies are tested against 1[1].

(III) Third, we add the strategies k ∈ S1 ∖ (S0 ∪ S_I). We can do so because k is the unique best response when it is tested against 2[2] and other strategies are tested against 1[2]. ∎

Since the preceding argument only used unique best responses, it applies equally well under any tie-breaking rule. In the case of the uniform-if-tie rule βunif, a simple argument along the lines above also establishes motion into the interior from the backward induction state ξ†. Likewise, while Proposition 4.5 concerns the test-all rule τall, the analysis above works equally well for the test-two rule τtwo.

E. Stability analysis of the backward induction state when κ ≥ 2

Writing K = {1, …, κ}, we can write the population 1 equations of the BEP(τall, κ, βmin) dynamic as

(50) ẋi = Σ_{r : S1×K → S2} ( Π_{ℓ∈S1, λ∈K} y_{rℓλ} ) 1[ i = min( argmax_{k∈S1} π¹k(r) ) ] − xi, where π¹k(r) = Σ_{m=1}^{κ} A_{k rkm}.

The result function (ℓ, λ) ↦ rℓλ specifies the strategy in S2 played by an agent's match partner during the λth test of strategy ℓ, for all ℓ ∈ S1 and λ ∈ K. The second piece of (50) specifies the probability of a given result, and the third piece indicates whether strategy i is the minimal optimal strategy for this result.
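For very small instances, (50) can be evaluated by brute-force enumeration of result functions. The sketch below is our illustrative addition (assuming NumPy), with the payoff entries A[k][j] from (2) supplied by the caller; it is feasible only when s2^(s1·κ) is tiny.

    import itertools
    import numpy as np

    def xdot_population1(x, y, A, kappa):
        # Brute-force evaluation of (50) for population 1.
        s1, s2 = len(x), len(y)
        xdot = -np.asarray(x, dtype=float)        # the -x_i terms
        for flat in itertools.product(range(s2), repeat=s1 * kappa):
            r = np.reshape(flat, (s1, kappa))     # r[l, lam]: opponent in test lam of strategy l
            prob = np.prod([y[r[l, lam]] for l in range(s1) for lam in range(kappa)])
            payoffs = [sum(A[k][r[k, m]] for m in range(kappa)) for k in range(s1)]
            i_star = int(np.argmax(payoffs))      # argmax breaks ties at the minimal index (beta_min)
            xdot[i_star] += prob
        return xdot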


If there are two or more occurrences of strategies from S2 other than 1, then all partial derivatives of the product in (50) equal 0. Thus for the purposes of computing the Jacobian, we need only consider results in which there are 0 or 1 match partners playing strategies other than strategy 1 ∈ S2. These results comprise the following possibilities:

(i) If all match partners play strategy 1 ∈ S2, then strategy 1 ∈ S1 earns total payoff κ · 0 and all other strategies earn total payoff κ · (−1), so strategy 1 has the best experienced payoff.

(ii) If the lone match against another strategy j ∈ S2 ∖ {1} occurs when the revising agent plays strategy 1 ∈ S1, then total payoffs are as above, and strategy 1 has the best experienced payoff.

(iii) If the lone match against another strategy occurs when the revising agent plays strategy i ∈ S1 ∖ {1}, and if this match occurs against an opponent playing strategy j ∈ S2 ∖ {1}, then (using the payoffs Aij defined in (2)) strategy 1 is the minimal strategy earning the best experienced payoff if

κ · 0 ≥ (κ − 1) · (−1) + { 2i − 2 if i ≤ j; 2j − 3 if i > j };

otherwise, strategy i uniquely obtains the best experienced payoff.

Accounting for all of these possibilities, including the fact that the matches in cases (ii) and (iii) can occur during any of the κ tests of the strategy in question, we have

(51a)

ẋ1 = (y1)^{κs1} + κ(y1)^{κs1−1} ( Σ_{j=2}^{s2} yj + Σ_{i=2}^{s1} ( Σ_{j=2}^{i−1} yj 1{2j−2≤κ} + Σ_{j=i}^{s2} yj 1{2i−1≤κ} ) ) − x1 + O((y−1)²),

ẋi = κ(y1)^{κs1−1} ( Σ_{j=2}^{i−1} yj 1{2j−3≥κ} + Σ_{j=i}^{s2} yj 1{2i−2≥κ} ) − xi + O((y−1)²),

where y−1 = Σ_{j=2}^{s2} yj and i ∈ S1 ∖ {1}.

Turning to population 2, the test results with 0 or 1 match against opponents playing strategies other than 1 ∈ S1 comprise these possibilities:

(i) If all match partners play strategy 1 ∈ S1, then all strategies earn total payoff κ · 0, so strategy 1 is the minimal strategy earning the best experienced payoff.

(ii) If the lone match against another strategy occurs when the revising agent plays strategy j ∈ S2, then strategy j earns a positive total payoff and other strategies earn total payoff 0, so strategy j has the best experienced payoff.


Accounting for both possibilities, we obtain

(51b)

ẏ1 = (x1)^{κs2} + κ(x1)^{κs2−1} Σ_{i=2}^{s1} xi − y1 + O((x−1)²),

ẏj = κ(x1)^{κs2−1} Σ_{i=2}^{s1} xi − yj + O((x−1)²),

where x−1 = Σ_{i=2}^{s1} xi and j ∈ S2 ∖ {1}.

Taking the derivative of (51) at state ξ†, we obtain the following matrix. (We write this matrix for the case of d even, so that s1 = s2; roughly speaking, the case of d odd corresponds to removing the final column.)

(52) DV(ξ†) =

    ⎛  −1    0    ⋯    ⋯    0   │  κs1       +           ⋯           ⋯           +       ⎞
    ⎜   0   −1    ⋱         ⋮   │   0   κ1{2i−2≥κ}       ⋯           ⋯      κ1{2i−2≥κ}   ⎟
    ⎜   ⋮    ⋱    ⋱    ⋱    ⋮   │   ⋮   κ1{2j−3≥κ}       ⋱                       ⋮       ⎟
    ⎜   ⋮         ⋱    ⋱    0   │   ⋮        ⋮           ⋱           ⋱           ⋮       ⎟
    ⎜   0    ⋯    ⋯    0   −1   │   0   κ1{2j−3≥κ}   ⋯   κ1{2j−3≥κ}       κ1{2i−2≥κ}    ⎟
    ⎜  κs2   κ    ⋯    ⋯    κ   │  −1    0    ⋯    ⋯    0                               ⎟
    ⎜   0    κ    ⋯    ⋯    κ   │   0   −1    ⋱         ⋮                               ⎟
    ⎜   ⋮    ⋮              ⋮   │   ⋮    ⋱    ⋱    ⋱    ⋮                               ⎟
    ⎜   ⋮    ⋮              ⋮   │   ⋮         ⋱    ⋱    0                               ⎟
    ⎝   0    κ    ⋯    ⋯    κ   │   0    ⋯    0    0   −1                               ⎠

Each + above represents the number that makes the column sum in the block equal κs1.

If all indicator functions in the upper-right block of DV(ξ†) equal 0, then each + in (52) equals κs1, implying that this block is the zero operator on TY. In this case (52) acts as a block triangular matrix on TΞ, and so its lone eigenvalue with respect to TΞ is −1, implying that ξ† is stable.

To check that the indicators are all 0 when d is even, it is enough to consider the indicator for entry j = i = s2 ≡ ½d + 1, which is 0 if and only if κ ≥ d + 1. When d is odd, it is enough to check the indicator for entry j = i = s1 − 1 ≡ ½(d + 1), which is 0 if and only if κ ≥ d. We conclude that ξ† is asymptotically stable in these cases.
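The indicator check just performed is mechanical, as the following added sketch illustrates (the strategy counts s1 = ⌈(d + 2)/2⌉ and s2 = d + 2 − s1 are inferred from the entries referenced above):

    def backward_induction_state_stable(d, kappa):
        # xi† is stable iff every indicator in the upper-right block of (52) is 0.
        s1 = (d + 3) // 2              # ceil((d + 2) / 2), so that s1 + s2 = d + 2
        s2 = d + 2 - s1
        for i in range(2, s1 + 1):
            for j in range(2, s2 + 1):
                indicator = (2 * i - 2 >= kappa) if i <= j else (2 * j - 3 >= kappa)
                if indicator:
                    return False
        return True

    # Matches the thresholds derived above: kappa >= d + 1 for even d, kappa >= d for odd d.
    assert backward_induction_state_stable(8, 9) and not backward_induction_state_stable(8, 8)
    assert backward_induction_state_stable(7, 7) and not backward_induction_state_stable(7, 6)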

To show that ξ† is unstable in the remaining cases (when κ ≥ 2 and d ≥ 3), write


χ^κ_{ij} = 1{2i−2 ≥ κ} if i ≤ j, and χ^κ_{ij} = 1{2j−3 ≥ κ} if i > j;  χ^κ_{iΣ} = Σ_{j=2}^{s2} χ^κ_{ij};  and  χ^κ_{ΣΣ} = Σ_{i=2}^{s1} Σ_{j=2}^{s2} χ^κ_{ij}

for i, j ≥ 2. A straightforward calculation shows that λ = κ√(χ^κ_{ΣΣ}) − 1 is an eigenvalue of DV(ξ†) corresponding to eigenvector

z = ( −χ^κ_{ΣΣ}, χ^κ_{2Σ}, …, χ^κ_{s1Σ} | −(s2 − 1)√(χ^κ_{ΣΣ}), √(χ^κ_{ΣΣ}), …, √(χ^κ_{ΣΣ}) )′.

Since κ ≥ 2, λ is positive whenever at least one of the indicators in DV(ξ†) equals 1. Combining this with the previous argument, we conclude that ξ† is unstable whenever it is not asymptotically stable.

References

Akritas, A. G. (2010). Vincent's theorem of 1836: Overview and future research. Journal of Mathematical Sciences, 168:309–325.

Basu, K. (1994). The traveler's dilemma: Paradoxes of rationality in game theory. American Economic Review Papers and Proceedings, 84:391–395.

Becker, E., Marinari, M. G., Mora, T., and Traverso, C. (1994). The shape of the Shape Lemma. In von zur Gathen, J. and Giesbrecht, M., editors, ISSAC '94: Proceedings of the international symposium on symbolic and algebraic computation, pages 129–133. ACM.

Ben-Porath, E. (1997). Rationality, Nash equilibrium and backwards induction in perfect-information games. Review of Economic Studies, 64:23–46.

Benaïm, M. and Weibull, J. W. (2003). Deterministic approximation of stochastic evolution in games. Econometrica, 71:873–903.

Berkemer, R. (2008). Disputable advantage of experience in the travelers' dilemma. Unpublished manuscript, Technical University of Denmark. Abstract in International Conference on Economic Science with Heterogeneous Interacting Agents, Warsaw, 2008.

Binmore, K. (1987). Modeling rational players. Economics and Philosophy, 3–4:179–214 and 9–55.

Binmore, K. (1998). Game Theory and the Social Contract, Volume 2: Just Playing. MIT Press, Cambridge.

Björnerstedt, J. and Weibull, J. W. (1996). Nash equilibrium and evolution by imitation. In Arrow, K. J. et al., editors, The Rational Foundations of Economic Behavior, pages 155–181. St. Martin's Press, New York.


Brandenburger, A., Friedenberg, A., and Keisler, H. J. (2008). Admissibility in games. Econometrica, 76:307–352.

Buchberger, B. (1965). Ein Algorithmus zum Auffinden der Basiselemente des Restklassenrings nach einem nulldimensionalen Polynomideal. PhD thesis, University of Innsbruck. Translated by M. P. Abramson as "An algorithm for finding the basis elements of the residue class ring of a zero-dimensional polynomial ideal" in Journal of Symbolic Computation 41 (2006), 475–511.

Cardenas, J. C., Mantilla, C., and Sethi, R. (2015). Stable sampling equilibrium in common pool resource games. Games, 6:299–317.

Cohen, H. (1993). A Course in Computational Algebraic Number Theory. Springer, Berlin.

Collins, G. E. (1975). Quantifier elimination for the theory of real closed fields by cylindrical algebraic decomposition. In Second GI Conference on Automata Theory and Formal Languages, volume 33 of Lecture Notes in Computer Science, pages 134–183. Springer, Berlin.

Cox, D., Little, J., and O'Shea, D. (2015). Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer International, Cham, Switzerland, fourth edition.

Cressman, R. (1996). Evolutionary stability in the finitely repeated prisoner's dilemma game. Journal of Economic Theory, 68:234–248.

Cressman, R. (2003). Evolutionary Dynamics and Extensive Form Games. MIT Press, Cambridge.

Cressman, R. and Schlag, K. H. (1998). On the dynamic (in)stability of backwards induction. Journal of Economic Theory, 83:260–285.

Dekel, E. and Gul, F. (1997). Rationality and knowledge in game theory. In Kreps, D. M. and Wallis, K. F., editors, Advances in Economics and Econometrics: Theory and Applications, volume 1, pages 87–172. Cambridge University Press, Cambridge.

Droste, E., Kosfeld, M., and Voorneveld, M. (2003). Best-reply matching in games. Mathematical Social Sciences, 46:291–309.

Dummit, D. S. and Foote, R. M. (2004). Abstract Algebra. Wiley, Hoboken, NJ, third edition.

Friedman, D. and Oprea, R. (2012). A continuous dilemma. American Economic Review, 102:337–363.

Gilboa, I. and Matsui, A. (1991). Social stability and equilibrium. Econometrica, 59:859–867.

Guggenheimer, H. W., Edelman, A. S., and Johnson, C. R. (1995). A simple estimate of the condition number of a linear system. College Mathematics Journal, 26:2–5.


Halpern, J. Y. (2001). Substantive rationality and backward induction. Games and Economic Behavior, 37:425–435.

Hofbauer, J. (1995). Stability for the best response dynamics. Unpublished manuscript, University of Vienna.

Horn, R. A. and Johnson, C. R. (2013). Matrix Analysis. Cambridge University Press, New York, second edition.

Izquierdo, L. R., Izquierdo, S. S., and Sandholm, W. H. (2018). An introduction to ABED: Agent-based simulation of evolutionary game dynamics. Unpublished manuscript, Universidad de Burgos, Universidad de Valladolid, and University of Wisconsin.

Jehiel, P. (2005). Analogy-based expectation equilibrium. Journal of Economic Theory, 123:81–104.

Kelley, A. (1967a). Stability of the center-stable manifold. Journal of Mathematical Analysis and Applications, 18:336–344.

Kelley, A. (1967b). The stable, center-stable, center, center-unstable, and unstable manifolds. Journal of Differential Equations, 3:546–570.

Kosfeld, M., Droste, E., and Voorneveld, M. (2002). A myopic adjustment process leading to best reply matching. Games and Economic Behavior, 40:270–298.

Kreindler, G. E. and Young, H. P. (2013). Fast convergence in evolutionary equilibrium selection. Games and Economic Behavior, 80:39–67.

Kreps, D. M., Milgrom, P., Roberts, J., and Wilson, R. (1982). Rational cooperation in the finitely repeated Prisoner's Dilemma. Journal of Economic Theory, 27:245–252.

Kubler, F., Renner, P., and Schmedders, K. (2014). Computing all solutions to polynomial equations in economics. In Schmedders, K. and Judd, K. L., editors, Handbook of Computational Economics, volume 3, pages 599–652. Elsevier, Amsterdam.

Mantilla, C., Sethi, R., and Cardenas, J. C. (2018). Sampling, stability, and public goods. Unpublished manuscript, Universidad del Rosario, Columbia University, and Universidad de Los Andes.

McKelvey, R. D. and Palfrey, T. R. (1992). An experimental study of the centipede game. Econometrica, 60:803–836.

McNamee, J. M. (2007). Numerical Methods for Roots of Polynomials, Part I. Elsevier, Amsterdam.

Merikoski, J. K., Urpala, U., Virtanen, A., Tam, T.-Y., and Uhlig, F. (1997). A best upper bound for the 2-norm condition number of a matrix. Linear Algebra and its Applications, 254:355–365.


Osborne, M. J. and Rubinstein, A. (1998). Games with procedurally rational players. American Economic Review, 88:834–847.

Oyama, D., Sandholm, W. H., and Tercieux, O. (2015). Sampling best response dynamics and deterministic equilibrium selection. Theoretical Economics, 10:243–281.

Perea, A. (2014). Belief in the opponents' future rationality. Games and Economic Behavior, 83:231–254.

Perko, L. (2001). Differential Equations and Dynamical Systems. Springer, New York, third edition.

Ponti, G. (2000). Cycles of learning in the Centipede game. Games and Economic Behavior, 30:115–141.

Radner, R. (1980). Collusive behavior in noncooperative epsilon-equilibria of oligopolies with long but finite lives. Journal of Economic Theory, 22:136–154.

Reny, P. J. (1992). Backward induction, normal form perfection, and explicable equilibria. Econometrica, 60:627–649.

Robson, A. and Vega-Redondo, F. (1996). Efficient equilibrium selection in evolutionary games with random matching. Journal of Economic Theory, 70:65–92.

Rosenthal, R. W. (1981). Games of perfect information, predatory pricing and the chain-store paradox. Journal of Economic Theory, 25:92–100.

Rustichini, A. (2003). Equilibria in large games with continuous procedures. Journal of Economic Theory, 111:151–171.

Samuelson, L. (1992). Dominated strategies and common knowledge. Games and Economic Behavior, 4:284–313.

Sandholm, W. H. (2001). Almost global convergence to p-dominant equilibrium. International Journal of Game Theory, 30:107–116.

Sandholm, W. H. (2003). Evolution and equilibrium under inexact information. Games and Economic Behavior, 44:343–378.

Sandholm, W. H. (2007). Evolution in Bayesian games II: Stability of purified equilibria. Journal of Economic Theory, 136:641–667.

Sandholm, W. H. (2010a). Pairwise comparison dynamics and evolutionary foundations for Nash equilibrium. Games, 1:3–17.

Sandholm, W. H. (2010b). Population Games and Evolutionary Dynamics. MIT Press, Cambridge.


Sandholm, W. H. (2015). Population games and deterministic evolutionary dynamics. In Young, H. P. and Zamir, S., editors, Handbook of Game Theory, volume 4, pages 703–778. Elsevier, Amsterdam.

Sandholm, W. H., Izquierdo, S. S., and Izquierdo, L. R. (2018). Stability for best experienced payoff dynamics. Unpublished manuscript, University of Wisconsin, Universidad de Valladolid, and Universidad de Burgos.

Sethi, R. (2000). Stability of equilibria in games with procedurally rational players. Games and Economic Behavior, 32:85–104.

Sijbrand, J. (1985). Properties of center manifolds. Transactions of the American Mathematical Society, 289:431–469.

Stalnaker, R. (1996). Knowledge, belief, and counterfactual reasoning in games. Economics and Philosophy, 12:133–163.

Taylor, P. D. and Jonker, L. (1978). Evolutionarily stable strategies and game dynamics. Mathematical Biosciences, 40:145–156.

von zur Gathen, J. and Gerhard, J. (2013). Modern Computer Algebra. Cambridge University Press, Cambridge, third edition.

Weibull, J. W. (1995). Evolutionary Game Theory. MIT Press, Cambridge.

Xu, Z. (2016). Convergence of best-response dynamics in extensive-form games. Journal of Economic Theory, 162:21–54.

Young, H. P. (1993). The evolution of conventions. Econometrica, 61:57–84.

Young, H. P. (1998). Individual Strategy and Social Structure. Princeton University Press, Princeton.
