# arXiv:1505.02592v1 [nlin.AO] 11 May 2015 ?· Department of Mathematics, University of Portsmouth, Portsmouth,…

Post on 21-Aug-2018

212 views

Embed Size (px)

TRANSCRIPT

<ul><li><p>arX</p><p>iv:1</p><p>505.</p><p>0259</p><p>2v1 </p><p> [nl</p><p>in.A</p><p>O] </p><p> 11 </p><p>May</p><p> 201</p><p>5</p><p>Does good memory help you win games?</p><p>James BurridgeDepartment of Mathematics, University of Portsmouth, Portsmouth, PO1 2UP, UK</p><p>Yu Gao and Yong MaoSchool of Physics and Astronomy, University of Nottingham, Nottingham, NG7 2RD, UK</p><p>(Dated: May 12, 2015)</p><p>We present a simple game model where agents with different memory lengths compete for finiteresources. We show by simulation and analytically that an instability exists at a critical memorylength, and as a result, different memory lengths can compete and co-exist in a dynamical equilib-rium. Our analytical formulation makes a connection to statistical urn models, and we show thattemperature is mirrored by the agents memory. Our analysis is easily generalisable to many othergame models with implications that we briefly discuss.</p><p>PACS numbers: 02.50.Le, 89.75.Fb, 05.65.+b</p><p>Introduction All successful forms of life must eventu-ally engage in competition for resources. The equilibriumanalysis of these competitions began with von Neumann[1] and Nash [2]. The theory of games has since found ap-plications in genetics, ecology, economics and sociology[36]. Computational implementation of games leads toagent-based models, which may be of particular impor-tance in understanding the behaviour of financial systems[6, 7]. For example, the particularly successful minoritygame model [811], captures the competition between in-telligent agents with a restricted form of memory. Recentwork suggest such games may be generalised leading toclearly separated regimes of behaviour [12]. In general,understanding the complex collective behaviour arisingfrom the non-linear interactions between individuals is amajor challenge for statistical physics [13, 14].</p><p>In this Letter we present a simple game model: in eachround, individual agents pick one of two urns, each of-fering a stochastic yield to be shared by the pickers. Anagent has a memory of these payouts for the previous rounds to aid its decision. Such stochastic yield shar-ing arises in animal foraging behaviour and stock trad-ing [15], where the urns may represent different preyspecies, foraging patches, or stocks. Some form of in-telligence is essential in order to compete [16]. As withthe minority game [8], agents memory in our model isa tool for decision making. However, unlike the minor-ity game, memory in our model is used to make directestimates of the highest paying choice, rather than tosecond guess opponents next moves. In common withthe thermal minority game [10], dynamical urn models[18], and some evolutionary games [19], our agents havea temperature, which captures the level of noise in theswitches they make in search of yields. We find that theadditional noise inherent in their finite memory samples,which is greater for shorter memories, leads the system tobehave as if its agents have a higher temperature. There-fore, increasing memory cools the system. However, ata critical memory, a Hopf bifurcation [20] emerges pro-ducing stable cycles in the numbers of agents in eachurn. Perhaps not surprisingly, a long memory is advan-</p><p>tageous, but the presence of these cycles allows shortmemory agents to compete, and a mixed memory systemwill evolve toward the bifurcation point. Our theoreticalformulation follows that of statistical urn models suchas Ehrenfests dog flea model [21], which played an im-portant role in the early development of statistical me-chanics, and more recently allowed analytical investiga-tion of effects such as slow relaxation and condensationin nonequilibrium statistical mechanics [18].</p><p>Whilst we have restricted this Letter to a yield sharinggame, the analysis may be carried through for any so-cial system where agents switch between strategies usingtheir memory to determine the optimal choice. Simpleexamples include the Hawk Dove [3] and Rock ScissorsPaper [22] games. Our formulation could also accom-modate the inclusion of topological effects and agent in-teractions [23]. However, even without such complexities,urn models exhibit a remarkable range of non-equilibriumbehavior that is connected with temperature [18]. Ouranalysis suggests that effects such as condensation andthe emergence of order [24], which have social interpre-tation, may also have a connection with memory [23], butthat long memory can also introduce instability.</p><p>Model definition Consider the case of two urns anda total of n agents. We let the urn yields, U1(t), U2(t),at round t be random variables uniformly distributed on[0, n] and [0, n] respectively, where > 1 so that urn1 yields more on average than urn 2. We allow agentsaccess to the arithmetic mean of the last payoffs, but wenote that other forms of sampling could be used. Lettingt be the fraction of agents in urn 1, then the differencein the average payoffs between urn 1 and urn 2 is</p><p>t :=1</p><p>1</p><p>s=0</p><p>[</p><p>U2(t s)n(1 ts)</p><p> U1(t s)nts</p><p>]</p><p>. (1)</p><p>We refer to as the memory length of the agents.Agent dynamics is encoded in transition probabilities be-tween urns, which are deterministic functions of t. Ateach round, each agent will switch urns using the proba-</p><p>http://arxiv.org/abs/1505.02592v1</p></li><li><p>2</p><p>0.0 0.5 1.0 1.5 2.0 2.5 3.00.60</p><p>0.62</p><p>0.64</p><p>0.66</p><p>0.68</p><p>0.70</p><p>t</p><p>t</p><p>FIG. 1. Evolution of t when n = 106, = 2, = 10 and</p><p> = 106. Memory values are {5, 10, 50, 500} (squares,circles, dots, triangles). Dashed lines are analytical equilib-rium values (see equation (12)).</p><p>bilities</p><p>W12() =</p><p>2[1 + tanh()] (2)</p><p>W21() =</p><p>2[1 tanh()] . (3)</p><p>The parameter , the inverse temperature, capturesthe degree of stochasticity with which agents makechoices. For finite , agents may decide to switch strate-gies even though their estimate of the payoff difference isunfavourable. In the limit agents will only moveif their estimate of the payoff difference indicates thatthe move is favourable. The parameter controls therate at which strategy switching takes place comparedto the rate at which yield information arrives. It mayalso be seen as the frequency with which opportunitiesto switch strategy arise. In the limit 0, at most oneagent will move at each round.Simulation (instability) We simulate the model for a</p><p>series of values of when n = 106. Two different valuesof are used; in Figure 1 we have 1 = 106 and inFigure 2 we have = 103. For = 106, the expectednumber of moves at each step is < 1, and appears verystable. For larger , experiences much larger fluctua-tions about the steady state value, driven by the yieldprocess. For shorter memory values these fluctuationsare random, but as approaches 1, periodic oscilla-tions appear and dominate. The appearance of thesestable oscillations at critical memory, c, is known as aHopf bifurcation [20]. By allowing agents access onlyto the mean of their memory, we implicitly assume thatchanges in the expected payoff over the course of theirmemory, brought about by oscillations, are too subtlefor them to infer from noise.Simulation (coexistence) We now investigate how</p><p>agents with two different memories compete against oneanother by interpreting the payoff as reproduction rate.We define and as the rates of death, and reproduction</p><p>0 1 2 3 4 50.55</p><p>0.60</p><p>0.65</p><p>0.70</p><p>0.75</p><p>t</p><p>t</p><p>FIG. 2. Evolution of t when n = 106, = 2, = 5</p><p>and = 103. Memory values are {5, 50, 500} (triangles,squares, circles). Dashed line is solution to equation (15)when = 500 and , , are as above.</p><p>per unit payoff, respectively. Reproduction is assumedto occur before death in each round, but in practice theprobability of any one agent reproducing and dying inthe same round is extremely small for the , values wechoose. Letting pi (t) be the number of agents with mem-ory in urn i at time t we set the probability of birth foran agent in urn i to be</p><p>P(birth) =Ui(t)</p><p> pi (t)</p><p>. (4)</p><p>The death probability for each agent is set equal to .If populations are fixed in size and the system is notin an oscillatory state, then we expect that in equilib-rium the longer memory agents will dominate the highyielding urn. Their long memory allows them to per-ceive smaller statistical advantages that are obscured bynoise for the short memory agents. Using the thermody-namic analogy, the higher temperature (shorter memory)agents are more likely to make moves which leave themin an urn with a lower expected payoff, corresponding toa higher energy state. Above zero temperature, andin the absence of oscillations, the high yield urn will beunder-exploited, placing high memory agents at an ad-vantage. This effect can be observed in Figure 3 wherewe have simulated a mixed population of two memories {10, 1000} beginning with a ratio of 10:1 short mem-ory to long memory agents. We see that initially theadvantage afforded the long memory agents causes theirpopulation to grow, whereas the short memory agents re-duce in number. Were this advantage to be sustained in-definitely then we would expect the short memory agentsto eventually disappear, but in fact the populations stabi-lize. This effect appears because the long memory agentscause oscillations to develop once they are in sufficientlyhigh concentration. In the presence of oscillations theshort memory agents have an advantage because they canquickly observe opportunities offered by the oscillating</p></li><li><p>3</p><p>0 1000 2000 3000 4000 50000.0</p><p>0.2</p><p>0.4</p><p>0.6</p><p>0.8</p><p>1.0</p><p>-5</p><p>-4.5</p><p>-4</p><p>-3.5</p><p>-3</p><p>t</p><p>n-1p t</p><p>Log@</p><p>E@</p><p>2D-</p><p>E@D2D</p><p>FIG. 3. Scaled populations p(t) := p1(t) + p</p><p>2(t) for = 10(circles) and = 1000 (triangles) when n = p(0) = 106, = 103, = 4 with = 104 and = 2 104. Alsoshown (thin black line) is evolution of variance of t duringpopulation dynamics simulation. Straight dashed line showsvariance of homogeneous population with the same , , val-ues at critical memory c. Note: rapid initial equilibration ofpopulation values (bringing birth and death into balance) isnot visible on time scale of plot.</p><p>payoffs. We therefore expect the system to evolve to thepoint where oscillations are just beginning to form. Wemay observe this evolution by making use of the varianceof t as an order parameter which captures proximity tothe Hopf bifurcation point. In Figure 3 we see that at acritical ratio of short to long memory agents, the varianceclimbs rapidly, stabilizing just below the value seen in asystem where all agents have memory c but all otherparameters are equal. In this way the Hopf bifurcationmay be viewed as a self organized state.Analysis (equilibrium) We consider the behaviour of</p><p>the model as 0, allowing us to view it as an urn modelin the Ehrenfest class [18] where agents independentlymake transitions using state (t) dependent probabilities.Provided 1, the fraction t may be approximatedby a constant during the window over which payoffaveraging takes place. In this case, by the central limittheorem, the marginal distributions of t for each t areapproximately normal N(, 2/) where, from (1)</p><p>(, ) :=1</p><p>2</p><p>(</p><p>1</p><p>1 </p><p>)</p><p>(5)</p><p>2(, )</p><p>:=</p><p>1</p><p>12</p><p>[</p><p>2</p><p>2+</p><p>1</p><p>(1 )2]</p><p>. (6)</p><p>We now introduce a intermediate time scale T satisfying T 1 and define the time average , over awindow of length T</p><p>Wij()(t) :=1</p><p>T</p><p>t</p><p>s=tT+1</p><p>Wij(s). (7)</p><p>This average is a random variable which, for constant ,has expected value E[Wij()] where the expectation is</p><p>taken over the marginal distribution of . The condition T 1 ensures that is approximately constantover the window and that the variance of Wij() isproportional to T1 (because t1 and t2 are depen-dent only when |t2 t1| < T ). As 0, thenassuming T is sufficiently large, the probability that anagent will make a transition i j during interval Tapproaches T Wij() TE[Wij()], equivalent toa memoryless (Ehrenfest class) model where transitionprobabilities (2,3) are replaced with their expectationsE[Wij()]. Averaging over the normally distributeddifference we find that</p><p>W12() E[W12()] </p><p>2</p><p>[</p><p>1 + tanh()]</p><p>(8)</p><p>where</p><p> =</p><p>22</p><p>2 + 22. (9)</p><p>To obtain this result, we have made the approximationtanh() erf(/2), allowing us to make use ofthe exact relationship E[erf(</p><p>/2)] = erf(</p><p>/2).</p><p>The constant acts as an effective inverse tempera-ture and we see that increasing cools the systemcloser to the inverse temperature , and in the limit , . To complete our analogy to a ther-mal urn model we now write the probability of findingthe agents in a particular arrangement, or microstate, i,such that a fraction are in urn 1, as pi() eEwhere E is an energy function. Considering two mi-crostates separated by a single transition, and defining = 1/n, then detailed balance requires that in equilib-rium 2 = (E). This condition allows E() tobe computed, in principle, by integration. A closed formapproximation E() n ln [(1 )] is obtained bynoting that depends weakly on compared to E sothat (E) E. Summing over all microstates cor-responding to macrostate we have a Boltzmann prob-ability distribution for </p><p>P() =n!</p><p>(n)!(n(1 ))!e()E()</p><p>Z , (10)</p><p>where Z is the partition function. Taking the thermody-namic limit n , and making use of Stirlings approx-imation, we find that the most likely (maximum entropy)fraction, , satisfies:</p><p>1</p><p>2n</p><p>lnP() = 2+ 1 = 0. (11)</p><p>As the memory increases and the system cools we ex-pect the agents to arrange themselves so that yields areshared more fairly. We therefore linearize (11) about theperfectly fair state, = /(1 + ), where agents in bothurns receive the same expected payoff, finding that</p><p> f() +(1+)2</p><p>2</p><p>2f() + (1+)3</p><p>2</p><p>, (12)</p></li><li><p>4</p><p>where f() =</p><p>1 + 2(1 + )2/(12). The accuracyof this approximation is verified in Figure 1. For largervalues of (Figure 2) agents move more quickly so theaveraging effect (8) damps fluctuations in transition ratesless strongly, creating larger fluctuations in t. For finite the system cannot reach perfect fairness for any mem-ory length, but in the limit where the transitionprobabilities (2,3) become step functions, we have that:</p><p> + 1</p><p>[</p><p>1( 1)3( + 1)2</p><p>+O(1)]</p><p>. (13)</p><p>From this we see that the distance away from the fairstate decreases as 1/2 as the memory of the agentsbecomes large. However, we now show why increasing too far, when is finite, destabilizes the system.</p><p>Analysis (instability) As increases, fluctuations int due to the yield process are reduced but for finite wecan no longer treat t as a constant over the averagingwindow. It is instructive, therefore, to study the effect ofvariations in t, neglecting the variations in yield. Pro-moting t to a continuous variable and replacing the urnyields with their mean values we have</p><p>t 1</p><p>2</p><p> t</p><p>t</p><p>[</p><p>1</p><p>1 s s</p><p>]</p><p>ds. (14)</p><p>We then approximate the evolution of t using the fol-lowing delay differential equation:</p><p>t = (1 t)W21(t) tW12(t). (15)</p><p>A numerical solution to this equation is shown in Figure2, along with simulation results using the same parametervalues. The oscillations in the simulation are accuratelycaptured by (15) but the stochastic yield disrupts theirperfect periodicity. To discover the parameter values atwhich stable oscillations develop we linearize equation(15) by writing t = + t where t are small fluctu-ations and is the constant fixed point, not necessarilystable, of equation (15)....</p></li></ul>