
Journal of Economic Behavior & Organization, Vol. 65 (2008) 261–276

Markets in equilibrium with firms out of equilibrium: A simulation study

Marco Casari*
Purdue University, Krannert School of Management, 403 West State Street, West Lafayette, IN 47907, United States

Received 19 December 2002; accepted 18 January 2006. Available online 18 September 2006.

Abstract

We explore the effect of the limited ability to process information on the convergence of firms toward equilibrium. In the context of a Cournot oligopoly with a unique and symmetric Nash equilibrium, firms are modeled as adaptive economic agents through a genetic algorithm. Computational experiments show that while market production is close to equilibrium, firm production is relatively far from the individual equilibrium level. This pattern of firm heterogeneity is not an artifact of random elements built into the decisional process. Instead, it comes from the market interaction of firms with cognitive limitations.
© 2006 Elsevier B.V. All rights reserved.

JEL classification: C72; D83; C63

Keywords: Cournot oligopoly; Bounded rationality; Genetic algorithms; Individual heterogeneity

Many experimental studies have documented that when subjects are given identical monetary incentives they often choose different actions. We mention three instances among many. Bossaert and Plott (2000) study financial markets where individual portfolio holdings are predicted to be identical and report persistent individual differences. In the appropriation of a common-pool resource, individual choices are remarkably different from one another; Ostrom et al. (1994) carefully document this pattern. While reviewing voluntary public good contribution experiments, Ledyard (1995) poses as a puzzle the wide heterogeneity of individual contributions.1

* Tel.: +1 765 494 3598; fax: +1 765 496 1567. E-mail address: [email protected].

1 See Bossaert and Plott (2000, Fig. 16) and Ledyard (1995, pp. 170–173). Other studies have reported a similar pattern; see for instance Palfrey and Prisbrey (1997), Saijo and Nakamura (1995), and Casari and Plott (2003, Fig. 5).


We claim that a powerful source of such individual heterogeneity could be the limited ability of agents to process information. To support this claim, simulation results are presented in a Cournot oligopoly where firms are modeled as adaptive economic agents with limited knowledge of the task and limited memory. These firms experiment with new strategies, and they learn from experience. We implement an evolutionary approach through a genetic algorithm, where firms are identical in their level of bounded rationality and where the equilibrium discovery process has a random element.

The strategic environment adopted is simple and exhibits a unique symmetric equilibrium that makes it hard for firm heterogeneity to persist; yet, simulation results show that boundedly rational, although identical, firms are heterogeneous in their strategy choice. In order to understand the forces that generate this pattern, an extensive sensitivity analysis was performed along two dimensions: degree of noise and degree of rationality. Contrary to what one might expect, we show that the heterogeneity result is not simply a consequence of the random elements contained in the genetic algorithm. Moreover, with a rise in memory capabilities and in the ability to evaluate potential strategies, individual differences decline. In the limit case of full rationality, there is convergence toward the canonical result of uniform individual behavior.

The main goal of the paper is to use genetic algorithm firms to replicate some qualitative features in the experimental literature. In addition, an in-depth analysis of genetic algorithms as economic models is provided.2 Genetic algorithms have been used in economics as a black box to model boundedly rational agents. This paper goes beyond that by assessing the impact of several key parameters in the model and showing the interplay between random search and degree of rationality. A major point is that for a large class of genetic algorithm firms, the discovery of the market equilibrium is much easier than the discovery of the individual equilibrium strategy.

The paper is organized as follows. The Cournot model and the simulation parameters are outlined in Section 1. The decision-making process of the individual-learning genetic algorithm is explained in Section 2. The main result regarding individual heterogeneity is in Section 3, along with the discussion of the random element. In Sections 4 and 5 we explore changes in rationality levels with respect to the pre-play evaluation of new strategies and by varying working memory constraints. Conclusions are in Section 6.

1. The Cournot model

The strategic environment is a standard Cournot oligopoly game, Γ(N, (S_i)_{i∈N}, (π_i)_{i∈N}). A firm i produces quantity x_i ∈ [0, λ]. All N firms simultaneously choose a production level, and then a market price pr is determined through the clearing of market demand and supply. The inverse demand function is pr(X) = d − bX, where X = Σ_{i=1}^N x_i and d, b > 0, and the cost function is c(x_i) = h·x_i, which is linear and identical for all firms. Hence, the profit function is π_i = x_i(d − bX) − h·x_i. Firms face the same incentive structure for T interactions without carry-over from one period to the next.

The Nash equilibrium of the game for profit-maximizing firms is x*_i = (1/(N + 1))((d − h)/b) for each firm, hence X* = (N/(N + 1))((d − h)/b) and pr* = (1/(N + 1))(d + Nh). The game has a continuous strategy space and a unique, symmetric, and evolutionarily stable Nash equilibrium. In other words, this game provides ideal conditions to facilitate convergence toward the Nash equilibrium outcome at both the aggregate and individual levels. The parameter values adopted are N = 8, λ = 50, h = 5/2, d = 23/2, b = 1/16, which yield Nash equilibrium values of X* = 128, x*_i = 16 for all i, and pr* = 3.5 (see footnote 3).

2 A full description of the workings of a genetic algorithm (GA) is given in the textbooks of Holland (1975), Goldberg (1989), and Mitchell (1996). For issues specific to economics see the excellent study of Dawid (1996).


The conclusions that we will draw are largely independent of these specific numerical values. At the Nash equilibrium, industry profits are less than monopoly profits; in particular, earnings are 39.5 percent of monopoly profits. In the Cournot setting adopted here, price expectations for the current period are equal to the price in the last period, adjusted by variations due to changes in the quantity that the firm itself is going to produce: pre_{i,t} = pr_{t−1} − b(x_{i,t} − x_{i,t−1}).
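As a quick arithmetic check of these parameter choices, the following minimal sketch (Python; the variable names are ours, not from the paper's code) reproduces the equilibrium values and the 39.5 percent profit comparison:

```python
# Parameter values from Section 1: N firms, demand pr(X) = d - bX, unit cost h
N, lam, h, d, b = 8, 50, 5/2, 23/2, 1/16

x_star = (d - h) / (b * (N + 1))      # symmetric Nash quantity per firm: 16.0
X_star = N * x_star                   # market production: 128.0
pr_star = d - b * X_star              # equilibrium price: 3.5

# Monopoly benchmark behind the 39.5 percent figure
X_mon = (d - h) / (2 * b)                  # monopoly output: 72.0
profit_mon = X_mon * (d - b * X_mon - h)   # 324.0
profit_nash = X_star * (pr_star - h)       # 128.0

print(x_star, X_star, pr_star, profit_nash / profit_mon)  # 16.0 128.0 3.5 0.395...
```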

2. The genetic algorithm firms

2.1. General description

This section presents the genetic algorithm (GA) decision makers employed in the simulations. The model has some interesting economic interpretations because it allows for individual learning and it is ordinal in the payoff function. Individual learning GA agents (multi-population) learn from their own experience, in contrast with social learning GAs (single-population), where agents learn from other agents' experiences (Dawid, 1999; Holland and Miller, 1991; Vriend, 2000; Chen and Yeh, 2001). The social learning architecture has been successfully employed in the agent-based computational literature in economics to study aggregate behavior (Bullard and Duffy, 1998; Miller, 1996; Arifovic, 1996; Nowak and Sigmund, 1998; Georges, 2006); however, the individual learning architecture is better suited for this study because we focus on the individual behavior of the agents (Andrews and Prager, 1994; Arifovic, 1994; Chen and Yeh, 1998).

The decision process can be described as follows. The strategy that agents have to choose is identified by a single real number. It is encoded as a binary string, a so-called chromosome, and has associated with it a score (a measure of fitness) that derives from the actual or potential payoff of this strategy. In a social learning (single-population) basic GA, each agent has just one strategy (chromosome) available, which may change from one period to the next. The changes are governed by three probabilistic operators: a reinforcement rule (selection), which tends to eliminate strategies with lower scores and replicate more copies of the better-performing ones; crossover, which creates new strategies by combining existing ones; and mutation, which may randomly modify strategies.4 In a basic GA, the strategies (chromosomes) created by crossover and mutation are directly included in the next period's set of strategies (population).

The three operators are stylized devices that are meant to capture elements involved in human learning when agents interact. The reinforcement rule (selection) represents evolutionary pressure that induces agents to discard bad strategies and imitate good ones, crossover represents the creation of new strategies and the exchange of information, and mutation can bring in new strategies from a range that the agents have not yet considered, as the sketch below illustrates.
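A minimal sketch of the crossover and mutation operators on 8-bit strings, following footnote 4. The function names are ours, and the decode mapping is an illustrative assumption consistent with the 256-point coding mentioned in footnote 11 (the paper also applies a small downshift to the grid, which we omit):

```python
import random

L = 8  # string length in bits (see the notes to Table 2)

def mutate(bits, pm=0.02):
    # Each digit flips from 0 to 1 (or vice versa) with probability pm.
    return [1 - b if random.random() < pm else b for b in bits]

def crossover(a, b):
    # Footnote 4: draw an integer w from [1, L - 1] and swap the tails.
    w = random.randint(1, L - 1)
    return a[:w] + b[w:], b[:w] + a[w:]

def decode(bits, lam=50.0):
    # Map an 8-bit chromosome onto a production level in [0, lam].
    return int("".join(map(str, bits)), 2) * lam / (2 ** L - 1)
```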

2.2. Operators

Five aspects of this framework are discussed in more detail: the memory set, the reinforcement rule, the choice rule, the ordinal nature of the GA, and the built-in random elements.

3 Notice that the Nash equilibrium outcome is not positioned in the center of the action space (i.e. at 200), so that it would not be reached through pure chance.

4 The crossover operator first randomly selects two strategies out of a population; second, it selects at random an integer number w from [1, L − 1]. Two new strategies are formed by swapping the portions of the binary strings to the right of position w.


When agents do not consider just one strategy at each period in time, but have a finite collection of strategies from which one is chosen in every period (a memory set), the process is called a multi-population GA. A strategy is a real number a_ikt ∈ [0, 50] that represents the production level of firm i in period t. Each agent is endowed with an individual memory set A_it = {a_i1t, . . ., a_iKt} composed of a number of strategies K that is constant over time and exogenously given. If a strategy a_ikt is in the memory set, agent i can choose it for play at time t (i.e. it is available). When there exist more than K strategies in the game, there are always strategies that are not currently available in the memory set. Notice that an available strategy has no impact on the outcome unless it is chosen for play. A score is assigned to every strategy in the memory set, whether or not the strategy was chosen to be played.

The size of the memory set, K, is a measure of the level of sophistication of an agent, since it determines how many strategies an agent can simultaneously evaluate and remember. The psychology literature has pointed out that working memory has severe limitations in the quantity of information that it can store and process. According to these findings, the memory limitation is not just imperfect recall from one round to the next, but rather an inability to maintain an unlimited amount of information in memory during cognitive processing (Miller, 1956; Daily et al., 2001). By setting K = 6 we assume that decision-makers have a hardwired limitation in processing information at six strategies at a time. The classic article by Miller (1956) stresses the "magic number seven" as the typical number of units in people's working memory.5

The reinforcement rule (selection) is a pairwise tournament repeated K times, R: A(K) → A(K), which is applied separately to each agent's memory set: (1) at time t two strategies, a_ikt and a_iqt, are randomly drawn with replacement from the memory set A_it; (2) the strategy with the highest score in the pair is placed in the new set: a_{i·,t+1} = argmax{s(a_ikt), s(a_iqt)}; (3) the previous two operations are performed K times in order to generate a complete memory set for agent i at time t + 1, A_{i,t+1}. Agents are adaptive learners in the sense that successful strategies are reinforced. Strategies that perform well, or that would have performed well if employed, gradually replace poor-performing ones over time. With experience, the composition of the memory set becomes the distilled wisdom of past decisions and past outcomes.

As each agent has multiple strategies (a memory set), there is an additional issue of how to choose a strategy to play out of the K available. The choice rule, C: A(K) → A, is a stochastic operator that works as a one-time pairwise tournament, where (1) two strategies, a_ikt and a_iqt, are randomly drawn with replacement from the memory set A_it and (2) the strategy with the highest score in the pair is chosen to be played: a*_it = argmax{s(a_ikt), s(a_iqt)}. A pairwise tournament is different from deterministic maximization because the best strategy in the memory set is picked with a probability less than one. The choice rule, however, is characterized by a probabilistic response that favors high-score over low-score available strategies. This rule models an imperfect ability to find an optimum, where the probability of a mistake is related to its cost.
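Both rules are variations on the same pairwise tournament. A minimal sketch (our naming; ties here are broken in favor of the first draw, whereas the propositions below assume no ties):

```python
import random

def pairwise_tournament(memory, scores):
    # Draw two strategies with replacement; the higher-scoring one wins.
    k = random.randrange(len(memory))
    q = random.randrange(len(memory))
    return memory[k] if scores[k] >= scores[q] else memory[q]

def reinforcement_rule(memory, scores):
    # R: A(K) -> A(K). K tournaments build next period's memory set.
    return [pairwise_tournament(memory, scores) for _ in range(len(memory))]

def choice_rule(memory, scores):
    # C: A(K) -> A. A single tournament picks the strategy to play.
    return pairwise_tournament(memory, scores)
```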

An interesting feature of the adopted GA is that all operators are ordinal in the payoff function, in the sense of relying only on "greater than" comparisons among strategies. An ordinal operator does not rely on a biological interpretation of the score as a perfect measure of the relative advantage of one strategy over another (Proposition 1).

5 The memory set size K needs to be even, so it could also have been set to 8. There is debate in the psychological literature about what constitutes a unit when counting to 7. In this specific application it seems reasonable to identify a single strategy as a unit.


Proposition 1 (Ordinality). The results of the GA firms' interactions are unaffected by any strictly increasing transformation v: R → R of the score function.

The score is the index of performance of a strategy a_ikt and is a function of the monetary payoff π, s(a_ikt) = v[π(a_ikt, a_−it)]. The score of a strategy can be interpreted as the utility of the outcome associated with that strategy.

2.3. Exogenous randomness

There are two sources of external noise built into the GA firms: the random initialization of the strategies and the experimentation process. In the first period of interaction a firm is endowed with a set of potential strategies chosen at random. One could interpret these strategies as priors, which are set equal neither across firms nor at the equilibrium level. Later we will compare the results with a GA with common priors set at the equilibrium level.

The experimentation process works through the random modification of existing strategies at a given rate. This rate is generally measured by the mutation rate, pm. In the GA a strategy is coded as a binary string of 0s and 1s, and there is a probability pm ∈ (0, 1) that each digit '0' flips to '1' or vice versa. Given a mutation rate, one can calculate the expected percentage of strategies that will randomly change in a period (which we call the innovation rate, p) using the formula p = 1 − (1 − pm)^L, where L is the number of digits of the binary string. In several simulations in Table 2 we adopted a mutation rate pm = 0.02, which translates into an innovation rate of p = 0.1492. Is this innovation rate "high" or "low"? Four considerations are in order. First, its value is in the range of values found in the literature, spanning from p = 0.001 (Nowak and Sigmund) to p = 0.64 (Bullard and Duffy). Second, the rate is a measure of change in potential strategies but not necessarily in chosen strategies. While the two concepts are identical in a social learning GA, they are not in an individual learning GA. In the latter, the probability that a strategy is implemented depends on how "promising" it is, and a new strategy may never actually be chosen as an action (Proposition 2).

Proposition 2 (Choice rule). The probability that an available strategy x is chosen for play, x* = x, by a pairwise tournament choice rule out of a set A of K available strategies is P{x* = x} = (2r_x − 1)/K², where r_x is the ranking of the available strategy x within the set A (the worst available strategy ranks 1, r_x = 1, and there is an assumption that there are no ties).

When firms are close to equilibrium, the actual impact of the experimentation process is smaller than the innovation rate may suggest. Consider, for example, a firm with a memory of six identical strategies (K = 6) that are best responses. A random strategy that replaces an existing one has a probability of just 1/36 of being chosen. Third, a sensitivity analysis is performed in the next section on a range of innovation rates p from 0.01 to 0.99, which allows the reader to assess the impact of the level of experimentation. Fourth, the innovation rate should actually be chosen to match the experimentation propensity of human agents and not the need to converge to equilibrium. We present the results for various levels of innovation and leave this calibration exercise to other studies.
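The innovation-rate arithmetic and the choice probabilities of Proposition 2 are easy to verify; a small sketch under the paper's parameter values:

```python
L, pm, K = 8, 0.02, 6

p = 1 - (1 - pm) ** L                                   # innovation rate
prob = [(2 * r - 1) / K ** 2 for r in range(1, K + 1)]  # Proposition 2, ranks 1..K

print(round(p, 4))   # 0.1492
print(prob[0])       # worst-ranked (new) strategy: 1/36 = 0.0277...
print(sum(prob))     # the K ranks exhaust the probability: 1.0
```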

3. Firm heterogeneity

The same level of aggregate variability can hide widely different patterns of individual variability. The following example illustrates which dimension matters for the analysis. Consider scenarios A and B in Table 1, with two firms and four periods.


Table 1
Examples of two patterns of individual variability

Scenario   Firm   Production in periods 1–4   Firm average x̄_i
A          x1     6, 6, 6, 6                  6
           x2     20, 20, 20, 20              20
B          x1     6, 20, 6, 20                13
           x2     20, 6, 20, 6                13

Indexes of variability of firm production
Scenario   Overall D1   Overall S.D.1   Across firms D2   Across firms S.D.2   Over time S.D.3
A          14           7.48            14                9.90                 0
B          14           7.48            0                 0                    8.08

Note: D = difference between maximum and minimum; S.D. = standard deviation.

Scenario A rates highly in terms of variability across firms, referred to here as "high individual heterogeneity," while scenario B rates highly in terms of variability over time but exhibits "no individual heterogeneity." The two scenarios are identical when considering either aggregate production X_t = Σ_i x_it or the overall variability of individual actions. Examples of the latter are the mean of the difference, period by period, between the maximum and minimum firm productions, D1 = (1/T) Σ_{t=1}^T [max_i{x_it} − min_i{x_it}], or the standard deviation of the individual actions x_it (S.D.1). The differences in the patterns of individual variability between scenarios A and B can be captured by splitting the overall individual variability into variability across firms (D2 or S.D.2) and over time (S.D.3). In order to calculate firm-specific variability, first we compute the average firm production over time, x̄_i = (1/T) Σ_{t=1}^T x_it, and, using those data, compute the difference D2 = max_i{x̄_i} − min_i{x̄_i} and the standard deviation of the x̄_i (S.D.2). The statistics just described are applied to the simulation data (Table 2).
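The Table 1 entries can be reproduced in a few lines; a sketch, assuming sample standard deviations (ddof = 1) and, for S.D.3, the mean of the per-firm standard deviations, which matches the printed values:

```python
import numpy as np

scenarios = {
    "A": np.array([[6, 6, 6, 6], [20, 20, 20, 20]]),   # rows = firms, cols = periods
    "B": np.array([[6, 20, 6, 20], [20, 6, 20, 6]]),
}

for name, x in scenarios.items():
    D1 = (x.max(axis=0) - x.min(axis=0)).mean()   # mean per-period range
    SD1 = x.std(ddof=1)                           # S.D. of all individual actions
    xbar = x.mean(axis=1)                         # firm averages over time
    D2 = xbar.max() - xbar.min()                  # across-firm range
    SD2 = xbar.std(ddof=1)                        # across-firm S.D.
    SD3 = x.std(axis=1, ddof=1).mean()            # mean of per-firm S.D. over time
    print(name, D1, SD1.round(2), D2, SD2.round(2), SD3.round(2))

# A 14.0 7.48 14.0 9.9 0.0
# B 14.0 7.48 0.0 0.0 8.08
```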

Simulation result 1 (Firms' heterogeneity). In a Cournot oligopoly, boundedly rational firms (multi-population genetic algorithms) with identical goals and identical skills make choices that are persistently heterogeneous.

Support for Result 1 can be found in column (2) of Table 2. At the individual level, GA firms do not converge to the symmetric Nash equilibrium outcome of x*_i = 16 (Table 2, column (1)). The standard deviation of the individual production averages is S.D.2 = 3.68. The index D2 is 11.08, which constitutes 69 percent of the Nash equilibrium outcome and 22 percent of the range of the individual strategy space. Longer simulations do not substantially alter the conclusion. After 20,000 iterations instead of 400, the index values are D2 = 13.12 and S.D.2 = 2.78, market production is 128.81, and its standard deviation is 9.69.

In order to have a benchmark for the random elements built into the GA, the outcome can be compared with the result of interactions among zero intelligence agents ((2) versus (9) in Table 2). Zero intelligence agents are designed in the spirit of Gode and Sunder (1993) and are essentially pure noise,6 as the individual strategy for each firm is drawn from a uniform distribution on the strategy space [0, 50] and then aggregated to compute market production and price. In simulations, these agents fail to reach market equilibrium. More importantly, zero intelligence agents generate less individual heterogeneity than GA firms. While scoring much higher in terms of overall variability (D1ZI = 39.09 vs. D1GA = 15.06), zero-intelligence agents are characterized by half as much individual heterogeneity as GA firms (D2ZI = 4.17 vs. D2GA = 11.08, S.D.2ZI = 1.41 versus S.D.2GA = 3.68).

6 In Gode and Sunder (1993) the zero intelligence agents are subject to a budget constraint as well.
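A rough sketch of this benchmark (our notation; the paper averages over 100 runs and periods 301–400, which we replace with one long sample):

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam, d, b, T = 8, 50.0, 23 / 2, 1 / 16, 100_000

x = rng.uniform(0, lam, size=(T, N))   # zero-intelligence production choices
X = x.sum(axis=1)                      # market production per period
pr = d - b * X                         # market price per period

print(X.mean(), pr.mean())             # approx. 200 and -1.0, far from X* = 128
```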

Table 2
Simulation results

Columns: (1) Nash equilibrium; (2) Fresh score (K = 6, p = 0.15); (3) Fresh score (K = 6, p = 0.01); (4) With equilibrium initialization; (5) Trembling hand (K = 6, p = 0.15); (6) Election (K = 6, p = 0.15); (7) Fresh score (K = 2, p = 0.15); (8) Fresh score (K = 90, p = 0.15); (9) Zero intelligence agents.

Market results
                                  (1)    (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
Production                        128    129.18  129.25  128.67  130.54  128.55  135.49  127.92  199.58
S.D. of production                0      9.89    1.79    9.37    14.84   0.11    16.30   4.43    41.23
Price                             3.5    3.41    3.42    3.46    3.34    3.47    3.01    3.48    -0.98
S.D. of price                     0      0.65    0.13    0.59    0.93    0.01    1.03    0.37    2.56
Profits (percent of monopoly)     39.5   34.97   36.60   36.30   29.59   38.29   16.48   39.27   -246.95

Individual agent results (1 obs = production decision for one agent at time t; averages across runs and periods)
MIN1 (minimum across agents)      16     9.93    10.40   12.90   8.18    12.33   3.68    13.93   5.44
MAX1 (maximum across agents)      16     24.99   24.63   19.48   26.98   24.24   34.52   17.65   44.53
D1 (difference)                   0      15.06   14.24   6.57    18.80   11.91   30.84   3.72    39.09

Individual heterogeneity (1 obs = average production for the same agent over τ periods; averages across runs)
MIN2 (minimum across agents)      16     11.73   10.31   10.79   11.85   12.33   8.66    14.51   22.82
MAX2 (maximum across agents)      16     22.81   24.71   22.22   22.31   24.24   27.23   16.85   26.99
D2 (difference)                   0      11.08   14.40   11.42   10.47   11.91   18.57   2.35    4.17
S.D.2 (S.D. of firm averages)     0      3.68    5.28    2.10    3.48    4.43    6.57    0.78    1.41

Individual variability over time (1 obs = S.D. for one agent over τ periods)
S.D.3 (S.D. over time)            0      3.51    0.38    3.35    5.46    0.04    8.16    1.31    14.51

Notes: Unless otherwise noted, the GA model has the following parameter values: N (number of firms) = 8, L (string length in bits) = 8, pc (crossover probability) = 0.30, pm (mutation probability) = 0.02, and K (memory size) = 6. The statistics are computed on periods 301–400 and are averages over 100 runs with different random seeds 0.005–0.995. A single run consists of 400 periods of iterations among the agents. In the Fresh score GA, all new strategies are assigned a potential score based on the previous period's interaction before applying the choice rule (GA v.7.5, used in (2), (3), (7), and (8)); pm = 0.001256 in (3). In (4) the initial strategies are at 16 (grid point 82, with a downshift of 0.07843 to include the exact NE); Fresh score design. In (5) the GA has a trembling hand design, i.e. a new strategy keeps the score of its parent strategy. The "parent" strategy is the original strategy before the mutation happened or, in the case of crossover, the strategy that determined the highest bits in the binary string of the new one (GA v.7.6). In (6) the election operator screens each new strategy before it is permitted to become an available strategy for play; the new strategy replaces its parent strategy in the memory set only if its potential score improves on its parent's (potential or real) score. If the score of the new strategy is lower than its parent strategy's score, the parent strategy remains in the memory set (GA v.7.7). In (9), with zero intelligence agents, individual actions are drawn with replacement from a uniform distribution on [0, 50] (GA v.7.5.1).


Fig. 1. Random initialization and Fresh score (benchmark): (A) firms' heterogeneity; (B) market results. Notes: D2 = difference between maximum and minimum producing firm; for the precise definition see Section 3. Details of computations and parameter values for the GA are in the notes to Table 2 (Fresh score). The innovation rate is p = 1 − (1 − pm)^L, where pm = mutation rate, ranging from 0.005 to 0.450. Initial strategy values are independent draws from a uniform distribution.

We conclude that it is not the level of noise per se that is the driving force behind Result 1. High levels of noise actually fail to generate firm heterogeneity. Instead, we claim that the driving force is the interplay between random search and limited cognitive abilities that takes place in the decision process.

Further support for this conclusion comes from a sensitivity analysis on the innovation rate. The impact of variations ranging over p = 0.02–0.99 (pm = 0.005–0.45) on the variability indexes D1, D2, S.D.2, S.D.3, X, and S.D.(X) is illustrated in Fig. 1. One immediately notices two points. First, for all innovation rates GA firms are no less individually heterogeneous than zero-intelligence agents. Second, as the innovation rate grows, individual heterogeneity declines toward the level of zero-intelligence agents and, at the same time, variability over time steadily grows. These considerations lead us to state Result 2.

Simulation result 2 (Heterogeneity and randomness). When firms are initialized at random, the high level of innovation of the firms is not responsible for the individual heterogeneity result.


Fig. 2. Equilibrium initialization and Fresh score: (A) firms' heterogeneity; (B) market results. Notes: D2 = difference between maximum and minimum producing firm; for the precise definition see Section 3. Details of computations and parameter values for the GA are in the notes to Table 2 (Fresh score). The innovation rate is p = 1 − (1 − pm)^L, where pm = mutation rate, ranging from 0.005 to 0.450. Initial strategy values are at the Nash equilibrium level of 16.

The data present another major pattern. As the innovation rate approaches zero, firms' heterogeneity does not disappear; on the contrary, it is at its highest peak (see p = 0.01 in Table 2(3)). It seems that in the absence of innovation, the individual differences embedded in the initial randomization are "frozen" in time.7 The toughest test for the persistence of firm heterogeneity is to initialize the agents at equilibrium. When all potential strategies of all firms are set at the equilibrium level and there is no innovation (p = 0), the system is locked at the equilibrium point with no individual heterogeneity. Fig. 2 illustrates what happens with increasing innovation rates. The comparison with the random initialization results in Fig. 1 reveals three patterns. First, market results are only modestly different; differences in total production are always less than 1 percent, mostly much less. Second, for medium and high innovation rates (p > 0.25, pm > 0.035), firm heterogeneity is very similar to that in the simulations with random initialization.

7 Longer simulations only marginally change the results from Table 2. Some indexes after 20,000 iterations are D2 = 11.86, S.D.2 = 4.61, market production 129.03, and its standard deviation 1.39. To grasp the length of this exercise, consider that at an iteration a day it would take two years and 3 months to complete it. If one iteration (including decisions and feedback from all eight firms) lasts one minute, it would take over two months, working 8 h each day, 5 days per week, to do 20,000 iterations.


Third, for low innovation rates (p < 0.25), equilibrium initialization generates considerably less firm heterogeneity. One concludes that starting at equilibrium clearly matters for firm heterogeneity, and in the expected direction. It is interesting, though, that as the innovation rate grows the curve of individual heterogeneity D2 follows a bell shape with a maximum around p = 0.20–0.30.8 Hence, the combination of a medium rate of innovation with limited cognitive abilities can disrupt the equilibrium initialization and still generate levels of firm heterogeneity well above those of the zero-intelligence agents. Such results suggest that there may be a given, high level of firm heterogeneity to which the system converges.

Another important conclusion from this comparison is that GA firms can coordinate much more easily on the market outcome than on the individual equilibrium outcome. Irrespective of the type of initialization, when noise is moderate (below p = 0.40), aggregate outcomes are within 5 percent of the Nash equilibrium. However, this aggregate value can hide widely different patterns of individual choices, both close to and far away from the individual equilibrium outcome. As the next sections will explain, this result is even stronger when the GA firms are more, but still not fully, rational.

4. Trembling hand and evaluation of new strategies

This section and the next look at which aspects of bounded rationality are responsible for Result 1 by exploring two dimensions of the rationality of GA agents: the process of evaluation of new potential strategies and working memory constraints. We begin with a sensitivity analysis on the former dimension.

A GA firm is characterized not only by its level of innovation but also by the filters that exist between the creation of a new strategy and the decision to choose it for play. Thus far, a firm has assigned a potential score to new strategies based on what the outcome would have been in the last period, in order to assess their potential performance this period (Fresh score design).

We consider a weaker and a stronger filter. A trembling hand firm does not filter new strategies before putting them into practice. The firm does not realize that a new strategy is different from the old (parent) strategy until the following periods.9 Instead, a firm endowed with the election operator compares the performance of old and new strategies to avoid discarding old strategies that are better than the new one. The new strategy replaces the old (parent) strategy in the memory set only if its potential score improves on its parent's score. This operator has become more and more common in social science applications (Arifovic, 1994; Bullard and Duffy, 1998).10
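The three filters differ only in how a newly created strategy enters the memory set. A minimal sketch of the contrast (our function names; potential_score stands for the previous-period evaluation described above):

```python
def trembling_hand(memory, scores, slot, child, potential_score):
    # No filter: the child replaces its parent and inherits the parent's score,
    # so the firm only notices the change through later play.
    memory[slot] = child

def fresh_score(memory, scores, slot, child, potential_score):
    # The child enters with its own potential score, so the choice rule
    # can discriminate against unpromising newcomers.
    memory[slot] = child
    scores[slot] = potential_score(child)

def election(memory, scores, slot, child, potential_score):
    # The child enters only if its potential score beats its parent's score;
    # otherwise it is immediately forgotten.
    if potential_score(child) > scores[slot]:
        memory[slot] = child
        scores[slot] = potential_score(child)
```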

The results of the simulations under the trembling hand and election operator designs are shown in columns (5) and (6), respectively, of Table 2. Not surprisingly, the trembling hand agents are noisier at the aggregate and overall individual levels than the Fresh score agents (S.D.(X)TH = 14.84 versus S.D.(X)FS = 9.89 and D1TH = 18.80 versus D1FS = 15.06). Trembling hand individuals are also slightly less heterogeneous (D2TH = 10.47 versus D2FS = 11.08), due to an effect of the higher amount of noise similar to that explained in the previous section.

8 In longer simulations at 5000 iterations, the specific values of the initialization matter less than at 400 iterations. In particular, firm heterogeneity is very similar for p > 0.11 (pm = 0.015). Instead, the position and level of the maximum heterogeneity (p = 0.2–0.3) are unchanged.

9 In a social learning environment, there is no difference between the trembling hand and Fresh score designs, because the effect of the Fresh score works through the choice rule.
10 The version implemented is weaker than that of Arifovic (1994) and similar to Franke (1997).


Fig. 3. Election operator: (A) firms' heterogeneity; (B) market results. Notes: D2 = difference between maximum and minimum producing firm; for the precise definition see Section 3. Details of computations and parameter values for the GA are in the notes to Table 2 (Election). The innovation rate is p = 1 − (1 − pm)^L, where pm = mutation rate, ranging from 0.005 to 0.450; random initialization.

Although it produces aggregate results closer to the Nash equilibrium outcome and with a dramatically reduced variance (S.D.(X)EL = 0.11 versus S.D.(X)FS = 9.89), the election operator, surprisingly, does not decrease the amount of individual heterogeneity compared to the Fresh score design (D2EL = 11.91 versus D2FS = 11.08). The surprise comes from the general view that the election operator characterizes agents with a higher level of rationality.

The election operator has a dramatic impact on the behavior of GA agents but does not lower individual heterogeneity. At the aggregate level, the result is similar to the work of Arifovic (1994) for the cobweb model. Without the election operator, there is a higher variability in the market's production (see Fig. 1B), which almost completely disappears with the election operator (Fig. 3B). Some other results are counterintuitive, as one would conjecture that a higher level of rationality, such as the election operator is generally intended to induce, would lead to behavior that is closer to the symmetric Nash equilibrium at the individual level. To investigate the functioning of the election operator better, simulations were run varying the innovation rate, as was done with the Fresh score GA design (Fig. 1).


The effect of the election operator on individual heterogeneity is not the same for all innovation rates. In comparison with the Fresh score design, a higher individual heterogeneity, according to both the D2 and S.D.2 indexes, is detected for innovation rates between p = 0.11 (pm = 0.015) and p = 0.44 (pm = 0.070). For innovation rates above p = 0.44, individual heterogeneity quickly declines below the level of the zero-intelligence agents (p = 0.57, pm = 0.1) and then toward zero; moreover, both individual heterogeneity and aggregate outcomes are closer to the Nash equilibrium. Beyond p = 0.81 (pm = 0.185), the election operator seems to lose control of the inflow of new strategies from the high innovation rate, and the variance of the aggregate outcome has a spike (Result 3).

Simulation result 3 (Election operator). When the firms are endowed with an election operator, the individual heterogeneity level is lower than the level in the basic GA agents' simulation only when the level of innovation is higher than 44 percent.

The innovation process is the counterbalance to the tendency to reinforce good strategies over time. In the Fresh score design, if the rate of innovation is too high there is a danger of corrupting the hard-learned good strategies. With the election operator there is no such danger: when a new strategy does not promise to be better than its parent, it does not get a chance to be played. As a matter of fact, it is immediately forgotten. In this context, the innovation rate needs a different interpretation than in the Fresh score design. One might think of it as an index of computational speed, of how many strategies the agent can create, evaluate, and compare in one period. The election operator with a high innovation rate induces a superior ability to explore currently unavailable options.

The role played by the evaluation dynamics of new strategies relates to the level of rationality of the GA firm, but it cannot be disentangled from the general design of the algorithm, nor from the experimentation rate. That limits somewhat the interpretation of the results.

In a social learning GA all new strategies are automatically played, and the election operator is the only way to filter out disruptive behavior. In an individual learning GA, instead, there is an additional filter between innovation and play: the choice rule. A new strategy with a low potential score has a low chance of being selected for play (Proposition 2) but will be kept in memory for future periods.

5. Memory constraints

Stronger memory capabilities make for a smarter decision maker. An agent with a larger memory size K has a longer historical memory and abandons an available strategy only after a longer sequence of trials (Back, 1996). Moreover, the decision maker has some advantages in the ability to choose a better strategy (Corollary 1).

Corollary 1. In the memory set, (i) the median ranking available strategy is chosen with probability 1/K, (ii) the odds that the best versus the worst available strategy is chosen are increasing in the memory set size, equal to 2K − 1 (the inverse of the error odds), and (iii) the probability that the chosen strategy ranks above the median ranking available strategy is 3/4, irrespective of the size of the memory set (K is assumed to be even).
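The three claims follow directly from the choice probabilities of Proposition 2. A quick numerical check for K = 6, reading the median rank as straddling ranks K/2 and K/2 + 1 (our interpretation, which reconciles claim (i) with the rank probabilities):

```python
K = 6
prob = {r: (2 * r - 1) / K ** 2 for r in range(1, K + 1)}   # Proposition 2

odds_best_vs_worst = prob[K] / prob[1]                       # = 2K - 1
p_median = (prob[K // 2] + prob[K // 2 + 1]) / 2             # straddles the median
p_above_median = sum(prob[r] for r in range(K // 2 + 1, K + 1))

print(odds_best_vs_worst, p_median, p_above_median)          # 11.0 0.1666... 0.75
```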

While keeping the innovation level constant (p = 0.15), we can study the effect of different memory sizes, letting K range from 2 to 100 (Fig. 4). While augmenting noise (p) fades individual heterogeneity by generating overall variability (S.D.(X)), relaxing memory constraints makes for a better decision-maker, with both lower individual heterogeneity (D2, S.D.2) and lower variance over time of individual actions (S.D.3).


Fig. 4. Changing memory constraints: (A) firms' heterogeneity; (B) market results. Notes: D2 = difference between maximum and minimum producing firm; for the precise definition see Section 3. Details of computations and parameter values for the GA are in the notes to Table 2 (Fresh score). The memory set size K varies from 2 to 100 in steps of 2; random initialization.

Numerical values for K = 2 and K = 90 can be found in columns (7) and (8) of Table 2. This result is also in line with the findings of the psychological literature: "differences in working memory capacity predict performance on a variety of tasks" (Daily et al., 2001, p. 315).

A larger memory set systematically reduces individual heterogeneity. Initially the reduction in individual heterogeneity is fast; then it slows without stopping its decline. For K ≥ 50 the GA agents are always less individually heterogeneous than zero-intelligence agents. Moreover, GA agents with a memory set as large as K = 100 almost halve their individual heterogeneity (D2 = 2.38 and S.D.2 = 0.80).11 In conclusion, memory size matters (Result 4).

Simulation result 4 (Memory constraints). When the rationality level of the firms is enhanced by enlarging memory capabilities, the individual heterogeneity level decreases toward zero.

11 Consider that there are 256 possible strategies in the binary coding.


6. Conclusions

We use simulations to study the impact of bounded rationality on the convergence of identical firms toward equilibrium in a Cournot oligopoly. A major result is that the interaction of firms with limited information processing capabilities and limited working memory generates outcomes that are close to the market equilibrium while being relatively far from the individual Nash equilibrium level. A sensitivity analysis confirms that firms exhibit a persistent heterogeneity in individual behavior for a wide set of parameter values (Result 1). This result reproduces a qualitative feature found in the experimental economics literature: under identical incentives, subjects behave in a diverse fashion (Bossaert and Plott, 2000; Ostrom et al., 1994; Ledyard, 1995).

The model employed in the simulations is ordinal in the payoff function and has neither firm-specific goals nor firm-specific skills. We used an individual learning genetic algorithm where, over time, the best-performing strategies gain a higher probability of being played. New strategies are randomly generated and introduced into the set of available strategies. Even though one might suspect that the outcome originates from the stochastic nature of some genetic algorithm operators (hence that it is built in by construction), we show that this is not the case (Result 2).

Moreover, we identify conditions under which firm heterogeneity becomes minimal. Pointing at those conditions sheds light on the forces that generated firm heterogeneity in the first place. First, as the level of agent rationality increases, individual heterogeneity fades away, yielding the canonical prediction that fully rational agents have uniform behavior. In particular, the introduction of a pre-evaluation of new potential strategies (election operator) has the effect of lowering firms' heterogeneity (Result 3). A similar effect comes from relaxing memory constraints (Result 4). Raising memory skills, and firm rationality in general, generates an immediately tighter fit of the aggregate outcome to the market equilibrium while producing a slow but steady decrease in firm heterogeneity. These comparative statics results suggest that the discovery of the aggregate Nash equilibrium is easier than that of the individual Nash equilibrium. Moreover, they make clear that calibrating a bounded rationality model only on its aggregate convergence to equilibrium may not be enough. The model's performance in terms of individual convergence to equilibrium provides a richer and more challenging set of parameter restrictions.

Second, an alternative way to reduce firm heterogeneity is to lower the rate of random search. The simple adoption of a low experimentation rate, however, is not a sufficient condition for attaining low firm heterogeneity (Fig. 1). That goal can be achieved only if, in addition, firms are initialized at the equilibrium strategy. Equilibrium initialization preserves firm homogeneity as long as this situation is "frozen" through a zero or very low experimentation rate. Irrespective of their impact on heterogeneity, these model changes have little effect on aggregate results. Again, this finding points out that coordinating on the individual Nash equilibrium requires a level of skills considerably higher than discovering the market equilibrium.

This work is not a statement that any form of bounded rationality will lead to individual heterogeneity in behavior. In fact, in the context that we have analyzed, only heavy bounds on rationality have produced it. The open issue is then how to calibrate these models to the actual cognitive limitations of people in order to understand if, and how much of, the individual heterogeneity observed in experimental data is due to bounded rationality.

In conclusion, we claim that a significant source of individual heterogeneity in human behavior could be the limited ability of agents to process information. This explanation has the advantage of not building individual-specific variations into a model in order to explain the empirical individual diversity. The simulations presented suggest, on one side, the existence of an inverse correlation between levels of rationality and levels of individual heterogeneity and, on the other side, that the Nash equilibrium is a more robust predictor of aggregate behavior than of individual behavior.


Acknowledgments

A previous version of the paper circulated under the title "Can bounded rationality explain experimental anomalies? A study with genetic algorithms." Earlier versions of the paper benefited from discussions with Ben Klemens, Charles Plott, Guillaume Frechette, Jasmina Arifovic, John Kagel, Nelson Mark, Sean Gailmard, Simon Wilkie, and the comments of two anonymous referees. Thanks also go to participants at the 2005 International Conference on Cognitive Economics, New Bulgarian University, Sofia, Bulgaria; Ohio State University; the Fifth Workshop in Experimental Economics in Siena, Italy; the ESA meeting in San Diego, CA; and the FUE Summer School in Bounded Rationality in San Sebastian, Spain. Sharyn Slavin Miller, Maria Satterwhite, and Eloisa Imel from Caltech provided technical support. Financial support from the Division of the Humanities and Social Sciences at Caltech and an EU Marie Curie Fellowship is gratefully acknowledged.

Appendix A. Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jebo.2006.01.003.

References

Andrews, M., Prager, R., 1994. Genetic programming for the acquisition of double auction market strategies. In: Kinnear, K.E. (Ed.), Advances in Genetic Programming, vol. 1. MIT Press, Cambridge, pp. 355–368.

Arifovic, J., 1994. Genetic algorithm learning and the cobweb model. Journal of Economic Dynamics and Control 18, 3–28.

Arifovic, J., 1996. The behavior of the exchange rate in the genetic algorithm and experimental economies. Journal of Political Economy 104, 510–541.

Back, T., 1996. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, New York.

Bossaert, P., Plott, C.R., 2000. Basic Principles of Asset Pricing Theory: Evidence from Large-scale Experimental Financial Markets. HSS WP 1070, California Institute of Technology.

Bullard, J., Duffy, J., 1998. A model of learning and emulation with artificial adaptive agents. Journal of Economic Dynamics and Control 22, 179–207.

Casari, M., Plott, C.R., 2003. Decentralized management of a common property resource: experiments with centuries-old institutions. Journal of Economic Behavior and Organization 51, 217–247.

Chen, S.-H., Yeh, C.-H., 1998. Genetic programming in the overlapping generations model: an illustration with dynamics of the inflation rate. In: Porto, V.W., et al. (Eds.), Evolutionary Programming VII. Lecture Notes in Computer Science, vol. 1447. Springer, Berlin, pp. 829–838.

Chen, S.-H., Yeh, C.-H., 2001. Evolving traders and the business school with genetic programming: a new architecture of the agent-based artificial stock market. Journal of Economic Dynamics and Control 25, 363–393.

Daily, L.Z., Lovett, M.C., Reder, L.M., 2001. Modeling individual differences in working memory performance: a source activation account. Cognitive Science 25, 315–353.

Dawid, H., 1996. Adaptive Learning by Genetic Algorithms: Analytical Results and Applications to Economic Models. Springer, Berlin.

Dawid, H., 1999. On the convergence of genetic learning in a double auction market. Journal of Economic Dynamics and Control 23, 1545–1569.

Franke, R., 1997. Behavioural heterogeneity and genetic algorithm learning in the cobweb model. IKSF (Institut fur Konjunktur- und Strukturforschung), University of Bremen, no. 9.

Georges, C., 2006. Learning with misspecification in an artificial currency market. Journal of Economic Behavior and Organization 60, 70–84.

Gode, D.K., Sunder, S., 1993. Allocative efficiency of markets with zero-intelligence traders: market as a partial substitute for individual rationality. Journal of Political Economy 101, 119–137.

Goldberg, D., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, New York.

Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.

Holland, J.H., Miller, J.H., 1991. Artificial adaptive agents in economic theory. American Economic Review, Papers and Proceedings 81, 365–370.

Ledyard, J.O., 1995. Public goods: a survey of experimental research. In: Kagel, J.H., Roth, A.E. (Eds.), The Handbook of Experimental Economics. Princeton University Press, Princeton, pp. 111–197.

Miller, G.A., 1956. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review 101 (2), 343–352.

Miller, J.H., 1996. The coevolution of automata in the repeated prisoner's dilemma. Journal of Economic Behavior and Organization 29, 87–112.

Mitchell, M., 1996. An Introduction to Genetic Algorithms. MIT Press, Cambridge.

Nowak, M.A., Sigmund, K., 1998. Evolution of indirect reciprocity by image scoring. Nature 393, 573–576.

Ostrom, E., Gardner, R., Walker, J., 1994. Rules, Games and Common-pool Resources. University of Michigan Press, Ann Arbor.

Palfrey, T.R., Prisbrey, J.E., 1997. Anomalous behavior in public goods experiments: how much and why? American Economic Review 87, 829–846.

Saijo, T., Nakamura, N., 1995. The 'spite' dilemma in voluntary contribution mechanism experiments. Journal of Conflict Resolution 39, 535–560.

Vriend, N.J., 2000. An illustration of the essential difference between individual and social learning, and its consequences for computational analysis. Journal of Economic Dynamics and Control 24, 1–19.