
Bees? • Simms, Thielstrom • Adaptive Robotics • Spring 2014

“Bees?” A Large Scale, Cooperative Simulation

Weighing Altruism and Selfishness

Alexander Simms, Ravenna Thielstrom

Swarthmore College

Abstract

In nature, communities that revolve around altruistic cooperation exist: for example, a bee colony, where individual bees retrieve nectar for the communal hive rather than scouting selfishly for their own survival. Although cooperation can certainly be implemented into an artificial life simulation, from an evolutionary standpoint a more interesting question is raised: can the interactions of a group and the parameters of their environment produce emergent cooperation among independent agents? This paper aims to investigate the possibility of emergent altruism through evolutionary game theory. To this effect, NEAT was used to evolve neural-net topologies across many generations in a bee colony simulation, testing what possible circumstances might lead to altruistic decisions or selfish decisions in the hive. A number of experiments were conducted, incorporating such concepts as a measure of group fitness, recurrent networks, and inter-agent communication. Initial speculation produced a hypothesis that the amount of altruism and selfishness demonstrated in NEAT-trained agents would be most affected by individual fitness relying on group fitness. The results of the experiments demonstrate that while this is partly true and group fitness is key to altering artificial behavior, selfishness generally continues to prevail, with cooperation appearing rarely, or only when coerced. Through group-affected fitness, agents can learn to use altruism strategically to mitigate the negative effects of selfishness.

I. Introduction

Cooperative behavior has frequently been discussed within the context of artificial intelligence. The prisoner’s dilemma often serves as a basis for these discussions due to its relevance to issues and challenges of social order. Michael W. Macy’s paper on “Social Order in Artificial Worlds” describes the prisoner’s dilemma as the “paradox of social order”. [3] The prisoner’s dilemma is a hypothetical situation that tests human nature. The basic dilemma is this: two criminals are imprisoned, awaiting trial. The chief prosecutor has enough evidence to convict both of them of a lesser charge, but not quite enough evidence to convict them of their full crimes. The prosecutor then hatches a plan: both prisoners are asked to betray the other by testifying against him. If one prisoner chooses to defect and betray the other, he will be set free and the betrayed prisoner will receive 3 years in prison. If, on the other hand, they both simultaneously betray each other, both will receive a 2-year sentence. Cooperative silence from both parties gets them both the lesser sentence of 1 year. Although it may be argued that altruism is the best option for both parties, according to most game-theoretic explanations of this scenario one should always choose to betray in order to minimize personal loss, and indeed one would expect this kind of decision from an artificial agent. Macy specifically names four possible outcomes in the prisoner’s dilemma, ordered by the amount of payoff: temptation, defecting while the other prisoner remains silent; reward, both prisoners staying silent; punishment, both prisoners defecting; and “sucker,” staying silent while the other prisoner betrays you. The reason why this game can be called a paradox is that the choice leading


to the highest payoff (selfishness) also has the potential for the greatest failure. The experiments outlined in this paper take place not on an individual-to-individual level but rather in a large communal colony: they do not explicitly use the prisoner’s dilemma. However, each agent has a similar choice of selfishness or altruism, with somewhat similar outcomes, only on a larger communal scale. It should be noted that the experiments outlined in this paper also differ from the prisoner’s dilemma in another significant way: these experiments focus not on how selfishness in one agent causes suffering in another, but rather on how selfishness negatively impacts the entire community overall, including the selfish individuals themselves.

As mentioned by Macy, the single-iteration prisoner’s dilemma is best played with a selfish dominant strategy, a conclusion that is also borne out in the experiments of this paper. However, this strategy is not the best when the prisoner’s dilemma is repeated, as expanded upon in David B. Fogel’s “Evolving Behaviors in the Iterated Prisoner’s Dilemma”. [1] In the iterated prisoner’s dilemma, both prisoners are put through the same procedure time and time again, and have memory of how the iterations went in the past. Fogel explicitly evolved best strategies over many different simulations of the scenario. Players of the iterated prisoner’s dilemma will learn over time which individuals are more likely to defect or cooperate, and adjust their strategies accordingly when facing each of those individuals. However, in a hive environment where many independent agents are making the decision simultaneously, learning about specific individuals is not a viable option. Instead, for some of the experiments, recurrent networks are used to provide each generation of agents with several rounds of simulations, and information from each one, in order to learn general trends of the hive.

Finally, Sarit Kraus’ “Negotiation and cooperation in multi-agent environments” discusses the possibilities of achieving cooperation in a shared environment like the hive simulation set up for these experiments. [2] One strategy discussed there suggests that in situations where resources must be shared, a communication system may benefit self-motivated agents. Along a similar line of thinking, Section V of this paper implements a system in which agents have the opportunity of increasing overall hive resources by communicating where more resources can be found.

II. Experiments

I. The Bee Model

To conduct these experiments, a model was developed that takes inspiration from the nectar-collecting hive insect, the bee. Our model consists of a population of individual neural nets, evolving by the NeuroEvolution of Augmenting Topologies algorithm, or NEAT. [4] NEAT, first developed by Kenneth O. Stanley, is a genetic algorithm with a flexible genetic encoding that allows the topology of the network to grow and change over time, based on a user-defined measure of “fitness”. All of the genes within the genetic encoding are tagged with historical markings, which allows for efficient implementation of “crossing over,” the process by which genetic information from an individual’s parents is mixed. Finally, NEAT allows for speciation: individuals within the population are grouped together based on the similarity of their genomes, and only individuals with similar genomes are allowed to reproduce with one another. This allows each species to find its own niche, and prevents any one species from outcompeting all of the others. It is because of these features that NEAT was selected for this task.

Each individual NEAT agent within this model is called a “bee,” and the population of all of the NEAT agents is called the “hive”. Each “day” within this model, all of the bees leave the hive to find nectar. The nectar is represented in this model as a random floating-point number between 0 and 1. When a bee receives this nectar, it has the choice to either be selfish or altruistic. If the bee decides to be


selfish, it eats the nectar then and there. If it decides to be altruistic, it brings the nectar back to the hive, where all of the nectar brought back by altruistic bees is pooled and shared equally among all of the altruistic bees. The “fitness” in this system is simply how much nectar a bee was able to eat on a given day.
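The daily cycle described above can be sketched in a few lines of Python (an illustrative reconstruction, not the authors’ actual code; the function and variable names are hypothetical):

```python
import random

def simulate_day(choices):
    """Run one day of the basic bee model.

    choices: a list with one entry per bee, each "selfish" or "altruistic".
    Returns a list of fitnesses (nectar eaten) in the same order.
    """
    fitnesses = [0.0] * len(choices)
    pool = 0.0        # nectar brought back to the hive
    altruists = []    # indices of the altruistic bees
    for i, choice in enumerate(choices):
        nectar = random.random()   # nectar found: a float in [0, 1)
        if choice == "selfish":
            fitnesses[i] = nectar  # eat it on the spot
        else:
            pool += nectar         # contribute to the communal pool
            altruists.append(i)
    # the pooled nectar is shared equally among the altruistic bees
    if altruists:
        share = pool / len(altruists)
        for i in altruists:
            fitnesses[i] = share
    return fitnesses
```

Note that every altruist receives the same share regardless of what it personally found, which is exactly the property the later expected-value discussion turns on.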

II. Basic Experiment

An experiment was carried out to determine the workability of the model. In this experiment, each bee lived for only one day, and afterwards died. Their fitnesses determined the makeup of the next generation of bees: the most fit bees were crossed in the hope of producing more fit offspring. The NEAT networks were evaluated using serial activation on a sigmoid function, as they were in all of the experiments detailed in this paper. The initial topology had no hidden nodes. Its only input was the nectar the bee found (again, a float between 0 and 1), and its output represented the choice the bee made. (See Figure 1.)

Figure 1: All that is input into the system is the amount of nectar that the bee has found on that day, and the choice that the bee has made is determined by whether its one output is closer to 0 or to 1.
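With no hidden nodes, the initial topology reduces to a single sigmoid unit and a threshold on its output, which can be sketched as follows (the weight and bias stand in for whatever values NEAT evolves; the names are hypothetical):

```python
import math

def sigmoid(x):
    """Standard logistic activation used by the networks."""
    return 1.0 / (1.0 + math.exp(-x))

def decide(nectar_found, weight, bias):
    """Minimal stand-in for the initial topology: one input, one output,
    no hidden nodes. The choice is whichever of 0 or 1 the single
    sigmoid output is closer to (here, altruism for outputs >= 0.5)."""
    output = sigmoid(weight * nectar_found + bias)
    return "altruistic" if output >= 0.5 else "selfish"
```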

Figure 2: Under the conditions of the basic experiment, altruism lost handily to selfishness.

In all of the trials run, the majority of the bees wound up selfish. A representative trial is shown in Figure 2. There are a number of reasons for this, the most pressing being the fact that the bees had no direct incentive to act altruistically. While more stable fitnesses could theoretically be obtained through altruism, the system only works if enough of the bees are acting altruistically: enough bees must bring back above-average amounts of nectar to offset the bees that bring back a below-average amount. If too few bees are acting altruistically, this buffering effect is not present.

III. Hive Fitness

How, then, can this selfishness be overcome? In the real world, actual hives collect the nectar that the bees bring in and convert it into honey. This is the inspiration for a measure of “hive fitness,” indicating how much nectar the hive has on “reserve”. Each day, when the altruistic bees bring the nectar back, if reserve levels are below 10 units, a tenth of that nectar is taken from them and put into the hive to support the “queen,” who eats 0.5 units of nectar per day. The amount of nectar in the reserve is turned into a modifier as described in Figure 3, which is then multiplied by the amount of nectar each bee finds to determine its fitness.
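The reserve dynamics just described can be sketched as follows (an illustrative reconstruction from the text; the constant and function names are hypothetical):

```python
QUEEN_APPETITE = 0.5  # nectar the queen eats per day
TAX_RATE = 0.1        # fraction taken from returned nectar
RESERVE_CAP = 10      # no tax is collected once reserves exceed this

def update_reserve(hive_nectar, returned_nectar):
    """Apply one day's tax and the queen's consumption.

    hive_nectar: current reserve level.
    returned_nectar: total nectar the altruistic bees brought back today.
    Returns (new_reserve, nectar_left_for_the_altruistic_bees).
    """
    if hive_nectar < RESERVE_CAP:
        tax = TAX_RATE * returned_nectar
        hive_nectar += tax
        returned_nectar -= tax
    # the queen eats from the reserve every day, but it cannot go negative
    hive_nectar = max(0.0, hive_nectar - QUEEN_APPETITE)
    return hive_nectar, returned_nectar
```

Because the queen’s consumption is constant, reserves drain unless the altruists keep paying in, which is what gives the modifier in Figure 3 its bite.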

3

Bees? • Simms, Thielstrom • Adaptive Robotics • Spring 2014

if hive_nectar > 10:
    modifier = 1
elif hive_nectar > 0:
    modifier = hive_nectar / 10
else:
    modifier = .00001

Figure 3: The fitness of each bee is affected by the amount of nectar reserves in the hive. The modifier defined by this section of code is multiplied by the amount of nectar the bee collected in order to determine that bee’s fitness. The last case uses a very small number rather than 0 due to restrictions in the NEAT algorithm.

Figure 4: The addition of the hive fitness component to bee fitness was unsuccessful in coercing the bees to behave more altruistically.

Figure 5: Although the bees’ fitnesses continued to shrink along with the hive fitness, they did not behave altruistically.

In this experiment, the bees still behaved overwhelmingly selfishly, for largely the same reasons as in the basic experiment. Although the hive fitness mechanic was added, the bees have no way of determining what the hive fitness currently is, and therefore have no way to adjust their actions to be altruistic or selfish depending on the current state of the hive. Because of this, they continue to default to the same selfish behavior, unaware of this new restriction. Furthermore, because the same fitness modifier is applied to both the selfish bees and the altruistic bees, the selfish bees keep their relative advantage: 0.0007 is still better than 0.0001.

Subsequent experiments were run as a proof of concept, in which the fitness adjustment was only made to the selfish bees, and in which the selfish bees were all assigned a fitness value of 0. While these produced results in which the bees all wound up altruistic, further reporting on them is outside the scope of this paper, which is attempting to get altruism without specifically encoding for it.


IV. Recurrent Networks

In this experiment, the bee model is considerably more fleshed out. The bees now live for more than one day, and have considerably more inputs than they did before. Each individual day proceeds as it did in the last experiment: the bees go out to collect nectar, bring it back to the hive or eat it while they are out, and the nectar brought back to the hive is shared between the bees that brought it back, minus 10% to support the hive itself if reserve levels drop below 20 units. However, the bees now have a way of determining how much nectar is in the hive. This is a very important input, because their fitness rests on keeping this value within appropriate ranges. This input is normalized in the same manner that fitness is in Figure 3.

Recurrent networks can be much better at predicting the results that their behavior will have on their environment. Because of this, recurrent networks are used in this experiment. To get the most out of this recurrence, the bees also have inputs corresponding to the choice they made “yesterday,” as well as how much fitness they got. (See Figure 6.) On the first day of a bee’s life, the recurrent inputs are set to 0.5 to avoid influencing the bee’s behavior.
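The input vector fed to the recurrent networks can be sketched as follows (an illustrative reconstruction; the function name and the exact numeric encoding of yesterday’s choice are assumptions, and the hive level is normalized the same way the fitness modifier is computed in Figure 3):

```python
def build_inputs(nectar_found, yesterday_choice, yesterday_fitness, hive_nectar):
    """Assemble the four inputs for one bee on one day.

    nectar_found: today's find, a float in [0, 1).
    yesterday_choice, yesterday_fitness: the recurrent history inputs;
        on a bee's first day both should be passed as 0.5 so the
        history exerts no bias.
    hive_nectar: the hive's current reserve, normalized below to [0, 1].
    """
    if hive_nectar > 10:
        hive_level = 1.0
    elif hive_nectar > 0:
        hive_level = hive_nectar / 10
    else:
        hive_level = 0.00001   # nonzero floor, as in Figure 3
    return [nectar_found, yesterday_choice, yesterday_fitness, hive_level]
```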

Figure 6: In addition to the nectar found, the networks now take as input the choice that the bee made yesterday, the amount of fitness it received, and the level of the hive nectar, normalized between 0 and 1.

Figure 7: The wavering pattern shown in the later generations cropped up very consistently across experiments, and serves to maintain hive nectar at a particular level.

Figure 8: Under the recurrent network, the bees were able to maintain relatively high hive fitness.

Under the recurrent system, consistently higher hive fitnesses were maintained (see Figure 8), despite the fact that the majority of the bees were selfish (see Figure 7). This is because the bees are acting just altruistically enough to keep the hive alive, while acting selfishly the rest of the time. This is facilitated largely by the input reminding the bees of the hive’s nectar reserves. The nectar levels dip down twice in the experiment shown in Figure 8, however: these two drops represent


the networks learning the implications of that input on their fitness. After it is discovered, they maintain the hive fitness at a reasonable level.

Despite the generally positive hive fitness maintenance, overall it is clear that the bees favor selfishness. In all trials, altruism never rises substantially above the halfway point by the end of the specified number of generations, except for a single instance which occurred during a series of runs with one starting hidden node. This will be discussed later alongside similar results from Section V.

V. Advice from Bees

One major factor that has been missing from this simulation of social order so far is, of course, direct interaction between the individuals. Each agent has control over its own decision, which may indirectly affect the destiny of the entire hive, but beyond this the agents have no opportunity to influence other members of their community, as happens in real life. For example, another way that real biological bees cooperate within the hive is by sharing information about sites where they have foraged, essentially notifying other bees where to get the most nectar.

To implement this, a second output was added to the neural network: the decision whether or not to give another bee advice. (See Figure 9.) This decision is taken into account by the next bee in the population. If advice was given and the previous bee found enough nectar for the advice to be beneficial, the current bee will take that advice and receive the same amount of found nectar as the previous bee. Otherwise, found nectar is randomized as usual. The general aim of this experiment was to observe how the bees would choose to exploit this opportunity, and what trends in altruism or selfishness might result from prosperous advice.
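The advice mechanism can be sketched as follows (an illustrative reconstruction; the function name is hypothetical, and the cutoff for advice being “beneficial” is an assumed above-average threshold, since the paper does not state the exact value):

```python
import random

def found_nectar(prev_gave_advice, prev_nectar, threshold=0.5):
    """Determine how much nectar the current bee finds.

    If the previous bee chose to give advice and its own find was good
    enough to be worth sharing (above `threshold`, an assumed cutoff),
    the current bee forages at the advised site and finds the same
    amount. Otherwise it forages at random, as in earlier experiments.
    """
    if prev_gave_advice and prev_nectar > threshold:
        return prev_nectar
    return random.random()
```

One consequence of this design, relevant to the results below, is that successful advice feeds the next network an above-average nectar value rather than a fresh random draw.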

Figure 9: In addition to the first decision output, the second output represents a choice to share helpful information with other bees.

Figure 10: The advice system does not generally seem to improve upon previous experiments, but semi-positive instances certainly did occur. The familiar wavering was often present.


Figure 11: The correlation between guidance and altruism was generally found to be inversely proportionate, although this wasn’t always the case.

Figure 12: Hive fitness also seems inversely proportionate to guidance.

Figure 13: A cooperative, altruistic society can be achieved very rarely.

Figure 14: For altruism to win out, it seems guidance must first be abandoned, as can be seen around the 500-day mark.


Figure 15: As before, there seems to be a proportional relationship between guidance and selfishness.

Results from this experiment were inconclusive. Increases in guidance within the bee population generally corresponded with increases in selfishness, as seen in Figures 9–15 above. However, hive fitness was affected in two different ways: some runs showed increases in hive fitness as guidance rose, and some runs showed decreases (although the latter was generally more prevalent). The rationale behind this is that as guidance rises, the overall amount of found nectar rises, so hive fitness is more likely to rise if there is an acceptable number of altruistic bees. On the other hand, bees that are consistently receiving an above-average amount of nectar through guidance are also more likely to behave selfishly instead of altruistically, which obviously decreases hive nectar. Overall, it seems that a lack of guidance actually corresponds to higher amounts of altruism, although further research would be necessary to understand why this might be the case. The system of one-way communication implemented here, then, seems to achieve the opposite of the desired effect.

Yet, in about 5% of the trials, evolutions were found that ended up with an altruistic majority. Experiments under the exact same conditions and parameters were also repeatedly run without the advice output influencing the found nectar, and exactly one majority-altruistic instance was found in a series of over twenty trials. Guidance, despite no longer having any effect on any part of the experiment, still showed the same patterns of correlation with selfishness.

III. Discussion

As has been demonstrated, it is rather difficult to get independent agents to achieve altruism in this sort of environment. It has been theorized that actual bees have some sort of urge hard-coded into their genetics to be altruistic, and this is how hives are able to survive. [3]

One substantial difficulty for the development of altruism in this particular model comes from the way the fitness function is implemented. All of the bees come back with nectar in the range of 0 to 1, so, in expectation, choosing to be altruistic is choosing a fitness of 0.5. Choosing to be selfish has the potential to obtain much higher fitnesses than that, and those individuals will be able to reproduce, at the cost of the altruistic individuals of the hive.
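This expected-value claim is easy to check with a quick Monte Carlo sketch (illustrative only; the hive size, trial count, and seed are arbitrary choices, not values from the experiments):

```python
import random

def average_altruist_payoff(hive_size=10, days=100_000, seed=42):
    """Estimate an altruist's expected payoff in an all-altruistic hive.

    Each bee's find is uniform on [0, 1), so an equal share of the
    pooled nectar averages out to roughly 0.5 per bee, regardless of
    hive size.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(days):
        pool = sum(rng.random() for _ in range(hive_size))
        total += pool / hive_size  # equal share of the day's pool
    return total / days
```

A selfish bee faces the same 0.5 expectation but keeps its own draw, so on lucky days its fitness approaches 1, and selection amplifies exactly those lucky individuals.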

It is important to note that altruism is of course perfectly possible and easily implemented, as mentioned in Section III, along with other test runs in which each bee’s fitness was simply the overall hive fitness. If the programmer explicitly rewards or punishes behaviors, or directly links motivation solely to the group rather than the individual, good results can be achieved, but at the cost of the factor of self-motivation and inherent self-interest that is present in real-life communities.

One phenomenon of the experiments which occurred again and again was the wavering behavior of the agents’ decisions, demonstrated in Figures 7, 10, and 13. The larger the wavering is, the more likely it is to correlate to a stabilized amount of hive nectar, usually at 0, but with smaller oscillations the wavering can signify a higher stabilized level, depending on the conditions of the hive fitness modifications. Wavering is caused by


the population rapidly varying the number of altruistic/selfish decisions across days and generations. Since this generally occurs around one of the boundaries for punishment, it is logical to conclude that the bees, while not behaving truly cooperatively, can still evolve something of a system for cooperation, in which altruism is practiced only in necessary moments to avoid hive punishment, with a reversion to selfishness once the danger has temporarily passed. This is obviously not an ideal community, especially since the oscillating tends to happen at 0, but the balance of selfish cooperation they have managed to achieve does grimly mimic some aspects of our own society.

Finally, the unexpected, rare appearance of unmistakable altruism during the advice system experiments is a result that cannot really be suitably explained with the knowledge available. The few instances where altruism appeared, as mentioned earlier, occurred in only about 5% of the advice system runs, and the only clear pattern between them seemed to be that guidance dropped to zero or close to zero just as altruism began to win out over selfishness. Why this happens is unclear, since there doesn’t seem to be any benefit to not giving advice, other than avoiding the discussed trend of selfishness resulting from the higher amount of nectar found. The sudden increase in randomness, as opposed to feeding the network increasingly similar above-average values of nectar, could have had something to do with this behavior.

However, the fact that this pattern occurs even when guidance is made irrelevant seems to imply that it is not the amount of nectar itself which prompts a higher chance of an altruistic solution, but rather the presence of a second output which, since it is recursively passed on to the next day as an input, may serve as a kind of extra hidden node. This theory is supported by the single instance of altruism found in the recurrent network experiments, where one hidden node was added at the start of each experimental run. More experimentation would be needed to fully investigate the context of these rare instances.

Although the results are not as clear or as optimistic as initially hoped for, the experiments conducted have shown that individual evolution of altruistic communities is indeed possible, if rare. Further research would be necessary to better understand how independent, self-motivated cooperation is formed in these instances.

References

[1] David B. Fogel. Evolving behaviors in the iterated prisoner’s dilemma. Evolutionary Computation, 1(1):77–97, 1993.

[2] Sarit Kraus. Negotiation and cooperation in multi-agent environments. Artificial Intelligence, 94:79–97, 1997.

[3] Michael W. Macy. Social order in artificial worlds. Journal of Artificial Societies and Social Simulation, 1(1), 1998.

[4] Kenneth O. Stanley and Risto Miikkulainen. Competitive coevolution through evolutionary complexification. Journal of Artificial Intelligence Research, 21:63–100, 2004.
