
IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, SPECIAL ISSUE ON REAL-TIME STRATEGY GAMES, JUNE 2016

Evolving Effective Micro Behaviors in Real-Time Strategy Games

Siming Liu, Student Member, IEEE, Sushil J. Louis, Member, IEEE, and Christopher Ballinger, Student Member, IEEE

Abstract—We investigate heuristic search algorithms to generate high quality micro management in combat scenarios for real-time strategy games. Macro and micro management are two key aspects of real-time strategy games. While good macro helps a player collect more resources and build more units, good micro helps a player win skirmishes and battles against equal numbers and types of opponent units, or win even when outnumbered. In this paper, we use influence maps and potential fields as a basis representation to evolve short term positioning and movement tactics. Unit micro behaviors in combat are compactly encoded into fourteen parameters, and a genetic algorithm evolves good micro behaviors by manipulating these fourteen parameters. We compared the performance of our evolved ECSLBot with two other state of the art bots, UAlbertaBot and Nova, on several skirmish scenarios in the popular RTS game StarCraft. The results show that the ECSLBot tuned by genetic algorithms outperforms UAlbertaBot and Nova in kiting efficiency, target selection, and fleeing. Further experiments show that the parameter values evolved in one scenario work well in other scenarios and that we can switch between pre-evolved parameter sets to perform well in unseen scenarios containing more than one type of opponent unit. We believe that our representation and approach, applied to each unit type of interest, can produce effective micro performance against melee and ranged opponents and provide a viable path towards complete RTS bots.

Index Terms—RTS game, genetic algorithm, micro, influence map, potential field.

I. INTRODUCTION

REAL-TIME STRATEGY (RTS) games have become a popular platform for computational and artificial intelligence (CI and AI) research in recent years. RTS players need to gather resources, build structures, train military units, research technologies, conduct simulated warfare, and, hopefully, defeat their opponent. All of these factors and their impact on in-game, time-critical decisions are essential for a player to win an RTS game. In RTS communities, players usually divide their decision making into two separate levels of tasks called macro and micro management, as shown in Figure 1. Macro is long term planning: generating good build orders in the early game, upgrading technology, and scouting. Good macro management helps a player build a larger army, a larger economy, or both. On the other hand, micro is the ability to control a group of units in combat or other skirmish scenarios to minimize unit loss and maximize damage to opponents. We decompose micro management into two parts: tactical and reactive control [1].

S. Liu, S. Louis, and C. Ballinger are with the Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV 89557-0148 USA (e-mail: [email protected]; [email protected]; [email protected]).

Tactical control addresses the overall positioning and movement of a squad of units. Reactive control focuses on controlling a specific unit to move, fire, and flee in combat. This paper investigates heuristic search algorithms to find winning tactical and reactive control for skirmish scenarios, as indicated by the dotted square in Figure 1. Micro management of units in combat aims to maximize the damage dealt to enemy units and minimize the damage to friendly units. Common micro techniques in combat include concentrating fire on one target, withdrawing seriously damaged units from the front of the battle, and kiting1 your units to take advantage of the enemy units' attack-distance limitation. We are interested in generating competitive micro, as part of an RTS game player, that outperforms an opponent with the same or a greater number of units. In the future, we plan to incorporate these results into the design of more complete RTS game players.

Fig. 1: Typical RTS AI levels of abstraction. Inspired by a figure from [2].

Spatial maneuvering is an important component of combat in RTS games. We applied a commonly used technique called Influence Maps (IMs) to represent terrain and enemy spatial information. IMs have been widely used to attack spatial problems in video games, robotics, and other fields. An IM is a grid placed over a virtual world with values assigned to each square by an IM function described in [3]. Figure 2 shows an IM which represents a force of enemy units and the surrounding terrain in StarCraft: Brood War, our simulation environment. A unit IM is computed from all enemy units in the game. In our experiments, greater cell values indicate more enemy units in the cell and more danger to friendly units.

1 Kiting refers to making your units pull back and shoot repeatedly in order to keep them at an optimum distance from your target.


In addition to the positions of enemy units, terrain is another critical factor for micro behaviors. For example, kiting enemy units near a wall is not a wise move. We therefore use another IM to represent terrain in the game world to assist micro management. We combine the two IMs and use this battlefield (or map) spatial information to guide our AI player's positioning and reactive control. In our research, we use search algorithms to find optimal IM parameters that help specify high quality micro behaviors of units in combat scenarios. Once learned, the parameters that define an IM generalize well to other game maps, which is one reason IMs are a useful representation for spatial information.
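
To make this concrete, the sketch below shows one simple way to fill in a unit IM and a terrain IM and sum them over a 64 × 64 cell grid (Python; the helper names, the square spread of influence, and the flat weighting are our illustrative assumptions rather than the exact IM function of [3]; the weights and ranges correspond to the evolvable parameters WU, RU, WT, and RT of Table I).

GRID = 64  # the maps used in this paper are 64 x 64 cells of 32 x 32 pixels

def splat(im, cx, cy, weight, rng):
    """Add 'weight' to every cell within 'rng' cells (Chebyshev distance) of (cx, cy)."""
    for x in range(max(0, cx - rng), min(GRID, cx + rng + 1)):
        for y in range(max(0, cy - rng), min(GRID, cy + rng + 1)):
            im[x][y] += weight

def build_sum_im(enemy_cells, wall_cells, w_u, r_u, w_t, r_t):
    """enemy_cells / wall_cells: lists of (x, y) grid cells (hypothetical inputs)."""
    unit_im = [[0] * GRID for _ in range(GRID)]
    terrain_im = [[0] * GRID for _ in range(GRID)]
    for (cx, cy) in enemy_cells:      # one contribution per enemy unit
        splat(unit_im, cx, cy, w_u, r_u)
    for (cx, cy) in wall_cells:       # one contribution per untraversable cell
        splat(terrain_im, cx, cy, w_t, r_t)
    # The sum IM combines enemy and terrain information; higher values mean more danger.
    return [[unit_im[x][y] + terrain_im[x][y] for y in range(GRID)] for x in range(GRID)]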

Fig. 2: An IM representing the game world with enemy units and terrain. The light area at the bottom right represents enemy units. The light area surrounding the map represents a wall.

While good IMs tell us where to go, good unit navigation tells our units how best to move there. We use Potential Fields (PFs) in our research to control a group of units navigating to particular locations on the map. PFs are widely used in robotics research for coordinating movement of multiple entities and are often applied in video games for the same purpose. In our research, we apply two PFs to coordinate units' movement and use two parameters to specify each PF.

The long term goal of our research is to create a complete human-level RTS game player, and this paper tackles one aspect of this problem: finding effective micro management for winning small combat scenarios. Our approach is to evolve (off-line) two sets of parameters for each unit type that we want to control. One set of parameters specifies behavior against melee2 units; the other set specifies behavior against ranged units. During a real-time skirmish, ECSLBot switches between these two parameter sets, and their corresponding behaviors, based on whether the ECSLBot controlled unit's current target is a melee unit or a ranged unit. In this paper, we apply this approach to Vultures, a fast but fragile Terran unit.
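
The switching itself is simple; a minimal sketch follows (Python; the names p_melee, p_ranged, and is_melee are ours for illustration and are not taken from any bot's source).

def select_parameters(current_target, p_melee, p_ranged):
    # p_melee (Pm) was evolved against melee opponents, p_ranged (Pr) against ranged ones.
    # The decision is made per target, in real time, during a skirmish.
    if current_target.is_melee():
        return p_melee
    return p_ranged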

Several challenges needed to be addressed in our research. First, how do we represent reactive control like kiting, target selection, and fleeing? Second, since the parameters for IMs, PFs, and reactive control are related, how do we tune these variables for different types of units and scenarios?

2 Melee refers to close combat using short ranged weapons like swords or hammers.

Third, how well does our micro bot perform in selected sample scenarios compared with other state of the art bots? Finally, do the evolved behaviors generalize to scenarios not previously encountered?

To investigate these issues, we compactly represent micro behaviors as a combination of two IMs, two PFs, and a set of reactive control variables. In earlier work, we compared the quality and robustness of movement behaviors produced by genetic algorithms (GAs) and two types of Hill Climbers (HCs). The results showed that our HCs were quick but unreliable, while GAs were slower but reliably found high quality solutions [4]. Favoring solution quality, in this paper we use GAs to evolve good combinations of IM, PF, and reactive control parameters that lead to winning micro behaviors. Since StarCraft has become a popular platform for RTS AI research, we created our skirmish scenarios in StarCraft and used GAs to search for winning micro behaviors in these scenarios. We then compared the performance of the micro behaviors produced by our GAs with two state of the art StarCraft bots, UAlbertaBot [5] and Nova [6], on both training and testing scenarios. We chose these bots because their source code was available and they performed well in prior competitions. The results show that Nova performs well on kiting behavior against melee units, while UAlbertaBot does well against ranged attack units. Our ECSLBot tuned by GAs performs well against both melee units and ranged units in a variety of scenarios.

The remainder of this paper is organized as follows. Section II describes related work in RTS AI research and common techniques used in RTS micro research. The next section describes our RTS research environment. Section IV explains the specifics of our methodology. Section V presents results and compares solutions produced by our methods with two state of the art StarCraft bots. Finally, the last section draws conclusions and discusses possible directions for future work.

II. RELATED WORK

Typically, industry RTS AI developers create RTS AI not so much for beating opponents as for entertaining and tutoring users. Industry AI employs techniques such as finite state machines, rule based systems, and scripting [7], [8], and some industry AIs may cheat. On the other hand, academic RTS AI research focuses on using learning or reasoning techniques to win an RTS game and reach human competitive performance. Our research falls in the academic category. Much work has been done in academia on applying different techniques to build RTS AI players. For example, some work has been done in our lab on co-evolving macro and robust build orders in WaterCraft [9]. Ontanon et al. used real-time case-based planning (CBP) in the RTS game Wargus [10]. Weber and Mateas presented a data mining approach to strategy prediction by learning from StarCraft replays [11]. Churchill et al. adapted the Alpha-Beta search approach from board games to RTS combat scenarios of up to eight versus eight units [12]. This paper focuses on work related to using IMs and PFs for spatial reasoning and unit movement. Miles et al. applied IMs to evolve a LagoonCraft RTS game player [13].


Sweetser et al. developed a game agent designed with IMs and cellular automata, where the IM models the environment and helps the agent make decisions in their EmerGEnt game [14]. They built a flexible game agent that responds to natural phenomena and user actions while pursuing a goal. Bergsma et al. used IMs to generate adaptive AI for a turn based strategy game [15]. Su-Hyung et al. proposed a strategy generation method using IMs in the strategy game Conqueror, applying evolutionary neural networks to evolve non-player characters' strategies based on the information provided by layered IMs [16]. Avery et al. worked on co-evolving team tactics using a set of IMs, guiding a group of friendly units to move and attack enemy units based on the opponent's position [17]. Their approach used one IM for each entity in the game to generate different unit movement. However, this method does not scale well to large numbers of units. For example, with two hundred entities, the population cap for StarCraft, we would need to recompute two hundred IMs every update, which could be a heavy load for our system. Preuss et al. used a flocking-based and IM-based path finding algorithm to enhance group movement in the RTS game Glest [18], [19]. Raboin et al. presented a heuristic search technique for multi-agent pursuit-evasion games in partially observable space [20]. In this paper, we use an enemy unit position IM combined with a terrain IM to gather spatial information and guide our units in producing winning micro behaviors for RTS games.

Potential fields have also been applied to AI research in RTS games. Most of the prior work in PFs is related to unit movement for spatial navigation and collision avoidance [21]. This approach was first introduced by Khatib in 1986 while he worked on real time obstacle avoidance for mobile robots [22]. The technique was then widely used in avoiding obstacles and collisions, especially in multiple unit scenarios with flocking [23], [24], [25]. Hagelback et al. applied this technique to AI research within an RTS game [26]. They presented a Multi-Agent Potential Field based bot architecture in the RTS game ORTS [27] and incorporated PFs into their AI player at both the tactical and the unit reactive control level [28]. We have also done some prior work with PFs [4], [29] and use two PFs for group navigation in this work.

Reactive control, including individual unit movement and behavior, aims at maximizing damage output to enemy units and minimizing the damage to friendly units. Common micro techniques in combat include fire concentration, target selection, fleeing, and kiting. Uriarte et al. applied IMs for kiting, frequently used by human players, and incorporated kiting behavior into his StarCraft bot Nova [6]. Gunnerud et al. introduced a CBR/RL hybrid system for learning target selection in given situations during a battle [30]. Wender et al. evaluated the suitability of reinforcement learning algorithms for micro managing combat units in RTS games [31]. The results showed that their AI player was able to learn selected tasks like "Fight", "Retreat", and "Idle" during combat. We scripted our reactive control behaviors with a list of unit features represented by six parameters. Each set of parameters influences reactive control behaviors including kiting, targeting, fleeing, and movement.

III. RTS RESEARCH ENVIRONMENT

In our field, we believe a suitable RTS research environment should be popular, open source, and speed adjustable. StarCraft is one of the most popular RTS research platforms due to the StarCraft: Brood War Application Programming Interface (BWAPI) framework and the AIIDE3 and CIG4 StarCraft AI competitions [32]. BWAPI provides an interface which allows our program to interact with StarCraft game data through code instead of keyboard and mouse. However, StarCraft was designed for human players and as such has some limitations for evolutionary computing approaches. Researchers have thus developed RTS games such as SeaCraft, Wargus, and ORTS designed specifically for scientific research. In our work, we run all of our experiments on the popular platform, StarCraft, which allows us to compare our work with our peers.

A. StarCraft and Bots

StarCraft is one of the most well known RTS games, with a huge player base and numerous professional competitions all over the world. The game has three different but well balanced races that players can choose from: Terran, Protoss, and Zerg. Thanks to the popularity of StarCraft and recent StarCraft AI tournaments, many groups have been using it as a platform for their research. Typically, researchers develop RTS AI players called "bots" which are capable of playing StarCraft. In our research, we apply heuristic search algorithms to generate effective micro behaviors and compare the micro performance of our ECSLBot with two other state of the art bots, UAlbertaBot and Nova. The bots used in this paper are listed below:

• UAlbertaBot: Developed by Churchill from the University of Alberta. UAlbertaBot won the AIIDE 2013 StarCraft competition as a Protoss player.

• Nova: Developed by Uriarte from Drexel University. Nova was ranked 7th in the AIIDE 2013 StarCraft competition.

• SCAI: The passive StarCraft AI, used as our baseline in evaluating the micro performance of the other bots.

• ECSLBot: Developed by us; currently only does micro, using parameters generated by our approach.

We are aware that UAlbertaBot is mainly a Protoss player and is optimized for using Protoss units. However, UAlbertaBot is also capable of playing Terran, and Nova was specifically optimized by Uriarte for kiting with the Terran unit "Vulture", so we selected the Vulture to evaluate our method. Another reason for selecting a ranged unit like the Vulture is that, compared to melee units, ranged units usually need more micro operations and can benefit more from good micro behaviors in an RTS skirmish.

The micro logic of UAlbertaBot is handled by two components called MeleeManager and RangedManager for all types of units, rather than by a separate component for each specific unit type. This means that UAlbertaBot uses the same logic for all military units and ignores the differences between units.

3 http://www.StarCraftAICompetition.com
4 http://cilab.sejong.ac.kr/sc competition/


(a) Train1: A bot controlling 5 Vultures at the top left fighting against 25 Zealots at the bottom right for 2500 frames. The score is higher the more Zealots are killed.

(b) Train2: A bot controlling 5 Vultures at the top left fighting against 6 Vultures at the bottom right. The score is higher the more Vultures are killed and the shorter the game duration.

Fig. 3: Training scenarios.

For example, both the Vulture and the Dragoon are ranged attackers and can "kite" or "hit and run" against melee units, but they should kite differently based on their unique weapon cool down times. Nova uses IMs for unit navigation and kiting when controlling Vultures.

B. Scenarios

The first step of this research was building an infrastructure within which to run our bot. We designed a series of customized StarCraft maps using StarEdit, a free tool provided by Blizzard Entertainment to build custom scenarios. As explained earlier, we want ECSLBot to learn to control a specific type of Terran unit (the Vulture) to fight against different types of enemy units. More specifically, we evolve Vulture kiting behavior against melee enemy units, and target selection and fleeing against ranged enemy units. Melee and ranged units are the two broad types of units in RTS games like StarCraft. After learning how to play against different types of enemies in two training scenarios, we test the generalizability of the learned behaviors on eight previously unseen testing scenarios. Finally, we compare our ECSLBot, Nova, and UAlbertaBot against each other using Vultures. Players in RTS games are usually not able to access complete state information because of the "fog of war". The "fog of war" influences longer-term planning for distant units, while we are focused on short-term planning for units in close proximity. Furthermore, good human players will start a skirmish only when they already have enemy and terrain information through scouting. Therefore, we allow all bots to access complete state information in all scenarios in this research. In other words, there is no "fog of war" in our scenarios.

1) Training Scenarios: Human players usually use different micro behaviors against melee units than they use against ranged units. Therefore, we evolve our bot against exemplars of these two broad types of enemy units separately. In our experiments, ECSLBot learns how to control Vultures to defeat melee enemy units on a scenario containing 5 friendly Vultures and 25 opponent Zealots (a Protoss melee unit), as shown in Figure 3a.

We call this training scenario Train1 and the parameters evolved on this scenario Pm. The goal of this scenario is to eliminate as many enemy Zealots as possible. Kiting efficiency is important in this type of combat and is what our GA evolves.

Our second scenario was created for fighting against ranged attack units. We call this scenario Train2, and the parameters evolved here Pr. Figure 3b shows that our ECSLBot controls 5 friendly Vultures positioned at the top left to fight against 6 enemy Vultures positioned at the bottom right. Positioning and target selection become key contributors in this scenario. Both Train1 and Train2 run for 2500 frames or until one side is eliminated. Finally, once these parameter sets are evolved, ECSLBot switches between the two sets based on whether its target type is ranged or melee. This switching happens in real time during a game.

2) Testing Scenarios: We evolve micro behaviors for fighting against melee and ranged enemy units in the previous two training scenarios. However, we are interested in evaluating the generalizability of our evolved behaviors on scenarios never encountered before. Therefore, we created eight new testing scenarios with more units, mixed types of opponent units, and different terrain. We expect ECSLBot, evolved on two simple training scenarios, to perform similarly in other unseen situations because our evolved parameters represent a range of behaviors that are relatively position and terrain independent, and simply switching between parameter sets takes into account the two broad opponent types in RTS games. We first test our evolved ECSLBot on the two test scenarios shown in Figure 4. Here, friendly Vultures spawn at the bottom left of the map instead of the top left. Enemy units spawn at the top of the map and are split into two groups. These two scenarios (Test1 and Test2) test whether ECSLBot is able to adapt when the initial positions of friendly units and enemy units change.

Next, we considered four scenarios where we change the number of units controlled by our bots. We evolved ECSLBot for controlling five Vultures against enemy units; we now investigate how ECSLBot performs when controlling more than five Vultures.


(a) Test1: 5 Vultures at the bottom left versus 6 Vultures which are split into 2 groups.

(b) Test2: 5 Vultures at the bottom left versus 25 Zealots which are split into 2 groups.

Fig. 4: Testing scenarios where only the initial positions of units changed.

Figure 5 shows four scenarios (Test3, Test4, Test5, and Test6) where our ECSLBot controls ten Vultures instead of five, to fight against differing numbers of enemy units. Furthermore, we consider more complex terrain: Test6 contains an untraversable block in the middle of the map, as shown in Figure 5d. Results show that ECSLBot learned to avoid map boundaries while kiting during training, and this generalized to the never before encountered untraversable block in the center of Test6. ECSLBot adapts to the block in the center and kites well while avoiding the block.

Finally, in the last two scenarios we evaluate generalizability against mixed opponent types. ECSLBot simply switches between Pm and Pr, the two evolved parameter sets, according to whether the current target unit is melee or ranged, respectively. Figure 5e shows the testing scenario Test7, where ECSLBot controls ten Vultures and fights against eight Zealots and eight Vultures. Since we were particularly interested in the performance of ECSLBot fighting against mixed opponent units, we also created the last scenario, Test8, shown in Figure 5f. ECSLBot controls ten Vultures to fight against eighteen Zerglings and ten Hydralisks, which are melee and ranged unit types respectively, from a different StarCraft race (Zerg). These units have different melee damage and different weapon ranges when compared to Zealots and Vultures. Note that the numbers and types of units used in the above scenarios were chosen to show the differences in micro performance of the three bots: with too few opponents all three bots killed all the opponent units, and with too many all three bots' units died. Eighteen Zerglings and ten Hydralisks were a good match for ten Vultures.

3) Head-to-head Scenario: After comparing our ECSLBot, UAlbertaBot, and Nova on a variety of testing scenarios where the bots control different numbers of Vultures against different types of enemy units controlled by SCAI, we were also interested in how they perform when competing against each other with identical units. A new scenario was designed for this comparison in which all three bots control five Vultures against each other. Like Train1 and Train2, this scenario contains no obstacles.

IV. METHODOLOGY

In our scenarios, the micro bot attempts to defeat the opponent by eliminating enemy units while minimizing the loss of friendly units. A secondary objective is to do this as quickly as possible. Our scenarios contain two StarCraft unit types: the Vulture and the Zealot. A Vulture is a Terran unit with a ranged attack weapon, low hit-points (easy to destroy), and fast movement. On the other hand, a Zealot is a Protoss unit with a melee weapon, high hit-points (hard to destroy), and slow movement.

A. Influence Maps and Potential Fields

We compactly represent micro behaviors as a combination of two IMs, two PFs, and a set of reactive control parameters. The IM generated from enemy units, combined with the terrain IM, tells our units where to go, and PFs are used for unit navigation. The unit IM and the terrain IM are functions of the weights and ranges of enemy units and of untraversable terrain (walls and obstacles). Since computation time depends on the number of IM cells, we use a cell size of 32 × 32 pixels; the entire map consists of a 64 × 64 grid of such cells. Note that as enemy units move, the unit IM changes. The sum IM therefore also changes, and no matter what actual StarCraft map we play on and where on that map opponent units are positioned, the sum IM indicates vulnerable positions to attack as well as positions not to attack. Algorithm 1, described below, chooses one specific location to attack based on the current sum IM. We first find the lowest value IM cell that contains an enemy unit; call this Cellt. This cell denotes the enemy that the IM indicates is most separated from the rest of the enemies. The algorithm then finds the IM cell with the lowest value among Cellt's immediately adjacent cells, choosing the first hit in case multiple cells are found. We call this new IM cell Cellnt. The algorithm then chooses Cellnt as the location to first move to and from which to then launch the attack. In case Cellnt is far from Cellt, ECSLBot may perform less well, resulting in a lower fitness; IM parameters that result in poor performance should be quickly eliminated by the GA.

Equation 1 shows a typical PF function, where Force is the potential force on the unit and d is the distance from the source of the force to the unit; c is a coefficient and e is an exponent applied to the distance, used to adjust the strength and direction of the vector force.

Force = c · d^e (1)


(a) Test3: 10 Vultures at the top left versus 10 Vultures at the bottom right of the map.

(b) Test4: 10 Vultures at the top left versus 12 Vultures at the bottom right of the map.

(c) Test5: 10 Vultures at the top left versus 40 Zealots which are spread into 3 groups at the bottom of the map.

(d) Test6: 10 Vultures at the top left versus 40 Zealots at the bottom right, with an unwalkable area in the middle of the map.

(e) Test7: 10 Vultures at the top left versus 8 Zealots and 8 Vultures at the bottom right of the map.

(f) Test8: 10 Vultures at the top left versus 18 Zerglings and 10 Hydralisks at the bottom right of the map.

Fig. 5: Testing scenarios with ten friendly Vultures and different types of enemy units. All maps used in this paper are the same size of 64 × 64.

Algorithm 1 Squad Targeting and Positioning Algorithm
Initialize TerrainIM, EnemyUnitIM, SumIM;
Cellt = the lowest value cell that contains an enemy unit on SumIM;
target = the enemy unit in Cellt;
Cellnt = the cell with the minimum cell value and the smallest distance to target on SumIM;
movePos = Cellnt.getPosition();
squad.moveTo(movePos);
if squad.getPosition() ≈ movePos then
    squad.attack(target);
end if
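
The following is a small Python sketch of Algorithm 1 under our own assumptions about the data structures (sum_im is a 2-D list of cell values and enemy_by_cell is a hypothetical mapping from grid cells to the enemy units inside them); it illustrates the targeting and positioning logic rather than the bot's actual implementation.

def choose_attack(sum_im, enemy_by_cell, squad):
    # Cell_t: the lowest-value sum-IM cell that contains an enemy unit,
    # i.e. the enemy most separated from the rest of the enemy force.
    cell_t = min(enemy_by_cell, key=lambda c: sum_im[c[0]][c[1]])
    target = enemy_by_cell[cell_t]

    # Cell_nt: among Cell_t's adjacent cells, the one with the lowest IM value
    # (the first hit wins on ties); this is the position to attack from.
    x, y = cell_t
    w, h = len(sum_im), len(sum_im[0])
    neighbors = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                 if (dx, dy) != (0, 0) and 0 <= x + dx < w and 0 <= y + dy < h]
    cell_nt = min(neighbors, key=lambda c: sum_im[c[0]][c[1]])

    squad.move_to(cell_nt)                 # move the squad to the staging cell
    if squad.position() == cell_nt:        # once there, launch the attack
        squad.attack(target)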

We use two PFs of the form described by Equation 1 to control the movement of units. Each PF calculates one force acting on a unit. The two potential forces in our game world are:

• Attractor: The attraction force is generated by the unit's destination; the unit is attracted to its destination. This force is inversely proportional to distance.

• Repulsor: This keeps friendly units moving towards the destination from colliding with each other. It is usually stronger than the attractor force at short distances and weaker at long distances.

Each PF is determined by two parameters, a coefficient c and an exponent e. Therefore, we use four parameters to determine a unit's PFs:

PF = ca · d^ea + cr · d^er (2)

where ca and ea are the parameters of the attractor force, and cr and er are the parameters of the friend repulsor force. These parameters are then encoded into a binary string for our algorithms.

B. Reactive Control

Besides group positioning and unit movement, reactive control behaviors must be represented in a way that our algorithms can process. In our research, we considered three reactive control behaviors frequently used in real games by good human players: kiting, target selection, and fleeing. We first developed and implemented parameterized algorithms for kiting (Algorithm 3) and target selection (Algorithm 2). We then used the genetic algorithm to evolve the values of these parameters for specific friendly and enemy unit types. These algorithms do not depend on position. The parameters affect target prioritization and the relative kiting location, and determine when to flee based on a unit's health. Once evolved, we expect to be able to use our parameterized micro behaviors tuned by genetic algorithms on any map with any distribution of friendly and enemy units. Note, however, that we will need to evolve parameters for every pair of friendly and enemy unit types. This will take time but is feasible for most RTS games, which usually implement fewer than twenty (20) unit types.


Figure 6 shows the six parameters used in these algorithms, and Table I explains the details and purpose of each parameter (variable).

Fig. 6: Variables used to represent reactive control behaviors. The Vultures on the left side of the map are friendly units; the two Vultures on the right are enemy units.

• Target Selection: Concentrating fire on one target, or switching to a nearby enemy. In Algorithm 2, each of our units selects the nearest enemy unit as a possible target, tclosest. Within a distance Rnt (an evolved parameter, as shown in Table I) of tclosest, our unit will choose an actual target based on the following prioritized list:

1) tlowhp, the lowest hit-point enemy unit below the evolvable threshold HPef.

2) tfocus, the enemy unit being targeted by the most friendly units within Rnt relative to tclosest.

3) tclosest, the nearest enemy unit.

• Kiting: Also known as "hit and run". This behavior is especially useful in combat where our units have a larger attack range than enemy units. In Algorithm 3, a unit moves towards and attacks the target whenever the unit's weapon is ready. The unit starts kiting if the distance between the unit and the target is less than Dk and the unit's weapon is not ready (for example, because it is in cool down). The unit moves back to a kitingPosition computed by a function getKitingPositionFromIM(Dkb) that uses the sum IM, as shown in Algorithm 3, line 6. Dkb is the number of cells to move away from the target cell. If Cellt is the IM cell containing our target, getKitingPositionFromIM(Dkb) finds a neighboring IM cell with the lowest value and sets this new cell to Cellt. This process is repeated with the new Cellt, Dkb times in total, and the final Cellt is chosen as the kitingPosition to move towards.

• Flee: Temporarily repositioning to the back of our forces, away from the front line of battle, when our units have low hit-points. HPfb controls this "fleeing" behavior.

We encoded a candidate solution into a 60-bit string. The detailed representation of the IM, PF, and reactive control parameters is shown in Table I. Note that the sum IM is derived by summing the enemy unit IM and the terrain IM, so it does not need to be encoded. When the game engine receives a candidate solution, it decodes the binary string into the corresponding parameters according to Table I and directs friendly units to move to a location specified by Algorithm 1 and then attack enemy units.

TABLE I: Chromosome

Group              Variable  Bits  Description
IM                 WU        5     Enemy unit weight in IMs.
                   RU        4     Enemy unit influence range in IMs.
                   WT        5     Terrain weight in IMs.
                   RT        4     Terrain influence range in IMs.
PF                 ca        6     Attractor coefficient.
                   cf        6     Repulsor coefficient.
                   ea        4     Attractor exponent.
                   ef        4     Repulsor exponent.
Reactive Control   St        4     The waiting time to move after each firing. Used for kiting.
                   Dk        5     The distance from the target at which our unit starts to kite.
                   Rnt       4     The radius around the current target. Other enemy units within this range will be considered as a new target.
                   Dkb       3     The distance, in number of cells, for our unit to move backward during kiting. This is a parameter to getKitingPositionFromIM() in Algorithm 3.
                   HPef      3     The hit-points of nearby enemy units, under which a target will be assigned.
                   HPfb      3     The hit-points of our units, under which a unit will flee.
Total                        60

The fitness of this candidate solution at the end of each match is then computed and sent back to our GA. Once units engage an enemy unit, Algorithm 2 takes over to perform reactive control behaviors like target selection and kiting.
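
Decoding the 60-bit string can be sketched as follows (Python; the field order and widths follow Table I, while any scaling from raw integers to in-game units is an implementation detail not specified here and is therefore omitted).

FIELDS = [                         # (name, bits), in chromosome order per Table I
    ("WU", 5), ("RU", 4), ("WT", 5), ("RT", 4),        # influence maps
    ("ca", 6), ("cf", 6), ("ea", 4), ("ef", 4),        # potential fields
    ("St", 4), ("Dk", 5), ("Rnt", 4), ("Dkb", 3),      # reactive control
    ("HPef", 3), ("HPfb", 3),
]

def decode(bits):
    """bits: a string of '0'/'1' characters of length 60 -> dict of raw parameter values."""
    assert len(bits) == sum(width for _, width in FIELDS) == 60
    params, pos = {}, 0
    for name, width in FIELDS:
        params[name] = int(bits[pos:pos + width], 2)   # unsigned integer field
        pos += width
    return params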

Algorithm 2 Target Selection
1: for all unit in squad do
2:   tclosest = the nearest enemy unit to unit
3:   tlowhp = the enemy unit with the lowest hit-points within Rnt relative to tclosest
4:   tfocus = the enemy unit being targeted the most within Rnt relative to tclosest
5:   if tlowhp is below the threshold (HPef) then
6:     target = tlowhp
7:   else
8:     target = tfocus
9:   end if
10: end for

Algorithm 3 Kiting
1: for all unit in squad do
2:   if unit is ready to attack then
3:     unit.attack(target)
4:   else if unit.distance(target) < Dk then
5:     WaitForFrames(St)
6:     kitingPosition = getKitingPositionFromIM(Dkb)
7:     unit.move(kitingPosition)
8:   end if
9: end for
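
The IM-based back-off used on line 6 of Algorithm 3 can be sketched as follows (Python; the function and argument names are ours, and sum_im and target_cell are assumed inputs rather than part of the bot's actual interface).

def get_kiting_position_from_im(sum_im, target_cell, d_kb):
    """Starting from the cell containing the target, step to the lowest-valued
    neighboring cell d_kb times; the final cell is the retreat position."""
    w, h = len(sum_im), len(sum_im[0])
    cell = target_cell
    for _ in range(d_kb):
        x, y = cell
        neighbors = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0) and 0 <= x + dx < w and 0 <= y + dy < h]
        cell = min(neighbors, key=lambda c: sum_im[c[0]][c[1]])  # lowest danger first
    return cell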

C. Fitness Evaluation

We evolve the behaviors for fighting against melee units and against ranged attack units separately.


Our first fitness evaluation maximizes the damage to enemy units, minimizes the damage to friendly units, and minimizes the game duration in scenarios with ranged attack enemy units. In this case, a unit remaining at the end of the game contributes 100 to its own side. The fitness of an individual is determined by the difference between the number of friendly units and the number of enemy units at the end of each game. For example, if three friendly Vultures and one enemy Vulture remain at the end of the game, the score is (3 − 1) × 100 = 200, as shown in the first term of Equation 3. Individuals with negative fitness, obtained when the number of remaining enemy units is greater than the number of remaining friendly units, were not allowed to reproduce and typically stopped appearing after three generations. The detailed evaluation function used to compute fitness against ranged units (Fr) is:

Fr = (NF − NE) × Su + (1 − T/MaxT) × St (3)

where NF is the number of friendly units remaining and NE is the number of enemy units remaining. Su is the score for saving a unit (100), as defined above. Since Equation 3 is used in the scenario Train2, which contains only one type of unit (the Vulture), Su is sufficient to represent the score of any unit. The second term of the evaluation function computes the impact of game time on the score. T is the time spent on the whole game; the longer a game lasts, the lower 1 − T/MaxT becomes. St is the weight of the time score and was set to 100. The maximum game time MaxT is 2500 frames for experiments using this training scenario (Train2), approximately one and a half minutes at normal game speed. We took game time into account in our evaluation because "timing" is an important factor in RTS games. Suppose combat lasts one minute; this might be enough time for the opponent to relocate backup troops from other places to support the ongoing skirmish, thus increasing the chances of our player losing the battle. Therefore, combat duration is a crucial factor that we want to take into consideration in our evaluation function. We kept the evaluation function Fr simple while encapsulating all three main objectives. Our evaluation function leads ECSLBot to learn micro behaviors that maximize the damage to enemy units, minimize the damage to friendly units, and minimize the game duration.
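
In code, Equation 3 reduces to a few lines (a sketch; Su = St = 100 and MaxT = 2500 follow the text above).

S_U = 100      # score for each surviving unit
S_T = 100      # weight of the time term
MAX_T = 2500   # frame limit of scenario Train2

def fitness_ranged(n_friendly_left, n_enemy_left, frames_used):
    """Equation 3: reward surviving units and penalize long games."""
    unit_term = (n_friendly_left - n_enemy_left) * S_U
    time_term = (1.0 - frames_used / MAX_T) * S_T
    return unit_term + time_term

# Example from the text: 3 friendly and 1 enemy Vulture remain after 2000 frames:
# fitness_ranged(3, 1, 2000) == 200 + 0.2 * 100 == 220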

In the melee training scenario, Train1, good players can be expected to maximize damage and destroy all opponents by kiting well. To reflect this bias towards damage, we increase the weight given to destroying an enemy unit and reduce the weight for losing a friendly unit. Therefore, in Train1, where we want to see how many Zealots can be eliminated by 5 Vultures within 2500 frames, destroying an enemy Zealot adds 200 to the score while losing a friendly Vulture subtracts 150. The second, melee specific fitness function (Fm) is:

Fm = NE × DSET − NF × DSFT (4)

where NE is the number of enemy units killed and NF is the number of friendly units killed. DSET and DSFT are the destroy scores, as defined in StarCraft, for the enemy and friendly unit types being killed. We apply this fitness function in experiments dealing with Train1, where we want to evaluate how fast our bots can eliminate melee attack enemy units. Although this equation does not explicitly deal with time, there are 25 Zealots in this scenario, and the faster a Zealot can be destroyed, the more Zealots can be destroyed within the time limit. This implicitly drives evolution towards parameters that lead to faster elimination of enemy units. Train1 also runs for 2500 frames.
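
A corresponding sketch of Equation 4, using the destroy scores quoted in the text (200 for an enemy Zealot, 150 for a friendly Vulture), is given below; the constant names are ours.

DS_ENEMY_TYPE = 200     # StarCraft destroy score of the enemy unit type (Zealot)
DS_FRIEND_TYPE = 150    # StarCraft destroy score of the friendly unit type (Vulture)

def fitness_melee(n_enemy_killed, n_friendly_killed):
    """Equation 4: heavily reward kills, moderately penalize losses."""
    return n_enemy_killed * DS_ENEMY_TYPE - n_friendly_killed * DS_FRIEND_TYPE

# e.g. killing 20 Zealots while losing 1 Vulture scores 20 * 200 - 1 * 150 = 3850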

Summarizing, ECSLBot learns how to fight melee units in Train1 (Figure 3a) using fitness function Fm, and learns how to fight against ranged units in Train2 (Figure 3b) using fitness function Fr. After training and learning to handle both melee and ranged units, ECSLBot simply switches between the two sets of learned parameters, Pm and Pr, according to the current target enemy unit type (melee or ranged). Note that we update 3 IMs for Pm and 3 IMs for Pr in every frame.

D. Genetic Algorithm

We used a CHC based GA in our experiments [33], [34]. CHC stands for Cross generational elitist selection, Heterogeneous recombination, and Cataclysmic mutation. CHC selects the N best individuals from the combined parent and offspring populations (2N) to create the next generation after recombination. Early experiments indicated that our CHC GA worked significantly better than the canonical GA on our problem.

Following prior experiments in our lab, we set the population size to 20 and ran the GA for 30 generations. The probability of crossover was 88%, and we used CHC selection together with bit mutation, giving each individual bit a 1% chance of flipping in value. Standard roulette wheel selection was used to select chromosomes for crossover, and SCAI was used to control the opponent force in our evaluations. CHC, being strongly elitist, helps keep valuable information from being lost if our GA produces low fitness children. These operator choices and GA parameter values were empirically determined to work well.
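
The overall evolutionary loop can be sketched as follows (Python; evaluate() is only a placeholder for the five averaged StarCraft games per individual, and details such as CHC's incest prevention and cataclysmic restarts are omitted, so this illustrates the selection scheme described above rather than the exact implementation).

import random

POP, GENS, CHROM_LEN = 20, 30, 60
P_CROSS, P_MUT = 0.88, 0.01

def evaluate(bits):
    # Placeholder: the real system decodes 'bits' (Table I), runs five StarCraft
    # games via BWAPI, and returns the average Fm or Fr fitness.
    return sum(int(b) for b in bits)

def roulette(pop, fits):
    """Standard roulette wheel selection (assumes non-negative fitness)."""
    r, acc = random.uniform(0, sum(fits)), 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def crossover(a, b):
    if random.random() < P_CROSS:
        cut = random.randrange(1, CHROM_LEN)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a, b

def mutate(bits):
    return "".join(b if random.random() > P_MUT else "10"[int(b)] for b in bits)

population = ["".join(random.choice("01") for _ in range(CHROM_LEN)) for _ in range(POP)]
fitness = [evaluate(ind) for ind in population]
for gen in range(GENS):
    children = []
    while len(children) < POP:
        c1, c2 = crossover(roulette(population, fitness), roulette(population, fitness))
        children += [mutate(c1), mutate(c2)]
    child_fitness = [evaluate(c) for c in children]
    # CHC-style elitism: keep the N best of the combined 2N parents + children.
    combined = sorted(zip(population + children, fitness + child_fitness),
                      key=lambda pair: pair[1], reverse=True)[:POP]
    population = [ind for ind, _ in combined]
    fitness = [fit for _, fit in combined]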

V. RESULTS AND DISCUSSION

We used StarCraft's game engine to evaluate our evolving solutions. In order to increase the difficulty of game play, and because this setting is used in professional play, the behavior of the StarCraft game engine was set to be non-deterministic for each game. In this case, some randomness is added by the game engine, affecting the probability of hitting the target and the amount of damage done. This randomness is restricted to a small range so that results are not heavily affected. Non-determinism does not significantly impact some scenarios, such as Vultures against Zealots, because Vultures can theoretically "kite" Zealots to death without losing even one hit-point, but the randomness may have an amplified effect in other scenarios. To mitigate the influence of this non-determinism, individual fitness is computed from the average score over five games. Furthermore, our results are collected as averaged scores over ten runs, each with a different random seed. Early experiments showed that the speed of game play affects outcomes as well.


The definition of game speed in BWAPI is the wait time between two consecutive frames; a game speed of 0 indicates that frames are executed immediately with no delay. We set our games to a slower speed, waiting 10 milliseconds between any two frames, to reduce the effect of the randomness. Our experiment environment was a workstation with an Intel Xeon E5-1620 CPU and 6 GB of memory. Given the above experimental setup, the following subsections describe our results.

A. Training ECSLBot

Scenario Train1, shown in Figure 3a, evaluates the efficiency of kiting behavior against melee attack units. Fm, as defined in Equation 4, is used as our evaluation function in this scenario. Figure 7 shows the average scores of ECSLBot evolving on this kiting scenario. We can see that the maximum fitness in the initial population is as high as 3217, and the average maximum fitness increases slowly to 3660 at generation 30. These results tell us that our GA can quickly find a kiting behavior that performs "hit and run" against melee attack units while trading off damage output. Our ECSLBot trades off well between kiting for safety and kiting for damage to the enemy.

Fig. 7: Average score of ECSLBot versus generations on scenario Train1. The X-axis represents generations and the Y-axis represents fitness as computed by the fitness function Fm.

Besides performance against melee attack units, we are also interested in performance against ranged attack units. In this case, positioning and target selection become more important than kiting, because the additional movement from kiting wastes damage output while avoiding enemy attacks. We used our GA to search for effective micro behaviors using the same representation as in the previous scenario, but changed our fitness evaluation function to Fr, as shown in Equation 3, to maximize the killing of enemy units, minimize the loss of friendly units, and minimize combat duration. Figure 8 shows the average score of the evolving ECSLBots in scenario Train2. The average maximum fitness found by the GA is 336, which means 3 friendly units remained at the end of the game and all enemy units were eliminated. Considering that the Vulture is a vulnerable unit and dies easily, saving 3 Vultures after a skirmish is, we believe, good performance.

We are interested in the differences in the evolved parameters for the two training scenarios: against melee attack units and against ranged attack units.

Fig. 8: Average score of ECSLBot over generations in scenario Train2. The X-axis represents generations and the Y-axis represents fitness as computed by the fitness function Fr.

Table II lists the details of the best evolved solutions in the different scenarios. Videos of all learned micro behaviors can be seen online5. We would like to highlight two findings in these results. The first concerns the learned optimal attack route in the scenario against six Vultures, as shown in Figure 9. The IM parameters evolved by the GA, together with our control algorithm (Algorithm 1), lead to a gathering location at the left side of the map to move toward before the battle. Our ECSLBot then commands the five Vultures to follow this route to attack enemy units. The result is that only three of the enemy units are triggered at the beginning of the fight against our five Vultures. This group positioning helped ECSLBot minimize the damage taken from enemy units while maximizing damage output from outnumbered friendly units. Although we describe a behavior that is specific to this scenario, the IM parameters tend to guide our units to such vulnerable positions with respect to enemy forces located anywhere on any map. As performance in never-before-seen testing scenarios shows, using an IM helps the generalizability of learned behaviors. This is detailed in Section V-C, where our forces move to locations that the IM indicates are places where enemy forces are less concentrated.

Fig. 9: Learned optimal attacking route against 6 Vultures.

The second interesting finding is that different micro behaviors are learned by ECSLBot in different scenarios. The results show that our ECSLBot kited heavily against Zealots, but seldom moved backward against ranged attack units.

5 http://www.cse.unr.edu/~simingl/publications.html


TABLE II: Parameter values of best evolved individuals.

Scenario             IM            PF           Reactive Control
Train1, 25 Zealots   3 9 15 8      43 55 6 2    1 5 6 7 7 0
Train2, 6 Vultures   16 13 20 10   50 26 13 4   12 9 1 7 6 7

The values of our parameters reflect this behavior. Table II shows the parameter values found by our GAs in the two training scenarios. We can see that St (the first parameter in the reactive control section) is 1 frame in the scenario against melee attack units, which means a Vulture starts to move backward right after every shot. On the other hand, St is much larger (12 frames) against ranged attack units. This is because, against ranged attack units, our units gain more benefit after each weapon firing by standing still and firing again as soon as possible rather than moving backward immediately.

B. Comparing ECSLBot with State of the Art Bots in Training Scenarios

Next, we investigate the performance differences among our ECSLBot (using Pm and Pr evolved by GAs), UAlbertaBot, and Nova. ECSLBot switches between Pm and Pr depending on the current target type. We used UAlbertaBot and Nova to control the same number of Vultures (5) against 25 Zealots in training scenario Train1. Table III shows the results for all three bots versus the baseline SCAI over 30 runs on the two training scenarios. We can see that UAlbertaBot performed poorly against melee attack units. This seems to be mainly because UAlbertaBot uses the same logic for all its units and that logic is optimized only for Protoss units; note that UAlbertaBot was designed to play as Protoss. It eliminated only 3.33 Zealots on average in each game, while losing all of its Vultures. On the other hand, Nova's performance is good: it killed 20.03 Zealots and lost only 0.3 Vultures on average per run. This is because Nova has hard coded and tuned logic specifically for Vultures and is optimized to control Vulture kiting behavior against melee attack units. We then tested ECSLBot on scenario Train1. The results show that ECSLBot obtained the highest average score over 30 runs, with 20.2 Zealots killed per match on average while losing only 0.2 Vultures. Visually, ECSLBot and Nova seem to have very similar kiting behavior and performance, and statistically the difference in performance is not significant at p = 0.79 using the t-test. All significance tests reported in this paper are based on independent two-sample Student's t-tests, assuming unequal variance and equal sample sizes, with a significance level of α = 0.05. Since each comparison involves three bots, the likelihood of rejecting the null hypothesis when it is true increases; we therefore applied the Bonferroni correction and set the significance level to α = 0.05/3 = 0.0167.
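
For reference, the test used throughout this section (Welch's unequal-variance two-sample t-test with the Bonferroni-corrected threshold) can be reproduced as in the sketch below; the score lists shown are placeholders, not the paper's data.

from scipy import stats

ALPHA = 0.05 / 3   # Bonferroni-corrected significance level (~0.0167) for three bots

ecslbot_scores = [20, 21, 19, 22, 18]   # placeholder per-run scores, NOT the paper's data
nova_scores    = [20, 19, 20, 21, 18]   # placeholder per-run scores, NOT the paper's data

# Independent two-sample t-test assuming unequal variance (Welch's t-test).
t_stat, p_value = stats.ttest_ind(ecslbot_scores, nova_scores, equal_var=False)
print(f"p = {p_value:.4f}, significant: {p_value < ALPHA}")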

Table III also shows the results for all three bots tested in scenario Train2, with each bot run 30 times against SCAI. This time, both UAlbertaBot and Nova perform poorly. UAlbertaBot loses all 30 games against 6 Vultures, killing 2.67 enemy Vultures on average in each game while losing all of its units.

Nova performed slightly better than UAlbertaBot, with 2 wins and 28 losses out of 30 runs. However, ECSLBot outperformed both of the others with a 60% winning rate: 5.2 enemy Vultures were eliminated and 1.8 friendly Vultures survived on average in each run. The difference between ECSLBot and Nova in the number of enemy units killed is statistically significant at p = 6.04 × 10−7 using the t-test. This result indicates that kiting against ranged units is not as effective as kiting against melee units; positioning and target selection become more important than kiting in such scenarios. UAlbertaBot and Nova did not optimize micro behaviors for all scenarios and performed poorly in these cases. Note, however, that ECSLBot needs about 21 hours to evolve either Pm or Pr.

C. Comparing ECSLBot with State of the Art Bots in Testing Scenarios

Since we evolved micro behaviors for ECSLBot in two training scenarios, we investigate how the corresponding parameter sets perform in scenarios never encountered before. Therefore, we compared our evolved ECSLBot to Nova and UAlbertaBot on the eight testing scenarios shown in Figure 4 and Figure 5. Table IV shows the results of the three bots playing against SCAI on all eight scenarios; the table also provides standard deviations.

The first two testing scenarios, Test1 and Test2, shown in Figure 4, are somewhat similar to the two training scenarios Train1 and Train2, with the only change being the units' positions. The purpose of these two scenarios is to evaluate how well ECSLBot's parameters work when the positions of both friendly units and enemy units are changed. The results show that ECSLBot and Nova are still good at kiting Zealots in these new scenarios: 17.4 Zealots were killed by Nova and 19.8 Zealots were killed by ECSLBot during 2500 frames on Test2. This testing scenario performance is similar to performance on the training scenarios. UAlbertaBot only killed 2.63 Zealots and lost all 5 Vultures on Test2. However, UAlbertaBot performs better when destroying split enemy Vultures on Test2, with 5.97 Vultures killed and 2.0 Vultures surviving on average over thirty runs; ECSLBot performs similarly to UAlbertaBot on Test2. Nova again performs badly when fighting against ranged attack units, similar to the results on the training scenarios, with eleven out of thirty matches lost against the six split Vultures. The difference in performance between UAlbertaBot and ECSLBot in Test1 is not statistically significant at p = 0.25. On Test2, the difference in performance between ECSLBot and Nova is statistically significant at p = 2.92 × 10−5.

In order to test the micro performance of ECSLBot when controlling different numbers of Vultures, other than the five we trained on, we conducted experiments where the bots control ten Vultures against different types of enemy units, including ranged units, melee units, mixed units, and different terrain. The results on scenarios Test3 and Test4 show that all three bots do well at destroying enemy units. However, ECSLBot saved 8.1 and 6.4 out of 10 friendly Vultures in the two scenarios, while Nova saved 6.93 and 3.06 Vultures and UAlbertaBot saved only 3.97 and 1.13 Vultures.

TABLE III: The performance of bots controlling 5 Vultures vs opponent units on training scenarios.

Train1: 5 Vultures vs 25 Zealots
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot    0    0    30      3.33 / 2.1       0 / 0
Nova          30    0     0     20.03 / 2.37      4.7 / 0.4
ECSLBot       30    0     0     20.2 / 2.57       4.8 / 0.4

Train2: 5 Vultures vs 6 Vultures
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot    0    0    30      2.67 / 0.54      0 / 0
Nova           2    0    28      3.13 / 1.45      0.13 / 0.56
ECSLBot       18    0    12      5.2 / 1.35       1.8 / 1.7

TABLE IV: The performance of bots controlling Vultures vs opponent units on testing scenarios.

Test1: 5 Vultures vs 25 Zealots (split)
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot    0    0    30      2.63 / 1.64      0 / 0
Nova          30    0     0     17.4 / 2.36       3.77 / 0.84
ECSLBot       30    0     0     19.8 / 1.51       3.93 / 0.85

Test2: 5 Vultures vs 6 Vultures (split)
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot   29    0     1      5.97 / 0.18      2.0 / 0.8
Nova          11    2    17      5.03 / 1.05      0.73 / 1.06
ECSLBot       27    0     3      5.87 / 0.43      2.87 / 1.28

Test3: 10 Vultures vs 10 Vultures
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot   23    0     7      9.43 / 1.15      3.97 / 2.71
Nova          30    0     0     10 / 0            6.93 / 1.1
ECSLBot       30    0     0     10 / 0            8.1 / 0.83

Test4: 10 Vultures vs 12 Vultures
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot   11    0    19     10 / 2.18         1.13 / 1.58
Nova          25    0     5     11.6 / 1.05       3.06 / 1.9
ECSLBot       29    0     1     11.97 / 0.18      6.4 / 1.74

Test5: 10 Vultures vs 40 Zealots (split)
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot    1    0    29     12.7 / 2.69       0.03 / 0.18
Nova          30    0     0     37.6 / 1.8        9.4 / 0.7
ECSLBot       30    0     0     34.87 / 1.36      8.8 / 0.65

Test6: 10 Vultures vs 40 Zealots (obstacle)
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot    3    0    27     17.6 / 0.18       2.0 / 0.8
Nova          30    0     0     32.6 / 2.54       7.6 / 0.99
ECSLBot       30    0     0     30.8 / 1.42       7.5 / 0.8

Test7: 10 Vultures vs 8 Zealots & 8 Vultures
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot    6    0    24      8.9 / 1.57       0.53 / 1.18
Nova          30    0     0     16 / 0            6.43 / 1.87
ECSLBot       30    0     0     16 / 0            6.5 / 1.5

Test8: 10 Vultures vs 18 Zerglings & 10 Hydralisks
              Win  Draw  Lose   Avg Killed / σ   Avg Left / σ
UAlbertaBot    5    0    25     24.5 / 2.3        0.67 / 1.57
Nova          30    0     0     28 / 0            6.67 / 1.66
ECSLBot       30    0     0     28 / 0            6.33 / 1.37

The difference in performance on saved units between ECSLBot and Nova is statistically significant at p = 5.35 × 10^-5 on Test3. These results show that ECSLBot outperformed Nova and UAlbertaBot against ranged opponents on both the training and the testing scenarios. ECSLBot seems to find a good balance in the trade-off between concentrating fire and kiting.

On scenarios Test5 and Test6, Nova outperformed the other two bots both with and without an obstacle, as shown in Table IV. Nova destroyed on average 37.6 and 32.6 Zealots in the two scenarios; ECSLBot performed close to Nova, destroying 34.87 and 30.8 Zealots. In scenario Test5, the difference in performance between Nova and ECSLBot is statistically significant at p = 2.43 × 10^-8, and the difference between UAlbertaBot and the other two bots is also statistically significant. These results show that, in terms of kiting efficiency against melee units, ECSLBot performs nearly as well as Nova and much better than UAlbertaBot on both training and testing scenarios. Moreover, ECSLBot is able to use the same set of parameters to perform well despite an untraversable obstacle in the center of the map, by using the terrain influence map learned during training.

Since we tested our bots against melee units and ranged attack units independently, we next investigated how the bots fight against enemy forces containing both melee and ranged units. We tested our bots on scenario Test7, where the enemy is composed of a mix of eight Zealots and eight Vultures, as shown in Figure 5. Test8 uses a completely different set of units from the Zerg race in StarCraft: eighteen Zerglings and ten Hydralisks, Zerg melee and ranged units never encountered by ECSLBot during training, make up the opponents. Results on these two scenarios are displayed in Table IV and show that Nova performs as well as ECSLBot: both eliminate all enemy units, and more than six friendly Vultures survive. The differences in saved units between ECSLBot and Nova on both Test7 and Test8 are not statistically significant, whereas the difference between UAlbertaBot and the other two bots is statistically significant. ECSLBot performs well in these mixed-opponent scenarios by switching between Pm and Pr depending on the target type. For example, if the current target is a Zergling, a melee unit, ECSLBot uses Pm, the parameters evolved in the Train1 melee scenario for the IM, PF, and reactive control. As soon as the target changes to an enemy Vulture, ECSLBot switches to the ranged reactive control parameters and the ranged-attack IM and PF (Pr). This mechanism performs well even against enemy units never seen during training: ECSLBot simply compares the weapon attack ranges of the target unit and the friendly unit to decide whether the target is ranged or melee, and thus which parameter set to use.
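The switching rule described above can be summarized with a small illustrative sketch. The function and parameter names below are hypothetical (the bot's actual implementation is not listed in this paper); the sketch only shows the idea of choosing Pm or Pr by comparing weapon ranges.

    # Sketch of the parameter-switching rule described above (hypothetical names).
    # ECSLBot keeps two evolved parameter sets and picks one per target by
    # comparing the target's weapon range with the friendly unit's weapon range.

    def select_parameter_set(friendly_range, target_range, params_melee, params_ranged):
        """Return the melee-trained set (Pm) for short-range targets,
        otherwise the ranged-trained set (Pr)."""
        # A target whose weapon range is shorter than ours is treated as a
        # melee unit (e.g., Zealots, Zerglings); everything else as ranged.
        if target_range < friendly_range:
            return params_melee   # Pm: evolved against melee units (Train1)
        return params_ranged      # Pr: evolved against ranged units (Train2)

    # Hypothetical usage: a Vulture (range 5) engaging a Zergling (range 1).
    pm = {"kiting": True, "stand_off_distance": 4}   # placeholder values
    pr = {"kiting": False, "stand_off_distance": 2}  # placeholder values
    active = select_parameter_set(friendly_range=5, target_range=1,
                                  params_melee=pm, params_ranged=pr)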

D. Comparing ECSLBot with State of the Art Bots in Head-to-head Scenario

We have compared the performance of the three bots playing against SCAI on two training and eight testing scenarios.

TABLE V: Head-to-head scenario over 30 matches.

                          Win  Draw  Lose   Units Remaining
UAlbertaBot vs Nova        24    5     1        2.33
ECSLBot vs Nova            30    0     0        3.37
ECSLBot vs UAlbertaBot     17    1    12        0.30

The results show that ECSLBot works well on all scenarios, while Nova and UAlbertaBot each perform well on some and badly on others. However, what happens when they play against each other? To answer this question, we set up our last set of experiments with the Head-to-head scenario described in Section III-B3. Each bot plays against the other two bots thirty times with identical units. Since we evolved parameters for Vultures, which are ranged units, we used Vultures as the unit type in this scenario; ECSLBot thus used Pr for control against UAlbertaBot and Nova. The results show that ECSLBot beats Nova but is defeated by UAlbertaBot, winning only 8 and losing 22 out of 30 matches against UAlbertaBot. The replays show that ECSLBot's positioning micro is driven by training against SCAI and does not generalize well to other bots. Thus, although ECSLBot's representation and control algorithms evolve to generalize over opponent positions, terrain, and opponent types, the evolved parameters are specifically tuned to beat SCAI.

Therefore, we evolved another set of parameters directly against UAlbertaBot and used ECSLBot with this parameter set against UAlbertaBot and Nova on the Head-to-head scenario. Table V shows the detailed results among all the bots. We can see that UAlbertaBot wins 24 matches, draws 5, and loses 1 against Nova. Examining the game replays, we found that Nova's micro kites against any type of opponent unit. However, as our experiments with scenario Train2 showed, kiting too much against opponents with the same ranged weapon actually decreases micro performance. UAlbertaBot, on the other hand, disabled kiting when fighting units of equal weapon range and defeated Nova easily. Similarly, ECSLBot defeated Nova in all 30 games without a loss or a draw; the average number of surviving units was 3.37, higher than UAlbertaBot's 2.33. The final comparison was ECSLBot versus UAlbertaBot: ECSLBot wins 17 matches, draws 1, and loses 12 out of 30. ECSLBot thus performed quite well in this scenario against both other bots.

Although our approach can evolve good micro against specific opponents while generalizing over maps and unit types, for the longer term we are investigating co-evolutionary approaches to evolving micro that is effective against a variety of opponents. Co-evolutionary approaches have been shown to work well in board games and other video games and provide a promising computational intelligence approach to robust behavior evolution.

VI. CONCLUSION AND FUTURE WORK

Our research focuses on generating effective micro management: group positioning, unit movement, kiting, target selection, and fleeing in order to win skirmishes in real-time strategy games. We compactly represented micro behaviors as a combination of influence map, potential field, and reactive control parameters in a 60-bit string. Early work showed that evolutionary algorithms perform better than other heuristic search algorithms on this problem [4]. Here, we tested the micro performance of our evolved ECSLBot against two state of the art bots, UAlbertaBot and Nova, on several skirmish scenarios in StarCraft. We designed eight testing scenarios in which the bots control a number of Vultures against different types of enemies in order to evaluate micro performance. The results show that our genetic algorithm quickly evolves good micro for handling both melee and ranged attack units.
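As a side note on the representation, the sketch below illustrates how a fixed-length bit-string chromosome can be decoded into named micro parameters. The field names and bit widths are assumptions chosen for illustration and do not reproduce the paper's actual 60-bit layout.

    # Illustrative decoding of a fixed-length bit-string chromosome into micro
    # parameters. The field names and bit widths here are assumptions for the
    # sake of example; they are not the paper's actual encoding.

    def bits_to_int(bits):
        """Interpret a list of 0/1 values as an unsigned integer."""
        value = 0
        for b in bits:
            value = (value << 1) | b
        return value

    # Hypothetical layout: (parameter name, number of bits)
    LAYOUT = [("im_enemy_weight", 5), ("im_terrain_weight", 5),
              ("pf_attract", 6), ("pf_repulse", 6),
              ("stand_off_distance", 5), ("target_switch_threshold", 5)]

    def decode(chromosome):
        """Split a bit-string (list of 0/1) into named integer parameters."""
        params, pos = {}, 0
        for name, width in LAYOUT:
            params[name] = bits_to_int(chromosome[pos:pos + width])
            pos += width
        return params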

ECSLBot performs well by switching parameter values depending on the currently targeted unit. This simple parameter switching can be done in real time, and ECSLBot thus achieves good micro performance. The results also indicate that Nova is highly effective at kiting against melee attack units but performs poorly against ranged attack units. UAlbertaBot, the AIIDE 2013 champion, performs poorly against melee attack units but is excellent against ranged attack units in our scenarios. Compared to UAlbertaBot, we generate unit-specific micro behaviors instead of one common logic for all units. With the right parameters, our ECSLBot beats both UAlbertaBot and Nova.

Our representation leads to good generalization over different numbers of units, different initial positions, and different terrain obstacles. However, evolving against a specific AI (SCAI) means that ECSLBot performs well against SCAI but not as well against other AIs. We plan to investigate co-evolutionary approaches, while keeping the same representation, in order to evolve robust micro performance. Moreover, using separate fitness functions to evolve micro behaviors against melee and against ranged enemy units means that game developers would need to run extra evolution experiments.

We are also interested in micro management with mixed unit types instead of a single type of unit, the Vulture. One simple approach is to evolve or co-evolve parameter sets for each unit type in an RTS game; this can be done off-line, as we have shown. In addition, we want to integrate the use of unit abilities (abilities are distinct from weapons), such as the Terran Ghost's EMP pulse, into our micro behaviors.

ACKNOWLEDGMENT

This material is based in part upon work supported by the National Science Foundation under grant number IIA-1301726 and by the Office of Naval Research under grants N00014-12-1-0860 and N00014-12-C-0522.

REFERENCES

[1] S. Ontanon, G. Synnaeve, A. Uriarte, F. Richoux, D. Churchill, M. Preuss et al., "A survey of real-time strategy game AI research and competition in StarCraft," IEEE Transactions on Computational Intelligence and AI in Games, vol. 5, no. 4, pp. 1–19, 2013.

[2] K. Rogers and A. Skabar, "A micromanagement task allocation system for real-time strategy games," Computational Intelligence and AI in Games, IEEE Transactions on, vol. 6, no. 1, pp. 67–77, March 2014.

[3] C. Miles and S. J. Louis, "Co-evolving real-time strategy game playing influence map trees with genetic algorithms," in Proceedings of the International Congress on Evolutionary Computation, Portland, Oregon, 2006.

[4] S. Liu, S. J. Louis, and M. Nicolescu, "Comparing heuristic search methods for finding effective group behaviors in RTS game," in Evolutionary Computation (CEC), 2013 IEEE Congress on. IEEE, 2013, pp. 1371–1378.

[5] D. Churchill and M. Buro, "Build order optimization in StarCraft," in Artificial Intelligence and Interactive Digital Entertainment (AIIDE 2011), October 2011.

[6] A. Uriarte and S. Ontanon, "Kiting in RTS games using influence maps," in Eighth Artificial Intelligence and Interactive Digital Entertainment Conference, 2012.

[7] M. Buckland and M. Collins, AI Techniques for Game Programming. Premier Press, 2002.

[8] S. Rabin, AI Game Programming Wisdom 4. Cengage Learning Trade, 2014, vol. 4.

[9] C. Ballinger and S. Louis, "Learning robust build-orders from previous opponents with coevolution," in Computational Intelligence in Games (CIG), 2014 IEEE Conference on. IEEE, 2014.

[10] S. Ontanon, K. Mishra, N. Sugandh, and A. Ram, "Learning from demonstration and case-based planning for real-time strategy games," in Soft Computing Applications in Industry. Springer, 2008, pp. 293–310.

[11] B. Weber and M. Mateas, "A data mining approach to strategy prediction," in Computational Intelligence and Games, 2009. CIG 2009. IEEE Symposium on, Sept. 2009, pp. 140–147.

[12] D. Churchill, A. Saffidine, and M. Buro, "Fast heuristic search for RTS game combat scenarios," in AIIDE, 2012.

[13] C. Miles, J. Quiroz, R. Leigh, and S. Louis, "Co-evolving influence map tree based strategy game players," in Computational Intelligence and Games, 2007. CIG 2007. IEEE Symposium on, April 2007, pp. 88–95.

[14] P. Sweetser and J. Wiles, "Combining influence maps and cellular automata for reactive game agents," Intelligent Data Engineering and Automated Learning - IDEAL 2005, pp. 209–215, 2005.

[15] M. Bergsma and P. Spronck, "Adaptive spatial reasoning for turn-based strategy games," Proceedings of AIIDE, 2008.

[16] S.-H. Jang and S.-B. Cho, "Evolving neural NPCs with layered influence map in the real-time simulation game Conqueror," in Computational Intelligence and Games, 2008. CIG '08. IEEE Symposium On, Dec. 2008, pp. 385–388.

[17] P. Avery and S. Louis, "Coevolving influence maps for spatial team tactics in a RTS game," in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, ser. GECCO '10. New York, NY, USA: ACM, 2010, pp. 783–790. [Online]. Available: http://doi.acm.org/10.1145/1830483.1830621

[18] M. Preuss, N. Beume, H. Danielsiek, T. Hein, B. Naujoks, N. Piatkowski, R. Stuer, A. Thom, and S. Wessing, "Towards intelligent team composition and maneuvering in real-time strategy games," Computational Intelligence and AI in Games, IEEE Transactions on, vol. 2, no. 2, pp. 82–98, 2010.

[19] H. Danielsiek, R. Stuer, A. Thom, N. Beume, B. Naujoks, and M. Preuss, "Intelligent moving of groups in real-time strategy games," in Computational Intelligence and Games, 2008. CIG '08. IEEE Symposium On. IEEE, 2008, pp. 71–78.

[20] E. Raboin, U. Kuter, D. Nau, and S. Gupta, "Adversarial planning for multi-agent pursuit-evasion games in partially observable Euclidean space," in Eighth Artificial Intelligence and Interactive Digital Entertainment Conference, 2012.

[21] J. Borenstein and Y. Koren, "The vector field histogram - fast obstacle avoidance for mobile robots," Robotics and Automation, IEEE Transactions on, vol. 7, no. 3, pp. 278–288, 1991.

[22] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," The International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.

[23] R. Olfati-Saber, J. A. Fax, and R. M. Murray, "Consensus and cooperation in networked multi-agent systems," Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007.

[24] M. Egerstedt and X. Hu, "Formation constrained multi-agent control," Robotics and Automation, IEEE Transactions on, vol. 17, no. 6, pp. 947–951, 2001.

[25] C. Reynolds, "Flocks, herds and schools: A distributed behavioral model," in ACM SIGGRAPH Computer Graphics, vol. 21, no. 4. ACM, 1987, pp. 25–34.

[26] J. Hagelback and S. J. Johansson, "Using multi-agent potential fields in real-time strategy games," in Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2, ser. AAMAS '08. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2008, pp. 631–638. [Online]. Available: http://dl.acm.org/citation.cfm?id=1402298.1402312

[27] M. Buro, "ORTS - A free software RTS game engine," 2007, accessed March 20, 2007.

[28] J. Hagelback and S. J. Johansson, "The rise of potential fields in real time strategy bots," Proceedings of Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 2008.

[29] S. Liu, S. J. Louis, and M. Nicolescu, "Using CIGAR for finding effective group behaviors in RTS game," in Computational Intelligence in Games (CIG), 2013 IEEE Conference on. IEEE, 2013, pp. 1–8.

[30] M. J. Gunnerud, "A CBR/RL system for learning micromanagement in real-time strategy games," 2009.

[31] S. Wender and I. Watson, "Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft: Broodwar," in Computational Intelligence and Games (CIG), 2012 IEEE Conference on. IEEE, 2012, pp. 402–408.

[32] M. Buro, "Real-time strategy games: A new AI research challenge," in Proceedings of the 18th International Joint Conference on Artificial Intelligence, 2003, pp. 1534–1535.

[33] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press, 1975.

[34] L. J. Eshelman, "The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination," Foundations of Genetic Algorithms, pp. 265–283, 1991. [Online]. Available: http://ci.nii.ac.jp/naid/10000024547/en/

Siming Liu (S'13) is currently working towards his Ph.D. degree in the Evolutionary Computing System Laboratory, Department of Computer Science and Engineering at the University of Nevada, Reno, NV, USA. He is working on using evolutionary computing techniques for generating high performance micro management in real-time strategy games. His research interests include computational intelligence and machine learning, with a focus on applications in computer games. You can visit his website, http://www.cse.unr.edu/~simingl/, for more.

Sushil J. Louis (M'01) is a full professor of Computer Science at the University of Nevada, Reno and director of the Evolutionary Computing Systems Laboratory. He works in the areas of evolutionary computation and its applications in search, optimization, and machine learning. His current interests are in evolutionary approaches to computer game AI and human behavior modeling. Major application areas include computer game AI and human computer interfaces. Dr. Louis is an associate editor of the IEEE Transactions on Computational Intelligence and Artificial Intelligence in Games and was co-general chair of the 2006 IEEE Symposium on Computational Intelligence and Games. You can visit his website, http://www.cse.unr.edu/~sushil/, for more.

Christopher A. Ballinger (M'12) is currently working towards his Ph.D. degree in the Evolutionary Computing System Laboratory, Department of Computer Science and Engineering at the University of Nevada, Reno, NV, USA. He is working on using coevolution techniques for finding strong, robust build-orders in real-time strategy games. His research interests include computational intelligence and machine learning, with a focus on applications in computer games. You can visit his website, http://www.cse.unr.edu/~caballinger/, for more.