
MULTI AGENT GAME OF KABADDI: STRATEGIES AND SIMULATION

Thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science (by Research) in

Computer Science and Engineering

by

PASILA SAMRAT NAGARJUNA
200802028

[email protected]

International Institute of Information Technology
Hyderabad - 500032, INDIA

May 2015


Copyright © Samrat Nagarjuna, 2015

All Rights Reserved


International Institute of Information Technology
Hyderabad, India

CERTIFICATE

It is certified that the work contained in this thesis, titled "Multi Agent Game of Kabaddi: Strategies and Simulation" by Pasila Samrat Nagarjuna (200802028), has been carried out under my supervision and is not submitted elsewhere for a degree.

Date Adviser: Prof. Kamalakar Karlapalem


To my parents

Sreenivasulu and Sreedevi

and my guide

Prof. Kamalakar Karlapalem


Acknowledgments

This thesis would not have been possible without the help and support of many people. My first and foremost thanks go to my advisor, mentor and guide: Dr. Kamal Karlapalem. His invaluable guidance, constant support and encouragement helped me to complete my thesis successfully. I thank him for the enormous patience that he has shown in understanding not only my research problems but also my personal life, and for helping me in crossing hurdles.

I also take this opportunity to thank Garima Agrawal, without whom I would not have progressed in my research on robots. Her constant effort in solving technical and mechanical issues on the robots throughout has helped me to focus on the software for the robots. I would also like to thank Parag Agrawal, who helped me solve a few programming challenges I faced during the course of my project. Finally, I thank T.J David for his help in solving mechanical problems related to the robots.

I would also like to express my gratitude and appreciation towards the most important people in my life, my family, for the support they provided me through my entire life; without their love and encouragement I would not have finished this thesis. I am indebted to them forever.

In conclusion, I recognize that this research would not have been possible without IIIT-Hyderabad providing the infrastructure and opportunity.


Abstract

Kabaddi is a popular and intense game played in the Indian subcontinent. The key feature of the game is the simultaneous embodiment of cooperation and competition among the players. In particular, a player has to implicitly or explicitly cooperate and coordinate within a team, while competing with the opponent team. The game is played on the principle: capture the raider, while not getting caught by the raider. This principle of capture sounds similar to the pursuit domain, a domain used in the study of multi agent systems. Many strategies have been proposed for the pursuit domain, but none of them is suitable for Kabaddi due to the game rules and the constraints on field space and time. The rules and regulations of the game make it more interesting and challenging to develop strategies for. The major challenges involved in the game of Kabaddi are cooperation between players, decision making by a team (which strategy to use and when), and competing against the opponent team. Thus Kabaddi can provide an environment and a framework for Multi-Robot Systems and MAS researchers to evaluate strategies involving both cooperation and competition.

In this thesis, we present a Kabaddi game simulator which assists the user in programming strategies and evaluating them on simulated games. The Kabaddi simulator package (KSP) is built in Java and can be imported as a Java library. All the game rules of Kabaddi are incorporated into the simulator. By observing many international games, we analyzed the strategies deployed by a team in various scenarios and came up with a generic solution for the game. Formations play a key role in the anti-raiders' game. We introduce the concept of risk, which assists the anti-raiders in getting into a formation through implicit cooperation. We also use the risk factor to avoid collisions among agents. Three strategies, straight line formation, semi-circular formation and circular formation, are developed, tested and evaluated against a pursuit domain strategy. Though the intermediate goal of the anti-raiders is to capture the raider, they also have to compete in a game. That is, the team might need to change their strategy depending upon the state of a game and must seek to increase their score. To develop a strategy that is competitive we use an MDP (Markov Decision Process) which considers multiple strategies as actions and deploys them based on the state of a game. This thesis also discusses building robots that can play Kabaddi and presents a few simulations of Kabaddi strategies on a robot simulator.


Contents

1 Introduction
  1.1 Why Kabaddi?
  1.2 Game Terminology
  1.3 Game Rules and Regulations
  1.4 Multi Agent Kabaddi
  1.5 Thesis contribution
  1.6 Overview of Thesis

2 Related Work
  2.1 Strategies for Pursuit Domain
    2.1.1 Korf simple solution
    2.1.2 Game Theoretic Approach
    2.1.3 Recursive Modeling Method
    2.1.4 David Gale solution for lion and man game
  2.2 Simulation of Multi Agent Kabaddi
  2.3 Robot Simulation
  2.4 Simulators

3 Background
  3.1 Simulation Model
    3.1.1 Environment of Kabaddi
      3.1.1.1 Game Domain Parameters
      3.1.1.2 Kabaddi Agent
      3.1.1.3 Behavior of Agent
      3.1.1.4 Capabilities of Agent
  3.2 Kabaddi simulator package
    3.2.1 Episode and Cycle
    3.2.2 Simulator view
  3.3 Working with Simulator Package
    3.3.1 Starting the Kabaddi Simulator
    3.3.2 Implementing The Strategy
  3.4 Architecture of Kabaddi Simulation

4 Kabaddi Team Strategy Formulation
  4.1 Game Analytics
    4.1.1 Anti-raiders perspective
      4.1.1.1 Using formations in a strategy
        4.1.1.1.1 Straight line formation
        4.1.1.1.2 Semi Circular formation
        4.1.1.1.3 Circular formation
  4.2 Raiders perspective
  4.3 Implementing the Strategy
    4.3.1 Risk Function
  4.4 Anti-raider's Strategy
    4.4.0.1 Defensive Strategy
  4.5 Simulation of Game strategy
    4.5.1 Korf Strategy
    4.5.2 Straight-line Formation Strategy
    4.5.3 Semi-circular Formation Strategy
    4.5.4 Circular Formation Strategy

5 Markov Decision Process for Kabaddi
  5.1 Markov Decision Process
  5.2 MDP Model for Kabaddi
    5.2.1 States
    5.2.2 Rewards
    5.2.3 Actions
    5.2.4 Transition Matrix
  5.3 Results of MDP

6 Robo Kabaddi Simulation
  6.1 Kabaddi Robots - Level I
    6.1.1 Building Robots
    6.1.2 Coordinator
    6.1.3 Movement of Robot
    6.1.4 Messaging Protocol and Action Commands
    6.1.5 Synchronization
    6.1.6 Hardware Challenges
  6.2 Kabaddi Robots - Level II
    6.2.1 Kabaddi Robot Game Model
      6.2.1.1 Coordinator
      6.2.1.2 Robot
    6.2.2 Virtual Localization through communication
      6.2.2.1 XBee API
      6.2.2.2 Messaging Protocol
      6.2.2.3 Game play
    6.2.3 Hardware Challenges with Level II Robots
  6.3 Robot simulations
    6.3.1 Korf Strategy
    6.3.2 Basic Robot behaviour
    6.3.3 Straight Line Strategy
    6.3.4 Semi Circular Strategy
    6.3.5 Circular Strategy
  6.4 Summary

7 Conclusions

Bibliography


List of Figures

3.1 Kabaddi Game Simulator
3.2 Architecture of Kabaddi Simulation
4.1 Different types of formations
4.2 Graph plot for risk function in equation 4.2
4.3 Simulation of agents with risk function in equation 4.2
4.4 Graph plot for risk function in equation 4.5
4.5 Simulation of agents with risk function in equation 4.5
4.6 Graph plot for risk function in equation 4.10
4.7 Simulation of agents with risk function in equation 4.10
4.8 Graph plot for risk function in equation 4.14
4.9 Simulation of agents with risk function in equation 4.14
4.10 Template for Simulation path plot graph
4.11 Korf Aggressive Strategy simulation
4.12 Korf Defensive Strategy simulation
4.13 Straight line Aggressive Strategy simulation
4.14 Straight line Defensive Strategy simulation
4.15 Semi circular Aggressive Strategy simulation
4.16 Semi circular Defensive Strategy simulation
4.17 Circular Aggressive Strategy simulation
4.18 Circular Defensive Strategy simulation
5.1 Transformation of simulator logs to transition matrix
6.1 Robot built to play Kabaddi
6.2 Circuit Design of the robot
6.3 Level II Robot
6.4 Robot Kabaddi Simulation Model
6.5 Game Play and Virtual Localization
6.6 Game play on Raspberry Pi
6.7 Korf Aggressive strategy simulations in Simbad
6.8 Korf Aggressive strategy raid simulation in Simbad
6.9 Simulation of Risk due to border and Middle line
6.10 Straight line formation strategy simulation in Simbad
6.11 Straight line Defensive strategy simulation in Simbad
6.12 Straight line Aggressive strategy simulation in Simbad
6.13 Semi circular formation strategy simulation in Simbad
6.14 Semi Circular Defensive strategy simulation in Simbad
6.15 Semi Circular Aggressive strategy simulation in Simbad
6.16 Circular formation strategy simulation in Simbad


List of Tables

3.1 A 3x3 Grid cell in Kabaddi World
4.1 Strategy table
5.1 Reward table with reward function R(s) = 8 × sx 2 × sy
5.2 Format of Transition matrix
5.3 Representation of Transition matrix Π built using simulator logs
5.4 V matrix
5.5 Action matrix Π
5.6 Action matrix Π
5.7 Game simulation
6.1 Initialization Data message format
6.2 Player Data message format
6.3 Game Data message format


Chapter 1

Introduction

Multi agent systems (MAS) aim to provide principles for the construction of complex systems involving multiple agents [27]. An agent here is considered to be an entity with goals, actions and domain knowledge. Cooperation and competition among multiple agents in an environment aid in solving problems which are difficult or almost impossible for an individual agent. Agents need to cooperate and compete in many situations. Having the same agent cooperate and compete is a challenging task as the requirements differ. To achieve cooperation, sequencing of actions, communication among agents and decision making are essential. Developing strategies that involve these features is a challenge in multi agent systems.

Kabaddi is a South Asian game which consists of two teams occupying the opposite halves of a rectangular field [29]. Each team takes turns in sending a raider to the other half, which consists of anti-raiders who try to capture the raider. All the players are bound by the rules of Kabaddi [1]. To score over an opponent team, players have to cooperate among themselves while competing, and hence developing strategies for such an environment will help us in understanding the challenges involved in a competitive multi agent system.

1.1 Why Kabaddi?

Though Kabaddi can be classified as a pursuit domain or predator-prey problem, the rules of Kabaddi further complicate the problem. The condition to capture, the boundary condition and the prerogative of the raider to retreat to its arena at will make it difficult for anti-raiders to capture the raider. Without strategy and cooperation among anti-raiders, the probability that an anti-raider gets caught is high. Thus, compared to pursuit games, Kabaddi requires complex strategies to counter a raider.

Kabaddi provides a multi agent environment where agents have to cooperate and compete to win a game. In contrast to the pursuit domain, every agent in Kabaddi has to sense its team agents, the raider and the field lines (mid line, boundaries, etc.), which are significantly important in a game. Also, anti-raider agents have to cooperate among themselves to capture the raider while following the rules of Kabaddi. Though the intermediate goal of anti-raiders is to capture the raider, their ultimate goal is to score more points over the opponent team.


This enforces a team to decide upon a strategy depending on the current state of a game. Thus, developing strategies for Kabaddi will help us understand the challenges involved in cooperation, competition and decision making, which play a major role in many real life scenarios like team sports, traffic control or crowd control.

1.2 Game Terminology

This section introduces the game terminology that will be used throughout this thesis. An example of a Kabaddi field can be seen in Fig 3.1.

Play Field The play field is a portion of ground where the game takes place (ABCD).

Boundary The four lines surrounding the play field are called boundaries (AB, BC, CD and DA).

Mid Line The line that divides the play field into two halves.

Court Each half of the play field divided by the mid line is known as a Court.

Chant The repeated, clear and loud sounding of the approved word KABADDI, without break, at a stretch and within the course of one respiration, shall be called the Chant.

Raider One who enters into the court of the opponent with the chant is known as a RAIDER. The raider must begin his chant before he touches the opponent's court.

Anti-Raider Every player in whose court the raid is being made shall be called Anti Raider.

Touch If the raider touches an anti-raider with any part of his body, or even his clothing, shoes or any other outfit, it is called a touch.

Struggle When the anti-raider comes into contact with the raider, it is called struggle.

Raid When the raider enters into the opponent’s court with the chant, it is known as Raid.

Successful Raid When the raider touches at least one anti-raider and reaches back to his court with the chant, it is known as a successful raid.

1.3 Game Rules and Regulations

In the international team version of Kabaddi, two teams of seven members each occupy opposite halves of a field of 10 m x 13 m in the case of men and 8 m x 12 m in the case of women. Each team takes turns sending a raider, and the anti-raiders try to stop the raider from returning back to his half. The raider is not allowed to take a breath while raiding. He keeps on chanting the words "Kabaddi, Kabaddi ..." to indicate that he is holding his breath [1]. The raider has to return back to his court before taking a breath.


A raider is sent off the field if he takes a breath before returning, crosses the boundary of the Kabaddi field, or a part of the raider's body touches the ground outside the boundary except in a struggle with an anti-raider. If the raider makes a successful raid, then all the anti-raiders touched by the raider are sent off the field and an equal number of players in the raider's team are made active if they are out of the game. Each time a player is sent off, the opponent team earns a point. To suit the MAS aspect of the game, a few game rules are relaxed; these are discussed in Sec 3.1.

1.4 Multi Agent Kabaddi

We consider the world of Kabaddi as a rectangular grid model and each agent is modelled as an autonomous agent (agents that behave similarly when placed in similar environments). Every agent deploys the same strategy and only differs in the role it plays (raider or anti-raider) and the input information (about its team players, the raider and the environment) that it receives. Playing aggressive and defensive, and being able to switch between them, is an important aspect of the game. Anti-raider agents also require cooperation to capture the raider while obeying the rules of Kabaddi. After analyzing many international games, we notice that formations play a vital role in the anti-raiders' game. Depending on the state of a game the formations are altered. To achieve formations we introduce the concept of risk and define risk functions that assist agents to get into formation, avoid collisions and also follow the rules of Kabaddi (boundary conditions, capture condition, etc.). We formulate game strategies using risk functions. Risk functions and their impact on the behaviour of agents are discussed in Section 4.3.

To account for the fact that innumerable strategies can be developed, each having its own pros and cons, we developed an MDP model for Kabaddi (Sec 5.1) which considers game strategies as actions and decides the best action (strategy) depending on the state of the game. Each strategy is played in different game scenarios (with different numbers of anti-raiders, against different raider strategies) and the outcomes are recorded. These records are parsed to build a transition matrix for the MDP. The MDP proves to be dynamic and provides the benefit of extending it with any number of game strategies. It is never certain that a raider will play with the same strategy; it can bring in new strategies or modify an existing strategy. To adapt to these changes, the MDP has to be executed at regular intervals to find the optimal strategy. These intervals can be decided based on various criteria, like the win/loss percentage in a game, after a certain number of raids, or after the team loses a point.

1.5 Thesis contribution

The main contributions of the thesis are:

• Designed and implemented the Kabaddi Simulator Package (a Java library) which can be imported by a user to develop strategies for Kabaddi.


• Introduced the concept of risk to build game strategies which involve implicit cooperation and formation tactics.

• Modeled the game of Kabaddi using an MDP and developed a generic team strategy which can be extended with any number of game strategies.

• Developed Level I robots that can simulate a Kabaddi game using XBee wireless modules for communication, and integrated these robots into the simulator to simultaneously simulate the game on robots.

• Developed APIs (XBee, Motor and Message APIs) for the Raspberry Pi (a mini sized computer) attached to the Level II robots, which are capable of performing complex computations, thus making them potential robots to play Kabaddi.

1.6 Overview of Thesis

The organization of the thesis is as follows. In Chapter 2, we discuss related work done in the pursuit domain relevant to the game of Kabaddi. We also talk about various multi agent simulators which assist game simulations. In Chapter 3, basic terminology and background information about the game are provided. We also introduce the Kabaddi simulator package and explain how it is used to develop strategies for Kabaddi. In Chapter 4, we analyze the game from the anti-raiders' perspective and develop Kabaddi game strategies using the concept of risk. We also discuss the significance of the risk function. In Chapter 5, we develop a generic Kabaddi team strategy using an MDP which incorporates multiple strategies. In Chapter 6, we discuss the Level I robots built to simulate Kabaddi and also the APIs built for the Level II robots, which are potential Kabaddi playing robots. The last chapter summarizes the conclusions and future work.


Chapter 2

Related Work

The game of Kabaddi, as seen from the anti-raiders' perspective, looks slightly similar to the pursuit domain problem. Kabaddi is an intense game played with rules and restrictions, unlike the pursuit domain. The pursuit domain or predator-prey problem is appropriate for the illustration of multi-agent systems (MAS) and has been referenced many times [27]. It is typically studied in an environment with four predators and a prey. The goal of the predators is to capture the prey or surround it so that it cannot move to an unoccupied position, while the prey moves randomly [18]. Many other pursuit games became popular domains for the study of cooperative behavior in distributed artificial intelligence (DAI). Some of them are the lion and man game [24], the cops and robbers game [2] [10], and pursuit evasion [5] [25].

2.1 Strategies for Pursuit Domain

2.1.1 Korf simple solution

A standard solution for pursuit domain problems was suggested by Korf [15] using the concept of attractive and repelling forces. In the simplest approach to the problem, each predator measures the distance to the prey from its current position and from each empty neighboring position, and moves to the cell that minimizes its distance to the prey. If its current position is closest to the prey, it remains stationary. Conversely, the prey moves to an available neighboring cell, or remains stationary, so that its distance from the nearest predator is maximized. The author mentions that depending upon the nature of the grid (orthogonal, hexagonal or rectangular) the distance between adjacent cells varies, and hence the distance function has to be varied to avoid blocking of adjacent cells surrounding the prey. A max norm distance function is used to fix this issue. Finally, to reduce the number of moves required to capture the prey, the author introduces a repulsive force between the predators that makes them spread before they surround the prey. This repulsive force between the predators is achieved by subtracting a constant weight times the distance to the nearest predator from the predator's distance function.
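The following is a minimal sketch, in Java, of the greedy predator step described above, assuming a max norm (Chebyshev) distance and a constant repulsion weight; the class and method names are illustrative and are not Korf's original code or part of KSP.

import java.awt.geom.Point2D;
import java.util.List;

public class KorfPredatorSketch {
    // Weight of the repulsive force between predators (an assumed constant).
    private static final double REPULSION_WEIGHT = 0.5;

    // Max norm distance between two grid cells.
    static double maxNorm(Point2D a, Point2D b) {
        return Math.max(Math.abs(a.getX() - b.getX()), Math.abs(a.getY() - b.getY()));
    }

    // Score of a candidate cell: distance to the prey minus a weighted distance
    // to the nearest other predator; lower is better.
    static double score(Point2D cell, Point2D prey, List<Point2D> otherPredators) {
        double nearest = Double.MAX_VALUE;
        for (Point2D p : otherPredators) {
            nearest = Math.min(nearest, maxNorm(cell, p));
        }
        return maxNorm(cell, prey) - REPULSION_WEIGHT * nearest;
    }

    // The predator moves to the empty neighbouring cell (or stays put) that minimizes the score.
    static Point2D chooseMove(Point2D current, List<Point2D> emptyNeighbours,
                              Point2D prey, List<Point2D> otherPredators) {
        Point2D best = current;
        double bestScore = score(current, prey, otherPredators);
        for (Point2D c : emptyNeighbours) {
            double s = score(c, prey, otherPredators);
            if (s < bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }
}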


2.1.2 Game Theoretic Approach

Levy and Rosenschein proposed a game theoretical approach to the pursuit domain [17]. The agents choose a strategy based on a payment policy which considers implicit coordination among the agents. The payment policy that the authors suggest incorporates two ideas. The first is to reach the final state as quickly as possible, i.e. to capture the prey. For every move the predator is paid an amount of money proportional to the difference of its distance (measured in the city block metric) from the prey before and after the move. This argument of the payoff function drives the predator to prefer moves that bring it closer to the prey. The second argument of the payoff function encourages the agents to coordinate among themselves and block the prey. If Kt denotes the number of directions that are blocked for the prey, then the second argument is 5 raised to the power of Kt. The payoff value for a particular strategy T is calculated as the sum of the differences in the players' distances from the prey (the sum of the di) and the second argument of the payoff function as explained above. Here the strategy T contains the move that each predator has to take. Now, for all possible strategies, the value of the payoff function is calculated and the one with the highest value is chosen by the predators. We clearly notice that this approach is centralized, as there has to be a master agent who calculates the payoff function for all the possible strategies and then intimates the predators with the corresponding moves that lead to capturing the prey. In contrast to the payoff function, our strategy uses a risk function which is the same for all the agents but is calculated with different input parameters. Every agent can make its own calculations without requiring a master agent to drive them.
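As a rough illustration of this payoff evaluation, the sketch below scores one joint strategy T from the predators' positions before and after the move; the city block distance and the 5 raised to Kt term follow the description above, while the class and parameter names are assumptions made only for this example.

import java.awt.Point;
import java.util.List;

public class PayoffSketch {
    // City block (Manhattan) distance between two grid cells.
    static int cityBlock(Point a, Point b) {
        return Math.abs(a.x - b.x) + Math.abs(a.y - b.y);
    }

    // before/after: predator positions before and after applying joint strategy T;
    // blockedDirections: Kt, the number of prey directions blocked after the move.
    static double payoff(List<Point> before, List<Point> after, Point prey, int blockedDirections) {
        double sum = 0;
        for (int i = 0; i < before.size(); i++) {
            // Positive when the move brings predator i closer to the prey.
            sum += cityBlock(before.get(i), prey) - cityBlock(after.get(i), prey);
        }
        return sum + Math.pow(5, blockedDirections);
    }
}

The predators would evaluate this payoff for every candidate joint strategy and pick the one with the highest value.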

2.1.3 Recursive Modeling Method

Vidal and Durfee [1995] also analyze such situations using the Recursive Modeling Method (RMM) [28]. The basic idea is that a rational agent that can be impacted by other agents will benefit if it can predict, with some degree of reliability, the likely actions of other agents. With such predictions, the rational agent can choose its own actions more judiciously, and can maximize its expected payoff in an interaction (or over a set of interactions). To get such predictions, an agent could rely on communication for explicit disclosure by other agents, on superficial patterns of actions taken by others, or on deeper models of other agents' states. Predator A bases its move on the predicted move of predator B and vice versa. Since the resulting reasoning can recurse indefinitely, it is important for the agents to bound the amount of reasoning they use, either in terms of time or in terms of levels of recursion. Vidal and Durfee's [1995] Limited Rationality RMM algorithm is designed to take such considerations into account. The RMM algorithm prunes portions of the recursive structure that can be safely ignored (that do not alter an agent's choice of action).

2.1.4 David Gale solution for lion and man game

The lion and man problem is another game from the pursuit domain. The common setting of the game is the following. A man and a lion are moving within a given area (usually a subset of a plane).


The lion wins if he catches the man. The man wins if he can keep escaping for infinite time. The author analyzes the game in a discrete time and continuous space setting. A man and a lion are moving within the non-negative quadrant of the plane. In each round, the man first moves to any point at Euclidean distance at most 1 from his current position, and the lion then does the same. The lion wins if he moves to the current position of the man. The man wins if he can keep escaping for an infinite number of rounds. The motive of the lion is similar to that of a raider and to the intermediate goal of an anti-raider: the raider tries to catch an anti-raider, and an anti-raider tries to catch the raider. Let L0 = (x0, y0) and M0 = (x'0, y'0) be the initial positions of the lion and the man, respectively, with M0 ≠ L0. If either x'0 ≥ x0 or y'0 ≥ y0, then the man can escape by always moving a distance 1 horizontally or vertically away from the origin. The author proves that in the remaining case, when both x'0 < x0 and y'0 < y0, the lion can catch the man in a finite number of moves. The solution proposed by Gale does not consider boundary constraints: the game is played in the first quadrant of the coordinate system and the +x and +y axes are considered as boundaries. Kabaddi, however, has three boundaries and a mid line which the anti-raiders cannot cross, so this strategy cannot be applied in all cases, but it offers a way to catch an agent in a few special cases.

All the strategies proposed for the predator-prey game are inept for Kabaddi, as discussed in Section 1.1. The anti-raider agents have to ensure that they capture the raider before it crosses the mid line, and thus they need a defensive strategy to play a safe game and attack only when the probability of capturing the raider is high. We present a defensive strategy in Section 4.3 (Implementing the Strategy) and discuss how it can be used to counter the raider's moves.

2.2 Simulation of Multi Agent Kabaddi

Kabaddi was first modeled as an instance of the pursuit domain by Klein and Suri [13]. The game was modeled in a discrete time and space environment, with the players taking alternating turns moving in discrete steps, and the paper offers an analysis of winning strategies and explores tradeoffs between maximum movement speed, number of pursuers, and location constraints. Although the paper offers interesting theorems and a few winning strategies with a limited number of anti-raiders, the cooperation between the agents is not considered. The authors define catch and capture using the concept of a neighborhood (the cells adjacent to the raider).

The following are a few theorems stated by the authors.

1. The attacker can always capture a single defender in a game of Kabaddi in O(n) moves.

2. In the standard model of Kabaddi on an n × n grid, the attacker can capture both the defenders in O(n) worst-case moves.


2.3 Robot Simulation

Frej Laursen Wurtz, in his thesis [32], discusses Artificial Intelligence in the Predator/Prey Domain. The Predator/Prey Domain is in this setup implemented as a game of catch where actual robots, in the form of Lego MindStorms, cooperate in capturing a third-party robot, controlled either by a human player or by a computer. This experiment was conducted to understand the physical challenges faced in a pursuit domain.

Robot Kabaddi tournaments have also been conducted [12]. These robots were built to participate in a struggle, either to touch an anti-raider and escape or to block and capture a raider robot. However, these robots are operated by humans and do not involve any AI, cooperation or communication between robots. In this thesis we discuss simulating autonomous robots playing Kabaddi with strategies involving cooperation and AI to capture a raider bot.

Other related work on robots with AI includes billiards-playing robots [26]. Robotic billiards players have been developed that address the task of executing shots on a physical table, but so far they have incorporated little strategic reasoning. They require artificial intelligence to select the 'best' shot, taking into account the accuracy of the robot, the noise inherent in the domain, the continuous nature of the search space, the difficulty of the shot, and the goal of maximizing the chances of winning. This thesis also discusses AI for Kabaddi robots using an MDP, which helps in coming up with better strategies against the opponents.

Related work on robots playing games includes soccer [6]. A soccer robot is a specialized autonomous mobile robot used to play variants of soccer. The main organised competitions for robot soccer are the RoboCup and FIRA tournaments, played each year. In this thesis we lay a few foundations for conducting one such tournament for Kabaddi, on the simulator as well as on robots.

2.4 Simulators

To test and implement the behavior of predators in several scenarios without having to worry about other details of the domain, the Pursuit Domain Package (PDP) [14] was developed. MASON is another popular simulator used for multi agent simulations [16].

ARES (Agent Rescue Emergency Simulator) is intended to be another test bed for multi agent systems [4]. ARES allows for many different variants of the basic setting, by varying the information the agents have when starting, the cost of communication, the methods for agents to regain energy, and so on. ARES is also used for teaching purposes.

Both PDP and ARES parameterize many pursuit domain parameters that can be changed easily, thus changing the characteristics of the domain. However, these packages don't provide the functionality to change certain parameters to simulate the Kabaddi domain. Our Kabaddi simulator has all the game rules of Kabaddi incorporated into it and also provides an interface to develop both cooperative and competitive strategies.


In contrast to the pursuit domain, Kabaddi has a closed environment, and the condition to capture a raider further differentiates it from the pursuit domain.

Though there are several simulators for multi agent systems, we built our own Kabaddi simulator package (KSP), which has an environment set up exclusively for Kabaddi, to promote rapid strategy development for the game without having to implement game rule logistics and track environment data. We also built KSP to develop the Robot Kabaddi simulation, which required more control over the source code to incorporate communication with the robots into the heart of the simulator's game engine.


Chapter 3

Background

3.1 Simulation Model

3.1.1 Environment of Kabaddi

The world of Kabaddi is a grid of size Width x Height (which can be varied). Each cell in the grid is considered as one unit. The world is represented as a coordinate system with its origin at the top left corner of the grid. Each cell in the grid has a distinct position given by its x and y coordinates. The world is surrounded by 4 edges (walls), and a line in the middle (also referred to as the mid-line) acts as a field separator for the competing teams. The field view is shown in Fig 3.1.

There are 3 kinds of agents that affect the environment:

1. Raider

2. Anti-raider

3. Coordinator

The coordinator is the referee for the game.

3.1.1.1 Game Domain Parameters

Based on the rules of Kabaddi, the following are the Kabaddi game domain parameters. To suit the MAS aspect of Kabaddi and the grid model of the world, a few game rules are relaxed. Holding of breath by a raider is replaced by restricting the raider to a fixed number of steps in a single raid, and a capture happens when a fixed number of anti-raiders touch the raider at the same instant of time. The time taken by an agent to move one unit, i.e. into an adjacent cell (the step time), is always constant, and hence time is measured in unit steps.

Agent Diameter Each Agent has a specific diameter.


Figure 3.1 Kabaddi Game Simulator

Capture Number The number of anti-raiders required to touch a raider at the same instant of time to call it a capture.

Visible range All the agents have a visibility range of infinity (i.e., every agent can see all other agents in the world).

Legal Moves Every agent is only allowed to move into an adjacent cell (horizontal, vertical or diagonal) with respect to its current cell. We assume that the time taken by an agent to move into an adjacent cell is always constant.

Grid-size The rectangular grid can have a variable width and height.

Movement All the agents can move simultaneously in the world.

Episode time Each agent can only play for a fixed duration of time in a raid. This time is called the Episode time. The episode stands as a substitute for the breath of an agent (i.e. the time the agent can hold its breath in a raid).

Time-out When the time taken by a raider on the court exceeds its episode time, a time-out is said to happen.

Touch A touch is said to happen if the distance between a raider and an anti-raider is less than the diameter of an agent.
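For illustration only, these parameters could be grouped into a plain configuration object like the sketch below; the field names and default values are assumptions made for this example and are not the actual KSP classes.

public final class KabaddiDomainParameters {
    public int gridWidth = 20;          // Grid-size: variable width of the field, in cells
    public int gridHeight = 20;         // Grid-size: variable height of the field, in cells
    public double agentDiameter = 1.0;  // Agent Diameter, in cell units
    public int captureNumber = 3;       // Capture Number: anti-raiders needed for a capture
    public int episodeTime = 30;        // Episode time: steps a raider may spend in one raid
}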

3.1.1.2 Kabaddi Agent

A Kabaddi agent can play the role of a raider or an anti-raider. No two agents can occupy the same cell. A raider, if captured, is sent out of the game. An anti-raider is sent out of a game when touched by a raider in a successful raid.


1 2 3
4 R 5
6 7 8

Table 3.1 A 3x3 Grid cell in Kabaddi World

An agent sent out is made inactive and can no longer participate in a game until made active. Every agent has a specific diameter and a pre-set episode time.

3.1.1.3 Behavior of Agent

A raider agent tries to touch as many anti-raider agents as possible within its episode time while not getting captured. An anti-raider agent tries to capture the raider while cooperating with the other team players. The time taken by an agent to move into an adjacent cell is constant.

3.1.1.4 Capabilities of Agent

An agent is capable of touching and blocking other agents. Every agent can only move to an empty adjacent cell, i.e. a cell not occupied by any other agent. Consider Table 3.1: if an agent is located in the cell with label 'R', then it can only move into a cell labelled from 1 to 8, or it can choose to remain in its current cell.
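A minimal sketch of enumerating these legal moves (the eight cells of Table 3.1 plus the option of staying put), assuming integer cell coordinates and a list of occupied cells; the names are illustrative, not the actual KSP API.

import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

public class LegalMovesSketch {
    static List<Point> legalMoves(Point current, int width, int height, List<Point> occupied) {
        List<Point> moves = new ArrayList<>();
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                Point cell = new Point(current.x + dx, current.y + dy);
                boolean inGrid = cell.x >= 0 && cell.x < width && cell.y >= 0 && cell.y < height;
                // The current cell (dx = dy = 0) is always allowed: the agent may stay where it is.
                boolean free = cell.equals(current) || !occupied.contains(cell);
                if (inGrid && free) {
                    moves.add(cell);
                }
            }
        }
        return moves;
    }
}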

3.2 Kabaddi simulator package

To build and experiment with strategies, a Kabaddi simulator package (KSP) was built (Fig 3.1). The source code and library can be found in the reference section [20]. The coordinator is the core program of the simulator that simulates play. The simulation takes place as a series of episodes and cycles. For each raid, the coordinator decides a raiding team and an anti-raider team. The coordinator calls the function selectRaider on the raiding team, which returns the identifier of the agent that is going to play as the raider. It then makes this active agent the raider. The corresponding threads of the agents participating in the raid are created. In each cycle, the corresponding move function of each agent is called to update its position. An anti-raider agent is sent out of the game if touched by a raider in a successful raid or if there is a border cross. The changes in the environment are captured by the coordinator and the necessary actions are taken based on the rules explained in the following section.
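The sketch below condenses this flow into a single raid loop; selectRaider and move follow the names used above, while the Team and Agent interfaces, the episode-time loop and the event check are simplifying assumptions made only for this sketch, not the actual coordinator code.

import java.util.List;

public class CoordinatorSketch {
    interface Agent { void move(); }
    interface Team {
        int selectRaider();
        Agent makeRaider(int agentId);
        List<Agent> activeAgents();
    }

    void runRaid(Team raidingTeam, Team antiRaiderTeam, int episodeTime) {
        int raiderId = raidingTeam.selectRaider();          // raiding team nominates a raider
        Agent raider = raidingTeam.makeRaider(raiderId);    // the chosen active agent becomes the raider
        for (int cycle = 0; cycle < episodeTime; cycle++) { // an episode is a series of cycles
            raider.move();                                  // every agent updates its position each cycle
            for (Agent antiRaider : antiRaiderTeam.activeAgents()) {
                antiRaider.move();
            }
            // After every cycle the coordinator checks for touch, capture, border cross,
            // successful raid and time-out, as described in Sec 3.2.1.
        }
    }
}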

3.2.1 Episode and Cycle

Each episode of a raid is seen as a series of cycles. In each cycle the positions of the agents are updated and the coordinator checks for the following conditions.


Touch If the Euclidean distance between the corresponding cells of the raider and an anti-raider is less than the Agent Diameter, then the anti-raider is said to be touched. Below is the condition that determines whether there is a touch between a raider at position (Xr, Yr) and an anti-raider at position (Xa, Ya); Dag is the diameter of an agent and the distance function d(x1, y1, x2, y2) gives the Euclidean distance between the points (x1, y1) and (x2, y2). A short code sketch of this check and of the capture check follows this list.

d(Xa, Ya, Xr, Yr) < Dag

Capture If at least Capture Number anti-raiders touch the raider at the same instant of time, then the raider is said to be captured.

Border cross Checks if an agent crosses the boundary of the play field.

Successful raid If a raider has touched at least one anti-raider and returns back to its arena without being captured within its episode time, then the raid is said to be successful.

Time out If the time taken by the raid exceeds the episode time of the raider.

Is there a winner Checks if a team can be declared the winner, that is, when a team has no active players to raid or the number of active anti-raiders is less than the Capture Number.
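As referenced in the Touch item above, the following is a minimal sketch of the touch and capture checks; the positions, the agent diameter Dag and the capture number are passed in explicitly, and the class and method names are illustrative rather than the actual KSP code.

import java.awt.geom.Point2D;
import java.util.List;

public class RaidCheckSketch {
    // Touch: d(Xa, Ya, Xr, Yr) < Dag
    static boolean isTouch(Point2D antiRaider, Point2D raider, double agentDiameter) {
        return antiRaider.distance(raider) < agentDiameter;
    }

    // Capture: at least "Capture Number" anti-raiders touch the raider at the same instant.
    static boolean isCapture(List<Point2D> antiRaiders, Point2D raider,
                             double agentDiameter, int captureNumber) {
        int touching = 0;
        for (Point2D antiRaider : antiRaiders) {
            if (isTouch(antiRaider, raider, agentDiameter)) {
                touching++;
            }
        }
        return touching >= captureNumber;
    }
}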

An episode or raid is said to end on a successful raid, if the raider gets captured, or on a time-out. The game ends when all the agents of a team are out of the game or if there are not enough agents to capture the raider. The raider's team is awarded points equal to the number of anti-raiders touched in a successful raid, and an equal number of inactive agents from the raider's team are made active. A raider is made inactive when captured or on a time-out, in which case the opponent team is awarded one point.

3.2.2 Simulator view

The simulator view is divided into three parts:

1. Simulation

2. User Controls

3. Game status.

The simulation view displays the grid and the agents in the grid as shown in Fig 3.1. All the edges (boundaries) and the field separator are drawn. Each team has a different color. The color of an anti-raider is changed when touched by a raider. Each raid takes place on the anti-raiders' court. The user interface has the following buttons to control the game.

Start starts the game play.


Play starts the next raid.

Reset resets the game.

The game status displays the score and the number of active agents in each team. An agent is said to be active if it can participate in a raid.

3.3 Working with Simulator Package

The Kabaddi simulator package has the following components.

Coordinator This program controls the game play. It creates agents, starts a raid and waits for a winner.

Controller The controller updates the positions of the agents and checks for the events mentioned in Sec 3.2.1.

Simulator The simulator program displays the current state of a game and also provides an interface to control the game.

Agents The Agent class defines all the properties of an agent. It also has member functions to get and set agent parameters.

Engine Engine is an abstract class that partially implements a strategy for a game. The user has to extend this class to implement both raider and anti-raider strategies for his team.

Communication This class implements the functions required to send serial data from the simulator through a specified communication port (a socket on a computer used to connect a modem, data acquisition terminal or other device via a serial interface, one data bit following the other).

3.3.1 Starting the Kabaddi Simulator

The Kabaddi simulator is the core program of the package that initializes all the simulator components along with the simulator view components. Each team implements its strategy by extending the Engine, and the corresponding instances of the strategy classes are passed as parameters to the Kabaddi simulator object. After this initialization, the run function is called on the simulator object to begin the simulation.

import kabaddiSimulator.*;

TestEngineX engineX = new TestEngineX(4, "TeamNameX");
TestEngineY engineY = new TestEngineY(3, "TeamNameY");

KabaddiSimulator simulator = new KabaddiSimulator(engineX, engineY);
simulator.run();



Listing 3.1 Starting the simulator

3.3.2 Implementing The Strategy

Implementing the strategy class for a team requires extending the Engine class, as shown in Listing 3.2. The constructor of this class takes the number of agents and the team name as parameters. Note that the constructor of the super class (Engine) must also be called from the constructor of the strategy class.


import Kabaddi.Agent.Agent;
import Kabaddi.Engine.Engine;
import java.awt.geom.Point2D;
import java.awt.geom.Point2D.Float;

public class sampleEngine extends Engine {

    public sampleEngine(int noOfAgents, String teamName) {
        // The constructor of the super class (Engine) must be called first.
        super(noOfAgents, teamName);
    }

    public Float moveRaiderAgent(Agent raiderAgent, Float[] antiRaiderPositions,
                                 int antiRaiderScore) {
        return null; // return the relative move of the raider for this cycle
    }

    public Float moveAntiRaiderAgent(Agent antiRaiderAgent, Float[] antiRaiderPositions,
                                     Float raiderPosition, int raiderScore) {
        return null; // return the relative move of the anti-raider for this cycle
    }

    public Float antiRaiderInitialPosition(int index, int noOfActive,
                                           int noOfActiveRaiders) {
        return null; // initial position of an anti-raider at the start of a raid
    }

    public Float RaiderInitialPosition(int noOfActive, int noOfActiveAntiRaiders) {
        return null; // initial position of the raider at the start of a raid
    }
}

Listing 3.2 Implementing strategy

The functions moveRaiderAgent, moveAntiRaiderAgent, antiRaiderInitialPosition and raiderInitialPosition have to be overridden by the extending class. moveRaiderAgent takes an instance of the raider agent and the positions of the anti-raiders as parameters and returns the relative position that the raider agent has to take with respect to its current position. moveAntiRaiderAgent takes an instance of the anti-raider agent, the positions of the other anti-raider agents and the position of the raider agent as parameters and returns the relative position that the anti-raider has to take with respect to its current position.


Figure 3.2 Architecture of Kabaddi Simulation

antiRaiderInitialPosition and raiderInitialPosition return the corresponding initial positions that the agents have to take at the start of a raid.
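As a usage example, one of these overrides could be filled in as below, spreading the anti-raiders along a line at the start of a raid; the fixed grid width of 20 cells, the row y = 15 and the assumption that index counts from 0 are illustrative choices made only for this sketch, and Float refers to the Point2D.Float imported in Listing 3.2. The method would replace the corresponding empty body inside the sampleEngine class.

public Float antiRaiderInitialPosition(int index, int noOfActive, int noOfActiveRaiders) {
    float gridWidth = 20f;                          // assumed field width, in cells
    float spacing = gridWidth / (noOfActive + 1);   // equal gaps between the anti-raiders
    return new Float(spacing * (index + 1), 15f);   // a line formation parallel to the mid line
}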

3.4 Architecture of Kabaddi Simulation

The user imports the Kabaddi simulator package on the PC and implements a strategy for each of the two teams by extending the Engine class. Instances of these strategy classes are passed as parameters to the Kabaddi simulator object. The simulator further passes these instances to the coordinator, which controls the agents. Simultaneous simulation of Level I robots will be discussed in Section 6.1.

The coordinator captures changes in the environment and takes corresponding actions. The simulation view panel renders the field view periodically. The data panel view is also updated and displays the current score and the status of each team (see Fig 3.1).


Chapter 4

Kabaddi Team Strategy Formulation

4.1 Game Analytics

Kabaddi anti-raider strategies can be classified into two categories.

Aggressive where agents tend to be more attacking i.e. to get closer and touch the raider.

Defensive where agents tend to move away from the raider and only attempt to capture when touched.

Let us now analyze the game from the perspective of raider and anti-raider with respect to the simulation.

4.1.1 Anti-raiders perspective

All the strategies proposed for the predator-prey game are inept for Kabaddi. The condition to capture, the boundary condition and the prerogative of the raider to retreat to its arena at will make the game more interesting and different from pursuit games, and thus it requires an approach different from that of the pursuit domain games.

Coordination is the key to the anti-raiders' game. Even though the final goal of the anti-raiders is to capture the raider, which is similar to the goal of the predators in a predator-prey game, their intermediate goal differs. The anti-raiders have to ensure that they maintain a safe distance to elude a touch by the raider. Decision making also plays an important role in the game. The anti-raiders have to collectively decide between aggressive and defensive strategies. They should be in a position to switch between strategies at any point of the game. To meet all such requirements, the anti-raiders have to coordinate either implicitly or explicitly and get into a formation, which plays a major role in the anti-raiders' game. An anti-raider also has to decide whether to participate in a struggle to capture the raider or to play safe. Sometimes playing safe is a better option, when the probability that the raider makes a successful raid is substantially high. The number of steps required to get into a formation depends on the initial positions of the agents. After a few simulations we notice that a semi-circular formation assists the anti-raiders in changing their formation and strategy with ease. The following section provides more insight into the role of formations.


Figure 4.1 Different types of formations

4.1.1.1 Using formations in a strategy

As discussed earlier, formations assist anti-raiders in switching between strategies. We now discuss a few formations (Fig 4.1), their advantages and their disadvantages. The circled agent is the raider and the arrow indicates the direction of attack by the raider.

4.1.1.1.1 Straight line formation The most basic formation is a straight line formation, as shown in Fig 4.1 (a). All the agents form a straight line parallel to the mid line, and each agent maintains a minimum distance from the raider, hence avoiding a touch. This formation can be formed in a minimum number of steps when compared to the other formations. The following are the disadvantages of a straight line formation. Consider the formation in Fig 4.1 (a); notice that the raider is closer to either one or two anti-raiders and far from the other anti-raiders. With this formation it is easy for the raider to touch the closest anti-raider and get away with it, while the anti-raiders find it difficult to surround and capture.

4.1.1.1.2 Semi Circular formation After analyzing many international games of Kabaddi, a semi circular formation (shown in Fig 4.1 (b)) is considered to be an optimal formation for the anti-raiders.

In a semi circular formation all the anti-raiders align themselves in a semi circle while maintaining a safe distance from the raider at every point of the game. The magnitude of this distance depends on the radius of the semi-circle, with the raider as its center. To play defensive, the anti-raiders adjust their positions depending on the raider's position, thus ensuring a safe distance from the raider. To switch between aggressive and defensive, the radius of the semi circle is decreased for an aggressive strategy or increased for a defensive one. When the radius of the semi-circle is equal to the radius of the raider (the size of the raider), the anti-raiders adjust their positions around the raider without colliding among themselves, at a distance close enough to be called a touch or a capture, depending upon the number of anti-raiders attacking the raider.



In contrast to the pursuit domain, often surrounding the raider and playing aggressive is not sufficient to capture. The raider might touch an anti-raider close to the mid line and retreat to its arena before the anti-raiders can capture it. The anti-raiders have to position themselves in a way that at least Capture Number anti-raiders can capture the raider before it retreats. A semi circular formation assists the anti-raiders in surrounding the raider partially, thereby constructing a favourable state to play aggressive as well as defensive. Depending on the raider's position and the direction of attack, a semi circular formation allows the anti-raiders to partially surround the raider from behind, thereby providing an opportunity to be aggressive and capture the raider. For example, if the raider tries to attack an anti-raider to its left, then the rest of the anti-raiders surround the raider from the right (Fig 4.1 (b)), or vice-versa.
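A minimal sketch of computing target positions on a semi circle around the raider is given below; the radius, the facing angle and the evenly spaced placement over a half circle are illustrative assumptions, not the thesis' exact formation rule.

import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

public class SemiCircleFormationSketch {
    // Place n anti-raiders on a half circle of the given radius centred on the raider,
    // opening towards facingAngle (e.g. the direction of the raider's attack).
    static List<Point2D.Double> targets(Point2D.Double raider, int n, double radius, double facingAngle) {
        List<Point2D.Double> positions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // Spread the agents evenly over an arc of PI radians.
            double angle = facingAngle + Math.PI / 2 - Math.PI * i / Math.max(1, n - 1);
            positions.add(new Point2D.Double(raider.x + radius * Math.cos(angle),
                                             raider.y + radius * Math.sin(angle)));
        }
        return positions;
    }
}

Decreasing the radius argument moves the formation towards the aggressive posture described above, while increasing it corresponds to the defensive posture.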

4.1.1.1.3 Circular formation A Circular formation strategy is an extension of the Semi Circular formation where the agents get into a circular formation by increasing the spread among themselves (Fig 4.1 (c)).

4.2 Raider's perspective

The goal of a raider is to touch as many anti-raiders as possible and return to its arena without getting captured. The number of anti-raiders that a raider can touch at an instant should be less than the capture number to avoid a capture. As the distance of the raider from the mid line increases, the probability of being captured also increases: the anti-raiders have more room to surround and attack the raider, while the raider has to travel a greater distance to retreat, thus giving the anti-raiders more time to capture. Sometimes the raider might play safe, for example when its team is in a leading position or when a time-out is about to happen. Path planning also comes into play when the raider attempts to retreat to its arena; the raider has to plan a path which is relatively short and has a low probability of leading to a capture.

4.3 Implementing the Strategy

As discussed earlier, each agent in the simulator is provided with information about its environment. Each anti-raider is provided with the positions of the other anti-raiders and the position of the raider, while the raider is provided with the anti-raider positions. A strategy for an agent has to assist the agent in deciding on its next move or action. We now develop a strategy for homogeneous agents where each agent has the same strategy and they only differ in the input information they receive. We use the concept of risk to achieve a defensive strategy with implicit cooperation among agents.


4.3.1 Risk Function

As per our simulation model, the world of Kabaddi is a rectangular grid with 4 edges as boundaries (Fig 3.1). A cell is a unit component of the grid and is identified by its two dimensional coordinate (x, y) in a coordinate system with origin at the top left of the grid, as shown in Fig 3.1. Each cell is assigned a risk value which is calculated using a risk function associated with a strategy. The function takes the positions of the anti-raiders, the position of the raider and the position of a cell as input parameters.

Let A denote the set containing the positions of the anti-raiders, R denote the position of the raider and C denote the position of the cell. We define P as the set {A, R, C}, which contains all the information required by an agent to make progress in a game. We now define the risk function RF for a strategy S as a function RF : P \to \mathbb{R}.

The lower the value of the risk, the safer it is for an agent to stay in the corresponding cell. With this definition of risk we formulate our game strategy as follows.

“Each agent tries to minimize its risk at every point of the game”

Each agent calculates the value of the risk for all the cells adjacent to its current cell. It then takes the corresponding action to move into the cell with the minimum value of risk. If the risk values of all its adjacent cells are higher than that of its current cell, then it does not move. In the following sections we achieve aggressive and defensive strategies with implicit cooperation using the concept of risk, and we also analyze how a change in the risk function leads to a change in strategy.
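As a concrete illustration of this move rule, a minimal Python sketch is given below. It is illustrative only (not the simulator's actual code); risk_fn stands for the risk function RF of whichever strategy is in use, and the grid bounds are assumed parameters.

def next_move(agent_pos, anti_raiders, raider, risk_fn, width, height):
    """Greedy risk-minimizing move: return the adjacent cell (or the current
    cell) with the minimum risk value."""
    x, y = agent_pos
    best_cell = agent_pos
    best_risk = risk_fn(anti_raiders, raider, agent_pos)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            cell = (x + dx, y + dy)
            if 0 <= cell[0] < width and 0 <= cell[1] < height:
                risk = risk_fn(anti_raiders, raider, cell)
                if risk < best_risk:   # stay put unless a strictly safer cell exists
                    best_cell, best_risk = cell, risk
    return best_cell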

4.4 Anti-raider’s Strategy

Playing aggressively means attacking and capturing the raider, which is similar to the pursuit domain where a group of predators tries to capture a prey; thus all the strategies proposed to capture the prey can be used as aggressive strategies for Kabaddi. A few strategies proposed for the pursuit domain are discussed in the related work section. We now switch our focus towards defensive strategies, which play a vital role in the game of Kabaddi. We define each strategy using a risk function and explain its significance using a 3 dimensional graph. The XY plane of the graph represents the rectangular field of Kabaddi and the Z axis represents the risk values. The colors (VIBGYOR) in the graph depict the amount of risk at each point (red: high, violet: low). All the graphs are plotted on a grid of area 400 x 400 square units.

4.4.0.1 Defensive Strategy

In a defensive mode all the agents either maintain a constant distance from the raider or move away from it. Consider the following risk function

d_r = \sqrt{(x - r_x)^2 + (y - r_y)^2}    (4.1)


Figure 4.2 Graph plot for risk function in equation 4.2

RF_1(x, y) = MAX_VALUE - d_r    (4.2)

Here (x, y) and (r_x, r_y) denote the positions of a cell and the raider respectively. MAX_VALUE is the maximum possible distance between any two points on the grid, and d_r denotes the distance between the cell (x, y) and the raider at (r_x, r_y).

The graph in Fig 4.2 plots the risk values on the z-axis considering the raider at position (r_x, r_y). The risk function in eq. 4.2 signifies the risk generated by the raider. The risk values decrease as the distance from the raider increases, and thus the agents tend to move away (defensive) from the raider.
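As a small illustration, the risk function of equation 4.2 can be written as the following Python sketch. MAX_VALUE is taken here as the grid diagonal and the grid dimensions are assumed defaults; this is only a sketch of the formula defined above, not the simulator's implementation.

import math

def rf1(cell, raider, width=400, height=400):
    """RF1(x, y) = MAX_VALUE - d_r (equation 4.2): the risk decreases with the
    distance from the raider, so risk-minimizing agents move away from it."""
    max_value = math.hypot(width, height)   # largest possible distance on the grid
    d_r = math.hypot(cell[0] - raider[0], cell[1] - raider[1])
    return max_value - d_r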

We now prove mathematically that equation 4.2 forces the anti-raiders to move away from the raider. We find the points at which the function attains its global or local extrema by differentiating the risk function with respect to x and y and solving the equations \partial RF_1 / \partial x = 0 and \partial RF_1 / \partial y = 0 simultaneously.

\frac{\partial RF_1}{\partial x} = \frac{-(x - r_x)}{\sqrt{(x - r_x)^2 + (y - r_y)^2}} = 0    (4.3)


Figure 4.3 Simulation of agents with risk function in equation 4.2

\frac{\partial RF_1}{\partial y} = \frac{-(y - r_y)}{\sqrt{(x - r_x)^2 + (y - r_y)^2}} = 0    (4.4)

We see that the two equations are never simultaneously zero; however, both are undefined at (r_x, r_y). Therefore (r_x, r_y) is a critical point. Substituting (r_x, r_y) into equation 4.2 gives MAX_VALUE. Since d_r \ge 0 for all (x, y), we have RF_1(r_x, r_y) \ge RF_1(x, y), as MAX_VALUE \ge MAX_VALUE - d_r for all (x, y). Hence (r_x, r_y) is the global maximum of equation 4.2, and all the anti-raider agents tend to move away from (r_x, r_y) (the raider's position), thereby minimizing their risk.

We now define a risk function which drives the anti-raiders to maintain a safe distance r from the raider. Notice how the anti-raiders start to move away from the raider as it approaches them in Fig 4.3.

Now consider the risk function

RF_2(x, y) = |r - d_r|    (4.5)

The distribution of risk values can be seen in the graph in Fig 4.4. The cells which are at a distance r from the raider have lower values compared to the rest, and thus agents at a distance greater than r reduce their distance while agents at a distance less than r increase their distance from the raider. Gradually all the anti-raiders end up at a distance r from the raider. With this configuration, if the anti-raiders considerably spread among themselves through coordination and without collision, a semi circular formation can be attained. Notice how the anti-raiders get closer to the raider as it approaches them in step 2 of Fig 4.5 and later move away to maintain a constant distance from the raider in step 3. We also notice that the anti-raiders in the middle (step 3) are close to each other compared to the rest and thus are prone to collision.

We now mathematically prove that the anti-raider agents maintain a safe distance from the raider using the risk function 4.5. We rewrite equation 4.5 as follows


Figure 4.4 Graph plot for risk function in equation 4.5

Figure 4.5 Simulation of agents with risk function in equation 4.5


RF_2(x, y) = \begin{cases} r - d_r & \text{if } r \ge d_r \\ d_r - r & \text{if } r < d_r \end{cases}    (4.6)

We now solve for the critical points of equation 4.6 in two domains. First consider the domain D_1 of points defined as D_1 = \{(x, y) : d_r \le r\}. Differentiating RF_2(x, y) with respect to x and y in this domain gives the same equations as eq. 4.3 and eq. 4.4, and hence yields the same critical point (r_x, r_y), which belongs to the domain D_1. RF_2(r_x, r_y) = r \ge r - d_r, \forall (x, y) \in D_1, as d_r \ge 0. Thus RF_2(r_x, r_y) \ge RF_2(x, y), \forall (x, y) \in D_1, so (r_x, r_y) is a local maximum in the domain D_1. Since D_1 is a closed region, we also need to consider the boundary points for local extrema. Consider the set of points B = \{(x_b, y_b) : r = d_r\}. All the points in set B are on the boundary and RF_2(x_b, y_b) = 0 \le RF_2(x, y), \forall (x, y) \in D_1, as RF_2(x, y) = r - d_r \ge 0 in D_1. Hence all the points in B are local minima of the risk function 4.5. Agents with position (x, y) \in D_1 therefore tend to move away from (r_x, r_y) (the local maximum) and towards the points on the boundary ((x_b, y_b) \in B) where the risk function attains its local minimum, thereby decreasing their risk.

Now consider the domain of points D_2 = \{(x, y) : r < d_r\}. Differentiating the risk function with respect to x and y gives equations 4.7 and 4.8 (which are the negatives of equations 4.3 and 4.4).

\frac{\partial RF_2}{\partial x} = \frac{x - r_x}{\sqrt{(x - r_x)^2 + (y - r_y)^2}} = 0    (4.7)

\frac{\partial RF_2}{\partial y} = \frac{y - r_y}{\sqrt{(x - r_x)^2 + (y - r_y)^2}} = 0    (4.8)

Notice that the differentials do not exist at (r_x, r_y), but this point does not belong to the domain D_2 and hence is not a critical point. D_2 is an open region whose boundary points belong to the set B; the value of RF_2(x, y) at these boundary points is zero while RF_2(x, y) = d_r - r > 0, \forall (x, y) \in D_2, so the function attains its local minimum (zero) at points close to this boundary. Agents with position (x, y) \in D_2 therefore tend to move closer to the boundary points in set B, thereby minimizing their risk. Also notice that RF_2(x, y) \ge 0, \forall (x, y), and the value of RF_2(x_b, y_b) for all (x_b, y_b) \in B is zero; thus for all points in set B, RF_2(x_b, y_b) = 0 \le RF_2(x, y), \forall (x, y). Hence set B contains the points at which the function attains its global minimum, and all the agents tend to move towards the points in set B, thereby maintaining a constant distance r from the raider.

Now, to achieve spread among the agents, we reuse the concept of risk. Apart from the risk due to the raider we also introduce risk from a co-player. The probability of collision increases as the distance between agents decreases. Korf uses the concept of a repulsive force between agents to avoid collisions when the agents are substantially close to each other, which makes it adequate only for aggressive play. To achieve spread, and thereby a semi circular formation, we define the risk function as follows

d_a = \sqrt{(x - a_x)^2 + (y - a_y)^2}    (4.9)


Figure 4.6 Graph plot for risk function in equation 4.10

RF_3(x, y) = \begin{cases} |r - d_r| + s - d_a & \text{if } d_a \le s \\ |r - d_r| & \text{otherwise} \end{cases}    (4.10)

Here d_a denotes the distance between the cell at position (x, y) and the anti-raider at position (a_x, a_y) that is closest to the cell. The value of the spread is denoted by s; the greater the value of s, the better the anti-raiders surround the raider. With this risk function the anti-raiders eventually end up in a semi circular formation, which can be explained with the help of the graph in Fig 4.6. First, we observe that the lower values of risk are at a distance r from the raider, and thus each anti-raider eventually attains a position on the semi circle with the raider as its center and radius r. Secondly, each anti-raider adds risk to other anti-raiders close to it and forces them to maintain at least a distance of s from it (notice the smaller lobe in Fig 4.6 that was created after introducing s into the equation), thereby avoiding collisions and partially surrounding the raider. Fig 4.7 shows how the anti-raiders surround the raider when r is set to 40 units and s to 50 units. Since the world of Kabaddi is surrounded by boundaries, the values of s and r have to be set such that the anti-raiders have enough room to surround the raider within these boundaries.


Figure 4.7 Simulation of agents with risk function in equation 4.10

However, as the risk function depends on the raider's position, the anti-raiders might be forced to cross the boundaries. Notice how the anti-raiders maintain a constant distance from the raider in step 2 of Fig 4.7 and also spread among themselves in step 3. But as the raider's position gets closer to the boundary, the anti-raiders are pushed towards the boundaries and are prone to crossing them.

We now mathematically prove that with risk function 4.10 each anti-raider tends to maintain a considerable amount of distance from other anti-raiders. If d_a > s then the anti-raider is already at a distance greater than s from its nearest anti-raider. In this case the risk function 4.10 is the same as risk function 4.5, which has already been proved to maintain a constant distance r from the raider. So we now consider the case in which d_a \le s. In this case the risk function is as follows

RF_3'(x, y) = |r - d_r| + s - d_a    (4.11)

We first split this function into two parts: (i) f_1(x, y) = |r - d_r| and (ii) f_2(x, y) = s - d_a, so that RF_3'(x, y) = f_1(x, y) + f_2(x, y). The function f_1 is the same as function 4.5 and will drive the anti-raiders to maintain a constant distance r from the raider. We now differentiate the function f_2 with respect to x and y and equate the results to zero to find the critical points of the function. The differentials are as follows

\frac{\partial f_2}{\partial x} = \frac{-(x - a_x)}{\sqrt{(x - a_x)^2 + (y - a_y)^2}} = 0    (4.12)

\frac{\partial f_2}{\partial y} = \frac{-(y - a_y)}{\sqrt{(x - a_x)^2 + (y - a_y)^2}} = 0    (4.13)

Notice that these differentials are similar to the differentials in equations 4.3 and 4.4 with r_x and r_y replaced by a_x and a_y. Hence, without loss of generality, we can say that the point (a_x, a_y) is a critical point of the function f_2.


Figure 4.8 Graph plot for risk function in equation 4.14

With the condition d_a > 0, \forall (x, y) in the respective domain, f_2(a_x, a_y) = s > s - d_a, which implies f_2(a_x, a_y) \ge f_2(x, y) and thus proves that (a_x, a_y) is a local maximum of the function f_2. Since f_2 is overlaid upon f_1 in the risk function 4.10 (producing the smaller lobe in Fig 4.6), i.e., adding to the risk value produced by function f_1, the anti-raiders move away from this local maximum (a_x, a_y), thereby minimizing the risk due to other anti-raiders and hence avoiding collisions.

Finally, to handle the boundary condition, we introduce a high amount of risk from the boundaries into the risk function. Consider the risk function

RF_4(x, y) = \begin{cases} |r - d_r| + |s - d_a| & \text{if } (x, y) \text{ is not a boundary cell and } d_a \le s \\ |r - d_r| & \text{if } (x, y) \text{ is not a boundary cell and } d_a > s \\ MAX_VALUE & \text{otherwise} \end{cases}    (4.14)

The resultant risk values produced by the above function can be seen in the graph in Fig 4.8. Notice how the risk values at the boundaries increase enormously, thereby ensuring that the anti-raiders do not cross the boundaries and adjust their semi circular formation accordingly.


Figure 4.9 Simulation of agents with risk function in equation 4.14

With this risk function, notice that the anti-raiders in step 2 of Fig 4.9 maintain a safe distance from the raider and spread among themselves. In step 3, the anti-raiders in the middle move away from the raider as well as from the boundary.
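For completeness, a sketch of the composite risk function RF_4 of equation 4.14, combining the safe distance r, the spread s and the boundary penalty, might look as follows. The boundary test, the MAX_VALUE constant and the default parameter values are assumptions consistent with the description above, not the exact simulator code.

import math

def rf4(cell, raider, other_anti_raiders, r=40.0, s=50.0,
        width=400, height=400, max_value=1e9):
    """Sketch of RF4 (eq. 4.14): keep distance r from the raider, keep spread s
    from the nearest other anti-raider, and never enter a boundary cell."""
    x, y = cell
    # Boundary cells carry a prohibitively high risk.
    if x <= 0 or y <= 0 or x >= width - 1 or y >= height - 1:
        return max_value
    d_r = math.hypot(x - raider[0], y - raider[1])
    risk = abs(r - d_r)
    # Add the spread term only when the nearest other anti-raider is closer than s.
    if other_anti_raiders:
        d_a = min(math.hypot(x - ax, y - ay) for (ax, ay) in other_anti_raiders)
        if d_a <= s:
            risk += abs(s - d_a)
    return risk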

4.5 Simulation of Game strategy

With the concept of risk, a few game strategies were developed along with a naive pursuit domain strategy. Every strategy has an aggressive and a defensive version. In an aggressive strategy all the agents attack (attempt to capture the raider) after they achieve an intermediate goal (such as a formation or a constant distance from the raider), in contrast to a defensive strategy where the agents attack only after one of the anti-raiders is touched. Table 4.1 lists all the strategies developed using the Kabaddi Simulator package.

The following figures show the simulation of each strategy played with different numbers of players. Each game is simulated on a 400 × 300 grid. The brown line is the path followed by the raider. All agents are independent and homogeneous, i.e., they have the same code and only differ in the environment they perceive. Thus, for a given state of a game (score, number of players), strategy and initial positions of agents, the outcome of a game will always be the same.

Notice how the movement of the anti-raiders changes with a change in strategy and in the number of players playing the game (shown in round brackets beside the name of the strategy). If the number of points earned by the winning team is more than 1, it is explicitly shown at the end of the plot label (inside parentheses). The starting points of all the agents are circled and the starting-point circle of the raider is labelled as r. Other events such as touch and capture are also circled along with labels. The mid line should be assumed to be a vertical line at the right-most part of the plot, close to the start point of the raider.


Strategy name              Description                                                           Section
Korf-aggressive            Uses the Korf strategy for the pursuit domain                         4.5.1
Korf-defensive             Inverts the heuristic (to be defensive) used in the Korf strategy     4.5.1
                           until an anti-raider is touched and later reverts to the proposed
                           heuristic
Straight-line aggressive   Agents form a straight line formation maintaining a constant          4.5.2
                           distance from the raider and attack after achieving the formation
Straight-line defensive    Agents form a straight line formation maintaining a constant          4.5.2
                           distance from the raider and attack only after the raider touches
                           one of the anti-raiders
Semi-circular aggressive   Agents form a semi circular formation around the raider while         4.5.3
                           maintaining a constant distance and attack after achieving the
                           formation
Semi-circular defensive    Agents form a semi circular formation while maintaining a constant    4.5.3
                           distance from the raider and attack only after the raider touches
                           one of the anti-raiders
Circular aggressive        Agents form a circular formation by increasing the spread around      4.5.4
                           the raider while maintaining a constant distance and attack after
                           achieving the formation
Circular defensive         Agents form a circular formation while maintaining a constant         4.5.4
                           distance from the raider and attack only after the raider touches
                           one of the anti-raiders

Table 4.1 Strategy table


Figure 4.10 Template for Simulation path plot graph

Fig 4.10 serves as a template to help the reader understand the simulation plots that follow. The behaviour of the anti-raider agents corresponding to each strategy has been discussed in section 4.1. A few simulation videos of the strategies in action can be found in the reference section [21].

4.5.1 Korf Strategy

The Korf strategy fails to follow the rules of the game while in pursuit of the raider. Fig 4.11 shows a few plots of the paths followed by agents in an aggressive Korf strategy against an aggressive and a defensive raider strategy. Notice that in Fig 4.11 the anti-raiders slightly spread among themselves when the raider is far from them but eventually get closer to each other and to the raider. Unless the initial positions of the anti-raiders are close to a semi circular formation and the raider plays with a naive defensive strategy, it is less probable that the anti-raiders can capture the raider. Fig 4.12 shows path plots when the anti-raiders play a defensive strategy against an aggressive raider strategy. Due to weak court positioning at the point of touch, the anti-raider agents find it difficult to capture the raider before it crosses the mid line.


Figure 4.11 Korf Aggressive Strategy simulation

Figure 4.12 Korf Defensive Strategy simulation


Figure 4.13 Straight line Aggressive Strategy simulation

4.5.2 Straight-line Formation Strategy

Figures 4.13 and 4.14 plot the paths followed by the anti-raider agents when playing the straight line aggressive and defensive strategies respectively. Notice how the anti-raiders maintain the spread and move in straight lines. Though the anti-raiders are successful in defending at the borders and the mid line, the formation is still too weak to capture the raider in most situations.

4.5.3 Semi-circular Formation Strategy

As discussed in section 4.1, the semi circular formation assists the anti-raiders in positioning themselves in a good court position to capture the raider. Figures 4.15 and 4.16 are the path plots of agents when the anti-raiders play with the semi circular aggressive and defensive strategies respectively. In an aggressive strategy this formation ensures that the anti-raiders partially surround the raider with enough room to adjust their positions before attacking. In a defensive strategy they wait for a touch from the raider.


Figure 4.14 Straight line Defensive Strategy simulation

The semi circular formation ensures that they place themselves in a position to surround the raider as soon as a touch happens, as seen in Fig 4.16.

4.5.4 Circular Formation Strategy

Figures 4.17 and 4.18 are path plots of agents when the anti-raiders play with the circular aggressive and defensive strategies respectively. The circular formation strategy proves to be advantageous when the raider plays completely aggressively (attacking the closest anti-raider). Due to the high spread among the agents, the anti-raiders surround the raider completely before a touch happens. But they end up in a weak court position if the raider changes its strategy from attacking the closest anti-raider to attacking an anti-raider closer to the mid line.


Figure 4.15 Semi circular Aggressive Strategy simulation

Figure 4.16 Semi circular Defensive Strategy simulation


Figure 4.17 Circular Aggressive Strategy simulation

Figure 4.18 Circular Defensive Strategy simulation


Chapter 5

Markov Decision Process for Kabaddi

5.1 Markov Decision Process

An MDP provides a mathematical framework for modelling decision making in situations where outcomes are partly random and partly under the control of a decision maker [30]. More precisely, a Markov Decision Process is a discrete time stochastic control process. At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s', giving the decision maker a corresponding reward R_a(s, s').

The probability that the process moves into its new state s' is influenced by the chosen action. Specifically, it is given by the state transition function P_a(s, s'). Thus, the next state s' depends on the current state s and the decision maker's action a. But given s and a, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP possess the Markov property.

A Markov decision process is a 4-tuple (S, A, P(·, ·), R(·, ·)), where

• S is a finite set of states

• A is a finite set of actions (alternatively, A_s is the finite set of actions available from state s)

• P_a(s, s') = Pr(s_{t+1} = s' | s_t = s, a_t = a) is the probability that action a in state s at time t will lead to state s' at time t + 1

• R_a(s, s') is the immediate reward (or expected immediate reward) received after the transition to state s' from state s.

The core problem of MDPs is to find a "policy" for the decision maker: a function π that specifies the action π(s) that the decision maker will choose when in state s. Assuming we know the state transition function P and the reward function R, we wish to calculate the policy that maximizes the expected discounted reward. The standard family of algorithms to calculate this optimal policy requires storage for two arrays indexed by state: the value V, which contains real values, and the policy π, which contains actions. At the end of the algorithm, π will contain the solution and V(s) will contain the discounted sum of the rewards to be earned (on average) by following that solution from state s.

The algorithm has the following two kinds of steps, which are repeated in some order for all the states until no further changes take place. They are defined recursively as follows:

\pi(s) := \arg\max_a \left\{ \sum_{s'} P_a(s, s') \left( R_a(s, s') + \gamma V(s') \right) \right\}

V(s) := \sum_{s'} P_{\pi(s)}(s, s') \left( R_{\pi(s)}(s, s') + \gamma V(s') \right)

The order of these updates depends on the variant of the algorithm; one can also perform them for all states at once or state by state, and more often for some states than for others. As long as no state is permanently excluded from either of the steps, the algorithm will eventually arrive at the correct solution.
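A compact sketch of this iteration, assuming the transition probabilities P and rewards R are available as nested dictionaries, is given below. It is a generic value-iteration routine with discount factor γ, shown for illustration rather than as the exact code used in this work.

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Generic value iteration.  P[a][s][s2] is the transition probability and
    R[a][s][s2] the reward for moving from s to s2 under action a."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # One-step lookahead value of every action in state s.
            q = {a: sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2]) for s2 in states)
                 for a in actions}
            best = max(q.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract the greedy policy from the converged value function.
    pi = {s: max(actions,
                 key=lambda a: sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2])
                                   for s2 in states))
          for s in states}
    return V, pi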

We now present a procedure for how an MDP can effectively be used as a strategy to play in a competitive environment like Kabaddi. The MDP provides a model to incorporate multiple strategies of the game into a single complex strategy. The following section discusses the MDP model for Kabaddi.

5.2 MDP Model for Kabaddi

Consider a team X and an opponent team Y. We now develop a strategy for team X which uses an MDP to decide upon its strategy.

5.2.1 States

The main goal of a team in Kabaddi is to score more than the opponent team by the end of a match. The players decide their team strategy based on the present score. With this understanding we choose the state of the game s as follows.

Let s_x and s_y represent the scores of teams X and Y respectively; the state of the game s is represented as (s_x, s_y). An example of a state is s : (4, 2), which is the state of the game when the score of team X is 4 and that of Y is 2.

Let S contain the set of all possible states in Kabaddi. The set S is represented as S = \{(x, y) : x, y \in \mathbb{N}, x, y \le M_k\}, where M_k denotes the maximum possible score that can be achieved in a single game. In our execution we set the value of M_k to 4, but in general it can be set to any value that is sufficiently high and unlikely to be reached.

5.2.2 Rewards

R denotes the reward matrix and R(s) specifies the reward that a team gets when it reaches the state s. We used the equation below for the reward matrix.


sx\sy      0       1       2       3       4
0        0.0    -2.0    -4.0    -6.0    -8.0
1        8.0     6.0     4.0     2.0     0.0
2       16.0    14.0    12.0    10.0     8.0
3       24.0    22.0    20.0    18.0    16.0
4       32.0    30.0    28.0    26.0    24.0

Table 5.1 Reward table with reward function R(s) = 8 × s_x − 2 × s_y

R(s(s_x, s_y)) = \alpha \times s_x - \beta \times s_y + \gamma

To win a game each team tries to maximize its score, so a state with a higher score is given a higher reward, and as the difference between the scores increases the reward increases. The rate of increase in reward is controlled by the α, β and γ variables, which denote the weights given to the anti-raiders' score, the raiders' score and an offset respectively. The values in table 5.1 are the rewards given to states using the reward function R(s) = 8 × s_x − 2 × s_y (α = 8, β = 2, γ = 0; α is set higher than β as we want team X to win).
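The reward table above can be reproduced with a few lines of code; the parameter values α = 8, β = 2, γ = 0 and M_k = 4 follow the description above.

def reward(sx, sy, alpha=8.0, beta=2.0, gamma=0.0):
    """R(s(sx, sy)) = alpha * sx - beta * sy + gamma."""
    return alpha * sx - beta * sy + gamma

# Reproduce Table 5.1 for Mk = 4: each printed row is one value of sx.
Mk = 4
for sx in range(Mk + 1):
    print([reward(sx, sy) for sy in range(Mk + 1)])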

5.2.3 Actions

Let A be the finite action space containing all the possible strategies that can be played in the game of Kabaddi, and let A_s denote the set of actions available from state s. It is to be noted that if both teams play defensively then the outcome of a raid is a draw with neither side scoring a point. Thus, to enforce an outcome, we define an aggressive and a defensive variant of each strategy, defined as follows.

Aggressive, where agents attack the raider after a certain trigger point (after all the agents get into a formation or when the raider enters a certain area of the field).

Defensive, where agents attack the raider only after a touch.

The following set of actions (strategies) was considered for evaluation: A_s = {0: Korf aggressive, 1: Korf defensive, 2: straight line aggressive, 3: straight line defensive, 4: semi circle aggressive, 5: semi circle defensive, 6: circle aggressive, 7: circle defensive}.

5.2.4 Transition Matrix

The probability transition matrix is derived from the simulator logs. After each raid the simulator logs the result in the following format into a file (gamelogs):

<strategy-name> s_x s_y s'_x s'_y


action-index
St(0, 0)       St(0, 1)       ...   St(0, M_s)
St(1, 0)       St(1, 1)       ...   St(1, M_s)
...
St(M_s, 0)     St(M_s, 1)     ...   St(M_s, M_s)

Table 5.2 Format of Transition matrix

Here s_x, s_y denote the scores of the teams before the raid and s'_x, s'_y denote the scores after the raid.

P_a(s, s') denotes the probability that an action a will lead to state s' from s. The probability transition matrix is stored in a file (transition). Each state s(s_x, s_y) is mapped to a unique index calculated using the formula s_x × (M_k + 1) + s_y (M_k as described in section 5.2.1). Let M_s denote the maximum state index value attained, which is equal to M_k × (M_k + 1) + M_k.

Let S_i denote the unique index of the state s and let St(S_i, S'_i) contain the number of times the state of the game changed from s to s' when the action associated with the state matrix was chosen. For each action there is a corresponding state matrix, stored in the format shown in table 5.2.

The parser also assigns a unique index to each strategy and updates the mapping (between strategy and index) when it encounters a new strategy in the log file. Using this transition file, the probability matrix is built with the following formula:

P_a(S_i, S'_i) = \frac{St(S_i, S'_i)}{\sum_{j=0}^{M_s} St(S_i, j)}

The parser takes in the logger file along with an old transition file and produces an updated transition file. The latest probability file holds the actual values used by the MDP to calculate the policy matrix π.
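A sketch of how the raid counts parsed from the game logs can be normalized into the probability transition matrix is given below. The log-line format, state indexing and normalization follow the description above; the parsing details and function names are illustrative, not the thesis parser itself.

def state_index(sx, sy, Mk=4):
    """Map a state (sx, sy) to its unique index sx * (Mk + 1) + sy."""
    return sx * (Mk + 1) + sy

def build_transition_matrix(log_lines, Mk=4):
    """Count raid outcomes per strategy and normalize each row of counts into
    probabilities.  Each log line: '<strategy-name> sx sy sx2 sy2'."""
    Ms = Mk * (Mk + 1) + Mk          # maximum state index
    counts = {}
    for line in log_lines:
        name, sx, sy, sx2, sy2 = line.split()
        St = counts.setdefault(name, [[0] * (Ms + 1) for _ in range(Ms + 1)])
        St[state_index(int(sx), int(sy), Mk)][state_index(int(sx2), int(sy2), Mk)] += 1
    P = {}
    for name, St in counts.items():
        P[name] = []
        for row in St:
            total = sum(row)
            P[name].append([c / total if total else 0.0 for c in row])
    return P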

Fig 5.1 shows the transformation of the simulator logs into the probability transition matrix.

Table 5.3 shows how the state of a game changes when played with a particular strategy. Considering the initial state of the game as s : (x, y) (x and y represent the scores of the anti-raiders and the raiders respectively), the state of the game changed either to s' : (x + 1, y) (i.e., the anti-raiders win) or to s' : (x, y + 1) or s' : (x, y + 2) (i.e., the raiders win). These results are used as the state transition matrix for the MDP strategy. We notice that the semi-circle-defensive strategy has a greater chance of winning compared to the rest (it wins 90% of the games played).


Figure 5.1 Transformation of simulator logs to transition matrix

strategy                     (x+1, y)   (x, y+1)   (x, y+2)
korf-aggressive                  70         30          0
korf-defensive                   10         50         40
straight-line-aggressive         60         40          0
straight-line-defensive          70         30          0
semi-circle-aggressive           80         20          0
semi-circle-defensive            90         10          0
circle-aggressive                60         30         10
circle-defensive                 60         30         10

Table 5.3 Representation of Transition matrix Π built using simulator logs


x/y        0        1        2        3        4
0      45.80    36.49    26.54    14.49     -8.0
1      59.55    50.51    40.81    29.36      0.0
2      65.99    57.28    47.8     38.0       8.0
3      63.07    54.8     45.6     38.0      16.0
4      32.0     30.0     28.0     26.0      24.0

Table 5.4 V matrix

x/y    0    1    2    3    4
0      5    5    5    5    0
1      5    5    5    5    0
2      5    5    5    5    0
3      5    5    5    5    0
4      0    0    0    0    0

Table 5.5 Action matrix Π

5.3 Results of MDP

With the MDP parameters S, A, P(·, ·), R(·, ·) as discussed in the above sections, the MDP is executed; at the end of the algorithm, π contains the solution and V contains the discounted sum of the rewards to be earned by following that solution from each state. The tables below show the values contained in the V matrix as calculated by the MDP with the reward matrix from table 5.1 and the probabilities P(·, ·) generated by the simulator.

Table 5.4 shows the V matrix and table 5.5 shows the policy π. Notice that, due to the generalization of the transition matrix (table 5.2), i.e., it being independent of the state, and the reward table 5.1, the MDP outputs the action that has the maximum probability of winning (semi-circle defensive here). Table 5.6 shows the policy π output by the MDP in a specific case where the raider team only played a defensive strategy in states (2,0), (2,1), (3,0) and (3,1). In this scenario the semi-circle aggressive strategy has a better chance of winning, which is reflected in the policy table 5.6 and supports the claim that the MDP provides the best strategy to play in every state depending on the transition matrix.

To test the effectiveness of the MDP strategy we simulated a game with two teams, one playing with the MDP strategy (Team A) and the other with a random strategy (Team B). The team size is set to 4 and the capture number to 2. The results of a single game are shown in table 5.7. The game progresses as a series of raids, with the two teams taking alternate turns. The game starts with team A playing as anti-raiders (Team B raids). Notice that Team A finally wins the game, as there are not enough players left in team B to capture the raider. Here, Active A and Active B denote the number of active players in each team after a raid.


x/y    0    1    2    3    4
0      5    5    5    5    0
1      5    5    5    5    0
2      4    4    5    5    0
3      4    4    5    5    0
4      0    0    0    0    0

Table 5.6 Action matrix Π

Raid Team   Team A Score   Team B Score   Active Team A   Active Team B
B                1              0               4               3
A                1              1               3               4
B                2              1               4               3
A                3              1               4               2
B                4              1               4               1

Table 5.7 Game simulation

We ran 40 such games (MDP vs. random strategy) and the team playing with the MDP strategy won 85 percent of the games. This shows that our MDP agents can both cooperate and compete.

As the game evolves, it is never certain that a single policy π will lead the anti-raiders to victory. The set of raider strategies is huge and difficult to anticipate in advance. The raider might make subtle changes to its existing strategy or bring new strategies into play, in which case the outcome of a game when played with a particular strategy (action) changes, i.e., anti-raider strategies which had a greater probability of winning might not have the same probability any more. Some strategies might have increased chances of winning and some decreased, and it is important to learn about these new or altered strategies. The transition matrix therefore needs to undergo changes as new strategies are encountered, which may change the optimal anti-raider policy. So it is important to update the transition matrix at regular intervals (which can be after a certain number of raids), execute the MDP again to get a new policy π, and use it for the next set of games. The logs generated by the simulator can be used to update the transition matrix as discussed in section 5.2.4, thereby improving a team's decision making in terms of the strategy to be deployed depending on the state of the game.


Chapter 6

Robo Kabaddi Simulation

6.1 Kabaddi Robots - Level I

To understand the challenges involved at the hardware level we built robots that can play Kabaddi. As a basic model we focused only on building robots that can move as per the instructions given to them. These robots do not have the ability to perform high computational tasks, nor can they localize or detect other agents on the field. They were only used to simulate a game through action commands, to understand the real time challenges and the mechanical and technical difficulties involved in playing Kabaddi.

6.1.1 Building Robots

The major components of a Kabaddi robot are an Xbee wireless module [9], to support wireless communication, and an Atmega16 micro controller [3], to process action commands. The Xbee modules are configured as end devices to receive commands, which are processed by the micro controller. The controller also drives the motors accordingly after processing the action command. Figure 6.2 shows the circuit diagram of the robot. Five such robots (figure 6.1) were built to analyze a raid in Kabaddi.

6.1.2 Coordinator

All the robots are assisted by a coordinator, which is an Xbee module attached to a computer. In each step of a raid, the Kabaddi simulator broadcasts appropriate action commands to move the robots in the same way as the simulated agents.

6.1.3 Movement of Robot

Each robot, in every move, is assumed to be at the center (denoted by R) of a 3 × 3 grid as shown in table 3.1. The size of each cell is 20cm × 20cm. In each step, a robot can move into any of the 8 adjacent cells. Each robot supports the following actions to move in the grid based world of Kabaddi.


Figure 6.1 Robot built to play Kabaddi


Figure 6.2 Circuit Design of the robot

• Move Forward

• Move Backward

• Rotate 45 degree Left

• Rotate 90 degree Left

• Rotate 45 degree Right

• Rotate 90 degree Right

The actions Move Forward and Move Backward move the robot by 20cm. With the help of the above-mentioned actions, a robot can move into any adjacent cell. Based on the standard directional conventions North, East, West, South (NEWS), every robot is programmed to face only a few directions but can still move into all the adjacent cells. North, North-East, North-West, East and West are the only directions that a robot can face at any point of time. For example, if the robot at the center (R) is facing North and has to move into the cell with identifier '8', then it takes the following actions: (i) Rotate 45 degree Left, (ii) Move Backward. Now the robot faces the North-West direction. From here, to move into the cell with identifier '6' (assuming the robot to be at R again), it takes the following actions: (i) Rotate 90 degree Right, (ii) Move Backward. Finally the robot now faces the North-East direction.


6.1.4 Messaging Protocol and Action Commands

Each of the 8 adjacent cells, as shown in table 3.1, is uniquely identified by a number ranging from 1 to 8. To move the robot with identifier <robot identifier> into an adjacent cell with identifier <adjacent cell identifier>, the following command has to be broadcast from the coordinator Xbee.

<robot identifier><adjacent cell identifier>

If a robot's identifier is 'a' and it has to move into an adjacent cell with identifier '5', then the command to broadcast is "a5". The micro controller on each robot is configured to process a command only if the <identifier> in the message matches the unique identifier of the robot. The Xbee modules receive commands from the server, which are processed by the micro controller: each command's identifier is compared with the identifier of the robot, and in case of a match the controller further processes the <adjacent cell identifier> and drives the motors to move into the corresponding cell. If the <identifier> does not match, the robot ignores the current command and waits for the next.
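On the coordinator side, broadcasting such a two-character command could look like the sketch below. The use of the pyserial library and the serial port name are assumptions for illustration, not the exact code used with the level I robots.

import serial  # pyserial, assumed here for illustration

def broadcast_move(port, robot_id, cell_id):
    """Send '<robot identifier><adjacent cell identifier>' through the
    coordinator XBee attached to a serial port."""
    command = "{}{}".format(robot_id, cell_id)
    port.write(command.encode("ascii"))

# Example usage (the port name is an assumption):
# with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as p:
#     broadcast_move(p, "a", 5)   # move robot 'a' into adjacent cell 5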

6.1.5 Synchronization

In each cycle of a game, the coordinator captures the changes in the environment and broadcasts the corresponding action commands to the robots. The diameter of a Kabaddi robot is 20cm and the diameter of an agent in the simulator is 1 unit. For each step, which is 1 unit in the simulator, a robot is configured to move 20cm on the ground, as discussed in section 6.1.3. The scaling factor between the simulator and the real world is therefore 20 (i.e., 1 unit in the simulator is 20cm in the real world).

6.1.6 Hardware Challenges

Each robot carries a heavy battery that weighs around 1 kg and moves very slowly, so the time taken per step is high and more power is required to move. To sync the robots with the simulation we reduced the cycle time (time taken for one step) in the simulator. The robots take around 2 seconds for each step.

For a game like Kabaddi coordination is vital, and for this to be possible all the agents have to move to their corresponding positions accurately in each move. We calibrated, by trial and error, the amount of delay that has to be given for each move so that the robot gets to the corresponding position.

With the present set-up all the processing is done on the server PC and only action commands are sent to the robot to make it move. These robots are neither capable of making decisions nor can they act on their own. They are not geared to do any computation and can only be operated remotely, which makes the system centralized. To address these issues, level II robots were built; they are explained in the following section.


Figure 6.3 Level II Robot

6.2 Kabaddi Robots - Level II

To address the challenges mentioned in section 6.1.6, enhancements were made to the level I robots with assistance from ECE students in our lab, and level II robots were built (Fig 6.3). Making complex computations and decision making plausible demands a high end processor, so the micro controller is replaced with a Raspberry Pi, a credit card sized computer [7]. With the Raspberry Pi in place, the involvement of the simulator is no longer required and an independent Robot Kabaddi game model is developed. Also, to reduce friction, the two front wheels are replaced with a caster wheel. Finally, motors with encoders are added to the robot; these are used to correct errors in the motion of the robot.


Figure 6.4 Robot Kabaddi Simulation Model

6.2.1 Kabaddi Robot Game Model

In this model, the coordinator acts as a game manager as well as a communicator between the robots. All the communication takes place through Xbee wireless modules. To interface the Xbee with the Raspberry Pi, an XBee API is written (sec 6.2.2.1). In Robot Kabaddi it is essential that every robot has information about the state of the game, such as the score and the positions of the robots in its team as well as the raider robot. Since our robots have neither a localization nor an obstacle detection mechanism, a concept of virtual localization based on belief is introduced (sec 6.2.2).

6.2.1.1 Coordinator

The coordinator in this model can perform the following tasks.

1. Scan and discover robots in a local area network.

2. Communicate with robots


3. Configure or initialize robots for play

4. Establish virtual localization and update the state of the game

5. Check for touch, border cross and capture events

All the Xbee modules attached to the robots are pre-configured using XCTU [8] to communicate with each other in API mode. In API mode every Xbee can send/receive data to/from other Xbee modules in the same network. Unlike the AT mode which was used in the simulation with level I robots, the Xbee modules can now receive data in the form of packets, which contain multiple bytes of data rather than a single byte.

To start a game, the coordinator first scans for robots in the network. After detecting all the robots, the coordinator allocates a Unique Id (UID) for each robot and allows a user to configure the robot's mode and position parameters (x and y coordinates). If the mode value is set to 0 (zero) then the robot acts as an anti-raider; if set to 1 then it acts as a raider. The unique Id, initial position and mode values are then transmitted to the corresponding robot using the messaging protocol (sec 6.2.2.2) developed for communication between robots. After successful configuration of the robots, the game begins.

At every step of the game the coordinator has positional information, which is achieved through the concept of virtual localization (sec 6.2.2). The coordinator keeps track of the robots touched and awards points to the corresponding team at the end of a game based on a capture or a successful raid.

6.2.1.2 Robot

Every robot has boot code already loaded into it. Users can code their strategy for the robot by editing the strategy file linked to the boot code. The code can be accessed by logging into the Raspberry Pi through LAN or simply by connecting a monitor to the Pi and booting it.

Each robot is made available for play by starting the Raspberry Pi. After start-up it waits to be configured by the coordinator. At each step the robot is provided with information regarding the state of the game. The strategy deployed in the robot processes this information and makes a decision on its next move. The strategy code outputs the next position the robot has to move to, and this output is converted into the corresponding action command by the motor driver code in the robot.

To interface the motor driver chip (L293D) with the Raspberry Pi, motor driver code was written. This motor driver supports the same action commands as the level I robots, but more directional states are added (8 now, 5 before), making it easier to move from a cell to its adjacent cell. From each of the 8 directional states the motor driver is optimised to choose the best possible way to move to the specified cell. A video of the robot in motion can be found in the reference section [22].

To achieve accuracy in movement, the robots have position correction functionality integrated in the form of PID control [31]. A proportional-integral-derivative controller (PID controller) is a control loop feedback mechanism widely used in industrial control systems (programmable logic controllers, SCADA systems, remote terminal units, etc.). A PID controller calculates an "error" value as the difference between a measured process variable and a desired set point, and attempts to minimize the error in the output by adjusting the process control inputs.
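A minimal discrete PID update of the kind described here might look as follows; the gains and the sampling interval are illustrative and not the values tuned on the robots.

class PID:
    """Minimal discrete PID controller:
    output = Kp * e + Ki * integral(e) + Kd * de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: correct the mismatch between the two wheel encoder counts.
# pid = PID(kp=1.0, ki=0.1, kd=0.05)
# correction = pid.update(setpoint=left_count, measured=right_count, dt=0.02)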

6.2.2 Virtual Localization through communication

Neither level I nor level II robots are capable of detecting other robots or localizing themselves. But to test the correctness of a strategy on level II robots we developed a concept of virtual localization based on the following beliefs.

1. The Coordinator believes that the positional information transmitted by the Robot is always true.

2. The Robot believes that the state-of-game information provided by the Coordinator is always true.

With the above beliefs, every agent localizes itself from the state-of-game information provided by the coordinator at every step, which is true by the belief that the positional information transmitted by the Robot is always true. It then suffices to establish communication between the coordinator and the robots for the game to progress. To achieve communication, an XBee API and a messaging protocol were developed.

6.2.2.1 XBee API

The Xbee API acts as an interface between the Raspberry Pi and the Xbee wireless module. It provides the following functionalities for the Raspberry Pi.

1. Opens a COM port connection from the Raspberry Pi to transmit and receive bytes from the XBee

2. Sends and processes AT commands for the XBee

3. Builds and transmits data packets in the format required by the API mode of the XBee

4. Checks for successful transmission of data

5. Checks if a data packet is corrupted and notifies the user

6. Checks for availability of data in the XBee data buffer

7. Fetches received packets and extracts data from them

The AT commands of the Xbee are used, among other things, to discover other nodes in the network; further functionalities can be found in the Xbee documentation [9]. The AT command "ATND" is sent from an Xbee to discover nodes. The response packet holds information about the number of nodes discovered, their 64 bit addresses and their 16 bit network addresses, which are required to send data to these Xbees in the network.


IN ‘$’ <identifier> ‘$’ <mode> ‘$’ <x-coordinate> ‘$’ <y-coordinate>

Table 6.1 Initialization Data message format

PD ‘$’ <identifier> ‘$’ <x-coordinate> ‘$’ <y-coordinate>

Table 6.2 Player Data message format

6.2.2.2 Messaging Protocol

To establish communication between robots, a messaging protocol is essential. To set up an environment for robots to play Kabaddi and to assist virtual localization, data transfer is required. In programming language terms, all the data stored in an object (an instance of a class, a data structure, etc.) has to be serialized while transmitting and de-serialized into its corresponding data structure after receiving, in order to be processed.

Data serialization is a popular concept that has been used in data transfer for many years. Serialized data requires a unique delimiter to separate pieces of information, which in our case is "$". A unique header is added to each serialized packet, which is used to identify the proper de-serializing function at the receiver's end.

Firstly, to make a handshake with the robot, an Initialization Data packet is sent in the format shown in table 6.1. "IN" at the beginning of the packet denotes that it is an initialization packet, and what follows are the unique identifier, mode, x-coordinate and y-coordinate of the robot respectively, separated by the delimiter.

Secondly, for a player to send information about its position, a Player Data packet is sent in the format shown in table 6.2. "PD" at the beginning of the packet denotes that it is a player data packet, and what follows are the unique identifier, x-coordinate and y-coordinate of the robot, separated by the delimiter.

Finally, for the coordinator to send the state-of-game information to all the robots, a Game Data packet is sent in the format shown in table 6.3. "GD" at the beginning of the packet denotes that it is a Game Data packet, and what follows are the number of robots whose data is being sent and a series of robot data entries, each of which includes the unique identifier, mode, whether the robot is active, x-coordinate, y-coordinate and whether the robot is touched, separated by the delimiter. All this information is required for localization as well as for the robot to make decisions on its next move.

6.2.2.3 Game play

Before the game begins, each robot is placed at the position the user wants it to be in, and the user then configures the corresponding robot with the mode and position parameters through the coordinator. So we can claim that the initial positions of the robots are true. After this set-up is made, the raid begins.


GD '$' <Number of robots> '$' <id> '$' <mode> '$' <active> '$' <x> '$' <y> '$' <touch>
                              <id> '$' <mode> '$' <active> '$' <x> '$' <y> '$' <touch>
                              ...
                              <id> '$' <mode> '$' <active> '$' <x> '$' <y> '$' <touch>

Table 6.3 Game Data message format
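A sketch of how such '$'-delimited packets could be built and parsed is shown below. The field order mirrors Tables 6.1 and 6.2; the helper functions themselves are illustrative and are not the XBee API code described above.

def serialize_player_data(identifier, x, y):
    """Build a Player Data packet: PD $ <identifier> $ <x> $ <y>."""
    return "$".join(["PD", str(identifier), str(x), str(y)])

def deserialize(packet):
    """Split a received packet on the '$' delimiter and dispatch on its header."""
    fields = packet.split("$")
    header, body = fields[0], fields[1:]
    if header == "IN":        # identifier, mode, x, y
        return {"type": "IN", "id": body[0], "mode": int(body[1]),
                "x": int(body[2]), "y": int(body[3])}
    if header == "PD":        # identifier, x, y
        return {"type": "PD", "id": body[0], "x": int(body[1]), "y": int(body[2])}
    raise ValueError("unknown packet header: " + header)

# deserialize(serialize_player_data("a", 3, 7))
# -> {'type': 'PD', 'id': 'a', 'x': 3, 'y': 7}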

To begin with, the coordinator transmits the state of the game, which includes the UID, mode and position of each robot. Each robot then localizes itself and the other agents based on the UIDs. It then computes its next move and transmits the updated position along with its UID. When the coordinator has the updated positional information, it processes it and checks for touch, border cross, capture or successful raid events. The coordinator then again sends the updated state of the game to all the robots, and the game continues until there is a winner. Fig 6.5 summarizes the process of virtual localization, and a video of game play in action in real time can be found in the reference section [19]. Figure 6.6 shows how the program inside the Raspberry Pi executes to scan, configure and establish communication between agents for game play.

6.2.3 Hardware Challenges with Level II Robots

Even though level II robots can perform complex computations, they still lack the ability to localize themselves and move accurately. The following factors account for these issues.

1. The Raspberry Pi has no real time clock, which is needed to compare the outputs provided by the encoders attached to the motors. For now an estimated value was used to correct the motion of the robot, and it is not guaranteed to be reliable.

2. We used the SoftwarePWM feature of the Pi to control the power of the motors for this aspect and tuned the control mechanism accordingly. But since there was an inherent alignment problem in the mechanical structure of the chassis itself, the robot by default moved in a diagonal direction despite both motor-encoders giving the same output count. That is, even if the PID worked correctly and matched the encoder counts, it could not correct the direction deviation in the motion of the robot.

3. The battery is very heavy, causing a lot of friction for the robot to move, and with the deployment of the Raspberry Pi the robot started to consume more power; the battery therefore drained very fast and the robot was only able to make around 30 moves before slowing down. As the power decreases, the PWM values used to drive the motors are no longer sufficient to move the robot to a specified location and have to be re-tuned for the new power level the motors receive.


Figure 6.5 Game Play and Virtual Localization


Figure 6.6 Game play on Raspberry Pi

4. Other issues include manufacturing defects in the design of the shaft, demanding a different set of PWM values for every robot to make it move or rotate by a specified amount.

6.3 Robot simulations

Due to the hardware challenges mentioned in section 6.2.3, a full simulation of a raid was not possible with the level II robots; it would require further changes to the mechanical design, the battery and the chips used to build the robot.

To reassure ourselves that our strategies, which were already shown to work in the Kabaddi simulator package (Sec 3.2), also work on robots, we simulated a few scenarios on the Simbad simulator [11] to demonstrate how the robots react when deployed with our game strategies. Simbad is a Java 3D robot simulator for scientific and educational purposes. It is mainly dedicated to researchers and programmers who want a simple basis for studying Situated Artificial Intelligence, Machine Learning, and more generally AI algorithms, in the context of Autonomous Robotics and Autonomous Agents.

The motor driver code written for the Raspberry Pi is embedded into the Simbad simulator to demonstrate exactly how the robots move when a strategy is deployed in them. The following are snapshots from the Simbad simulator captured to explain crucial scenarios in a raid. Links to videos of full raid simulations are provided in the reference section [23]. Note that the red triangle shows the direction in which a robot is pointing, 'R' denotes the raider robot and the dark blue lines denote the borders. The physics engine of the simulator is altered to match the grid model of Kabaddi. The following sections contain screen shots of robot simulations in specific scenarios when played with different strategies. These simulations and snapshots show how a real robot behaves in different environmental set-ups. We see that the robots move similarly to the simulated agents in the Kabaddi simulator package in terms of spread and formation.

6.3.1 Korf Strategy

Fig 6.7 shows screen shots of robot simulations when the robots are deployed with the Korf aggressive strategy.

Start: Fig 6.7 (a) shows how the anti-raider robots get closer to the raider robot in every step.

Middle line cross: Fig 6.7 (b) shows how the anti-raider robots fail to capture the raider and cross the mid line in pursuit of it. This behaviour shows that a pursuit domain strategy cannot be deployed as-is for Kabaddi and that more attributes have to be considered while developing strategies.

Border cross: Fig 6.7 (c) shows how the anti-raider robots are pushed against the border line, as this naive strategy does not take the game rules into consideration.

Raid simulation: Fig 6.7 (d) shows screen shots from a full raid simulation of the Korf aggressive strategy on the Simbad simulator. The video of this simulation can be found in the reference section [23]. The simulation results in the raider robot making a successful raid due to the weak court position attained by the anti-raider robots in pursuit of the raider robot. Fig 6.8 shows the simulation screen shots along with the paths taken by the robots.

6.3.2 Basic Robot behaviour

Fig 6.9 shows screen shots of robot simulations when the robots are deployed with any of the strategies developed using a risk function.

Border defense: Fig 6.9 (a) shows how the anti-raider robots, unlike in the Korf strategy, defend themselves from crossing the border. Notice that the robots move to the left or right as they approach the border while maintaining distance from the raider robot. Again, this behaviour of the robots is due to the risk from the borders embedded into the risk function.


Figure 6.7 Korf Aggressive strategy simulations in simbad


Figure 6.8 Korf Aggressive strategy raid simulation in simbad

Mid line fall back: Fig 6.9 (b) shows that even though the anti-raider robots could not capture the raider robot due to weak court positioning, they make an effort to catch the raider before it crosses the middle line, without crossing it themselves. This behaviour is because of the risk due to the borders introduced in the risk function of this strategy.

6.3.3 Straight Line Strategy

Fig 6.10 shows screenshots of the robot simulations when they are deployed with the straight line formation strategy.

Defensive Strategy - start: Fig 6.10 (a) shows how the anti-raider robots maintain the vertical distance from the raider robot as they move away from it (playing defensively). They attack only after the raider robot touches any of the anti-raider robots.

Aggressive Strategy (Non-strict) - Attack: Fig 6.10 (b) shows how the anti-raider robots maintain the straight line formation as the game proceeds. Non-strict means that the formation distance from the raider can have an error offset (+1 or -1). This offset is introduced to force the anti-raider robots to start attacking once they are almost in the specified formation, since the robots sometimes take more time to get into a perfect formation (a sketch of this tolerance check is given after this list).


Figure 6.9 Simulation of Risk due to border and Middle line

Aggressive Strategy - Attack (Weak positioning): Fig 6.10 (c) shows how the anti-raider robots end up in a weak court position at the instant of the touch. After a touch, the other anti-raider robots (those not touched) are far from the raider, giving the raider more room to retreat back to its arena.

Defensive Strategy - Strict Formation: Fig 6.10 (d) shows how the anti-raider robots get into a proper straight line formation as they try to capture the raider robot. Notice how the anti-raider robot on the top right adjusts its position to align with the other anti-raider robots.

Defensive Strategy Raid Simulation: Fig 6.11 shows screenshots from a full raid simulation of the straight line defensive strategy. The raid ends with the raider robot making a successful raid; notice that after the touch the other anti-raider robots are very far from the raider robot and fail to capture it before it crosses the middle line. The video of this simulation can be found in the reference section [23].

Aggressive Strategy Raid Simulation: Fig 6.12 shows screenshots from a full raid simulation of the straight line aggressive strategy. The raider robot successfully retreats back to its arena and scores a point. Notice that in an aggressive strategy the anti-raider robots start attacking as soon as they get into the formation (non-strict here). The video of this simulation can be found in the reference section [23].
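The sketch below shows one way the strict versus non-strict formation test mentioned above could be written; the class and method names and the tolerance constant are illustrative assumptions rather than the thesis code. In the non-strict variant an offset of +1 or -1 on the formation distance is accepted, so the team switches to attack as soon as it is roughly in line.

public class FormationCheck {

    /** distances[i] = current distance of anti-raider i from the raider (in cells). */
    static boolean inFormation(int[] distances, int formationDistance, boolean strict) {
        int tolerance = strict ? 0 : 1;          // non-strict allows an error offset of +1 or -1
        for (int d : distances)
            if (Math.abs(d - formationDistance) > tolerance) return false;
        return true;
    }

    public static void main(String[] args) {
        int[] current = { 3, 4, 3, 5 };
        System.out.println(inFormation(current, 4, true));   // false: strict formation not yet reached
        System.out.println(inFormation(current, 4, false));  // true: within the +/-1 offset, start attacking
    }
}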


Figure 6.10 Straight line formation strategy simulation in Simbad


Figure 6.11 Straight line Defensive strategy simulation in Simbad

Figure 6.12 Straight line Aggressive strategy simulation in Simbad


6.3.4 Semi Circular Strategy

Aggressive Strategy - Attack: Fig 6.13 (a) shows how the anti-raider robots get into a semi-circular formation before they attack the raider robot. The spread value here is set very high, so the distance between the anti-raiders in the formation is very large. Notice how the anti-raider robot on the bottom right positions itself far from the other anti-raiders but behind the raider as it starts to attack after a touch (a sketch of how such formation targets can be computed from the spread value is given at the end of this section).

Aggressive Strategy - Advantage positioning: Fig 6.13 (b) shows how the anti-raiders position themselves in the semi-circular aggressive strategy. Notice the semi-circular formation the anti-raider robots form around the raider robot before attacking. This formation is more effective with more anti-raider robots, because the more robots there are, the more the raider gets surrounded while it tries to make a touch.

Defensive Strategy - Advantage positioning: Fig 6.13 (c) shows how the anti-raiders eventually get into a semi-circular formation. This strategy proves to be more effective than the other strategies. In a defensive strategy the anti-raider robots wait for a touch from the raider and then attack. As they defend themselves from a touch, they build a strong court position, allowing the raider to get inside the semi-circle while one or more anti-raider robots move behind the raider robot to capture it.

Defensive Strategy - Border defense: Fig 6.13 (d) shows how the anti-raiders defend themselves from being pushed towards the border by the raider robot. They move towards the left or right, close to the border line, without crossing it.

Defensive Strategy - Simulation: Fig 6.14 shows screenshots from a full raid simulation of the semi-circular defensive strategy. The raid ends with the anti-raider robots capturing the raider robot. The video of this simulation can be found in the reference section [23].

Aggressive Strategy - Simulation: Fig 6.15 shows screenshots from a full raid simulation of the semi-circular aggressive strategy. The raid ends with the anti-raider robots capturing the raider robot. The video of this simulation can be found in the reference section [23].
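As referenced earlier in this section, the following is a minimal sketch of how semi-circular formation targets around the raider could be computed from the spread value. It is an illustrative reconstruction with assumed coordinates and court orientation, not the thesis code: each anti-raider is assigned a point at an equal angle on a half circle whose radius is the spread, centred on the raider.

public class SemiCircleFormation {

    /**
     * One (x, y) target per anti-raider on a half circle of radius `spread`
     * centred on the raider; assumes at least two anti-raiders and that the
     * anti-raiders' half of the court lies towards decreasing y.
     */
    static double[][] targets(double raiderX, double raiderY, double spread, int robots) {
        double[][] t = new double[robots][2];
        for (int i = 0; i < robots; i++) {
            double angle = Math.PI * i / (robots - 1);     // equal angles over the half circle
            t[i][0] = raiderX + spread * Math.cos(angle);
            t[i][1] = raiderY - spread * Math.sin(angle);
        }
        return t;
    }

    public static void main(String[] args) {
        // e.g. four anti-raiders, raider at (6, 7), spread of 3 cells
        for (double[] p : targets(6, 7, 3, 4)) {
            System.out.printf("(%.1f, %.1f)%n", p[0], p[1]);
        }
    }
}

Spreading the same angles over a full circle instead of a half circle gives the circular formation of the next subsection.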

6.3.5 Circular Strategy

Defensive Strategy - Spread: Fig 6.16 (a) shows how the anti-raider robots spread among themselves with a higher value of spread. Notice that the distance between the anti-raiders on the left and in the centre keeps increasing over time (a simple relation between the spread value and the gap between neighbouring robots is given after this list).


Figure 6.13 Semi circular formation strategy simulation in Simbad


Figure 6.14 Semi Circular Defensive strategy simulation in Simbad

Figure 6.15 Semi Circular Aggressive strategy simulation in Simbad


Aggressive Strategy - Advantage situation: Fig 6.16 (b) shows an advantageous position that the anti-raiders obtain while forming a circular formation. Notice how both anti-raiders on the right position themselves behind the raider as it tries to touch the anti-raider robot on the top left. There are other scenarios where this formation might land the anti-raider robots in a disadvantageous position, which is also discussed in this section.

Defensive Strategy - Border defense: Fig 6.16 (c) shows how the anti-raiders defend themselves from being pushed towards the border by the raider robot, by moving towards the left or right close to the border line. Since in the circular formation strategy the value of spread is set substantially high, there will be many cases where the anti-raiders are pushed towards the boundaries.

Defensive Strategy - Disadvantage situation: Fig 6.16 (d) shows a situation where the anti-raiders are at a disadvantage. Due to the high spread value, even though one of the anti-raider robots positions itself behind the raider, if the raider robot attacks that anti-raider while blocking it from moving up, the anti-raider ends up being touched; the other anti-raiders, being far from the raider at that instant, allow the raider robot to make a successful raid.
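The effect of the spread value on the gaps in the formation can be made explicit with a simple geometric relation. Assuming, purely for illustration, that n anti-raiders are evenly spaced on a circle of radius s (the spread) around the raider, the distance between neighbouring robots is

d = 2\, s \, \sin(\pi / n).

For example, four robots with a spread of 4 cells are about 5.7 cells apart, which is why a high spread with few robots leaves the large gaps seen in Fig 6.16 (a) and (d).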

6.4 Summary

Kabaddi in real time is a fast-paced game. Players make quick moves and continuously coordinate among themselves, implicitly, to reach their goals. Unlike robots built for the pursuit or predator domain, a typical Kabaddi-playing robot must be able to move quickly, accelerating and decelerating while maintaining positional accuracy, and must be able to sense the borders and differentiate between team robots and the raider robot. Positional accuracy is important while the robots accelerate and decelerate in every move because, if a robot lands in a position close to the raider, it will be too late to adjust its position in the next move, costing the whole team a point.

Robots were built from scratch with the idea of incorporating the above capabilities and of having control over the circuit design, embedding only the necessary elements on the board. To start with, Level I robots were built that can be remotely controlled to simulate play and to observe the technical and mechanical problems that must be addressed to build an autonomous robot. The weight of the robot due to the battery, the lack of positional accuracy due to friction, and a manufacturing defect in the shaft became apparent while simulating play. As manufacturing defects are inevitable, positional accuracy must be achieved using motors with encoders and by processing the feedback on a microprocessor. To make the complex computations required for communication, decision making with a strategy, and position control feasible, we used a Raspberry Pi as the computing device on the Level II robots. Instead of four wheels, the two front wheels were replaced with a caster wheel to reduce friction.


Figure 6.16 Circular formation strategy simulation in Simbad


After adding the Raspberry Pi we noticed that, due to the continuous computations executed as part of strategy and communication, the power consumption of the robot increased greatly, leading to problems in powering the motors and thereby affecting the performance of the robot. With these problems we found it difficult to control the speed (acceleration/deceleration), and thereby the position, of the robot even with high computing ability.
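To make the feedback idea concrete, the sketch below shows a basic PID loop [31] over encoder counts of the kind such a robot needs for positional accuracy. It is a generic illustration rather than the thesis firmware; the gains and the PWM output convention are assumptions.

public class WheelPositionController {

    private final double kp, ki, kd;     // assumed controller gains
    private double integral, previousError;

    WheelPositionController(double kp, double ki, double kd) {
        this.kp = kp; this.ki = ki; this.kd = kd;
    }

    /** One control step (dtSeconds > 0): returns a PWM duty cycle in [-1, 1]. */
    double update(long targetTicks, long measuredTicks, double dtSeconds) {
        double error = targetTicks - measuredTicks;
        integral += error * dtSeconds;
        double derivative = (error - previousError) / dtSeconds;
        previousError = error;
        double output = kp * error + ki * integral + kd * derivative;
        return Math.max(-1.0, Math.min(1.0, output));   // clamp to the motor's range
    }
}

Feeding the returned duty cycle to the motor driver and the encoder count back into update() closes the feedback loop that the summary above identifies as necessary.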

We learn that building robots to play Kabaddi in real time requires effective mechanical design, covering the placement of the wheels, battery, and other components that contribute to the movement of the robot, together with circuit design that balances power and controls voltage using capacitors and resistors to absorb voltage spikes, thereby ensuring the safety of the electronic components.


Chapter 7

Conclusions

Kabaddi is a contact sport that originated in the Indian subcontinent and is played in a few countries in Asia. The game of Kabaddi can be considered a special case of the pursuit domain, but the rules and restrictions of the game make it more intricate to solve. As the game involves multiple agents, it can be classified as a multi-agent system. Since Kabaddi is not a popular sport around the world, there is not much literature on it, unlike popular games like soccer, which has an enormous amount of study on game strategies and robot soccer tournaments. In this thesis we presented Kabaddi as a multi-agent environment where agents have to cooperate and compete to win a game. Rather than focusing on a single area of research in the game of Kabaddi, this thesis lays the foundation and opens scope for many areas of research in terms of game simulation, strategy, and robots for Kabaddi.

To develop strategies for the game without worrying about the rules of the game, a Kabaddi game simulator was built as a test bed for strategies. Any user can import the Kabaddi simulator package library and delve directly into building strategies. Considering the scope of strategies that can be built with a notion of agent speed, the need to analyse game results and game paths with the help of an inbuilt logging system, and the integration of robot simulation into the simulator, we had to write our own custom simulator for Kabaddi. Using this simulator we developed strategies involving implicit cooperation. The concept of risk was used to build aggressive and defensive strategies involving formations. The results show that the semi-circular formation strategy has a greater probability of winning a game.

Considering that there are numerous strategies that can be developed for Kabaddi, we proposed a Markov Decision Process model for Kabaddi to incorporate multiple strategies into a single complex strategy that decides which strategy to play depending on the state of the game. Results show that the MDP is able to provide the strategy with the highest probability of winning a game for a given state. The MDP bases its decisions on previous game results, but, to account for the possibility that the opponent team can dynamically change its team strategy, the MDP has to be executed at regular intervals so that it improves its decisions as the game proceeds. Thus the MDP can be used as a strategy that is not only cooperative but also suited to a competitive environment.
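For concreteness, the following sketch shows how such a strategy-selecting MDP could be solved by standard value iteration. The coarse state and action encoding, the table sizes, and the idea of re-estimating the transition and reward tables from logged raid results before re-solving are illustrative assumptions rather than the exact model used in the thesis.

public class StrategyMdp {

    /**
     * Value iteration: p[s][a][s2] is the estimated transition probability,
     * r[s][a] the expected raid reward, gamma the discount factor.
     * Returns policy[s] = index of the strategy to deploy in game state s.
     */
    static int[] solve(double[][][] p, double[][] r, double gamma, int iterations) {
        int nStates = r.length, nActions = r[0].length;
        double[] v = new double[nStates];
        int[] policy = new int[nStates];
        for (int it = 0; it < iterations; it++) {
            for (int s = 0; s < nStates; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < nActions; a++) {
                    double q = r[s][a];
                    for (int s2 = 0; s2 < nStates; s2++) {
                        q += gamma * p[s][a][s2] * v[s2];
                    }
                    if (q > best) { best = q; policy[s] = a; }
                }
                v[s] = best;
            }
        }
        return policy;
    }
}

Re-running solve() at regular intervals, with the tables re-estimated from the most recent raids, is one way to realise the periodic re-planning described above.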

To understand the challenges at the hardware level, we also built robots that can simulate play. Level I robots were built and integrated into the simulator to simultaneously simulate play on robots as the game proceeds in the simulator. Later, these robots were upgraded into independent robotic agents that can perform complex computations and have the potential to play Kabaddi without the need of a simulator. With these Level II robots we were able to establish communication between agents through the concept of virtual localization and make them more suitable for real-time Kabaddi game play. Though these robots have the potential to play Kabaddi at the software level, more mechanical problems have to be addressed before tournaments can be conducted. To show that strategies developed using the simulator also work for robots, we used a 3D robot simulator to simulate how a robot reacts to particular scenarios in a real-time game of Kabaddi. The results show that the robots were able to defend the borders, get into formations, and attack the raider, provided they can move to a specified location with good accuracy.

This thesis provides scope to work on various areas of multi-agent systems and robotics. Kabaddi is a game involving multi-agent aspects like cooperation, competition, and decision making, and multi-robot aspects like AI, position and speed control, blocking, and path planning. Developing strategies for Kabaddi would enhance research in this wide set of research areas. Work on Kabaddi can also be applied to real-time challenges such as strategies for defence systems against armed forces, modelling the armed forces as a raider that can attack or defend; it is apparent that aggressive strategies are not optimal in such scenarios. The concept of risk can be extended to form many other formations that are important in warfare or in other games like football and volleyball. Formations can also be used on flying drones or robots to accomplish tasks like escorting humans by forming paths through formations and avoiding obstacles.

As part of future work, the simulator can be converted into a 3D multi-robot simulator with the ability to add sensors to robots and create a more realistic environment for Kabaddi. For the robots, more sensors can be added so that they can detect other agents, localize themselves on the field, and change their speed, thereby enabling strategies that use the data provided by these sensors. From the multi-agent perspective of the game, more complex strategies can be developed, including speed and a mix of aggressive, defensive, or other tactical approaches to capture the raider. We can also hold real-time robo Kabaddi tournaments to make further progress in the fields of multi-agent systems and robotics.


Bibliography

[1] Kabaddi game rules. http://kabaddiworld.blogspot.in/p/rules-of-kabaddi.html.
[2] M. Aigner and M. Fromme. A game of cops and robbers, 1984.
[3] Atmel. ATmega16 microcontroller. http://www.atmel.com/devices/atmega16.aspx.
[4] M. Bergen, J. Denzinger, and J. Kidney. Teaching multi-agent systems with the help of ARES: Motivation and manual, 2002.
[5] S. D. Bopardikar, F. Bullo, and J. P. Hespanha. A pursuit game with range-only measurements, 2008.
[6] R. J. Brachman and T. Dietterich. Intelligent autonomous robotics, a robot soccer case study, 2007.
[7] Raspberry Pi community. Raspberry Pi. http://www.raspberrypi.org.
[8] Digi. X-CTU. http://www.digi.com/products/wireless-wired-embedded-solutions/zigbee-rf-modules/xctu.
[9] Digi. XBee wireless module. http://www.digi.com/products/wireless-wired-embedded-solutions/zigbee-rf-modules/zigbee-mesh-module/xbee-zb-module.
[10] F. V. Fomin et al. Cops and robber game without recharging.
[11] L. Hugues and N. Bredeche. Simbad simulator. http://simbad.sourceforge.net/.
[12] IIT Gandhinagar. Robo Kabaddi tournament. http://dare2compete.com/competitions/13042/Robo-Kabaddi/IIT-Gandhinagar.
[13] K. Klein and S. Suri. Multiagent pursuit evasion, or playing Kabaddi.
[14] J. R. Kok and N. Vlassis. The pursuit domain package. Technical report, 2003.
[15] R. E. Korf. A simple solution to pursuit games. In Working Papers of the 11th International Workshop on Distributed Artificial Intelligence, pages 183–194, Feb. 1992.
[16] E. C. Laboratory. MASON simulator. http://cs.gmu.edu/~eclab/projects/mason/.
[17] R. Levy and J. S. Rosenschein. A game theoretic approach to the pursuit problem. In Working Papers of the 11th International Workshop on Distributed Artificial Intelligence, pages 195–213, Feb. 1992.
[18] M. Benda, V. Jagannathan, and R. Dodhiawala. On optimal cooperation of knowledge sources. Technical Report BCS-G2010, Boeing AI Center, Boeing Computer Services, Seattle, WA, July 1986.
[19] S. Nagarjuna. Kabaddi on Raspberry Pi. https://www.youtube.com/watch?v=bFFOvZNUG7c.
[20] S. Nagarjuna. Kabaddi simulator package. https://www.dropbox.com/sh/u9fr23yfe2gg7rh/LxBDmfwV9h.
[21] S. Nagarjuna. KSP simulations. https://www.youtube.com/playlist?list=PLxQq JtJBMpr01Fi0s13sPXLNdhd47Xv2.
[22] S. Nagarjuna. Robot motion commands. http://www.raspberrypi.org.
[23] S. Nagarjuna. Simbad. https://www.youtube.com/playlist?list=PLxQq JtJBMpoXTblXU7COM HbcdC72zKP.
[24] N. Noori and V. Isler. The lion and man game on convex terrains.
[25] E. Raboin, U. Kuter, and D. S. Nau. Generating strategies for multi-agent pursuit-evasion games in partially observable Euclidean space.
[26] M. Smith. PickPocket: A computer billiards shark. Artificial Intelligence, 171(16–17):1069–1091, 2007.
[27] P. Stone and M. Veloso. Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8:345–383, 2000.
[28] J. M. Vidal and E. H. Durfee. Recursive agent modeling using limited rationality, 1995.
[29] Wikipedia. Kabaddi. http://en.wikipedia.org/wiki/Kabaddi.
[30] Wikipedia. Markov decision process. http://en.wikipedia.org/wiki/Markov_decision_process.
[31] Wikipedia. PID control. http://en.wikipedia.org/wiki/PID_controller.
[32] F. L. Wrtz. Artificial intelligence in the predator/prey domain, 2008.
