
NEW TIES WP2 Agent and learning mechanisms

Decision making and learning

Agents have a controller (decision tree, DQT)
Input: situation (as perceived = seen/heard/interpreted)
Output: action

Decision making = using the DQT
Learning = modifying the DQT
Decisions also depend on inheritable "attitude genes" (learned through evolution)

Example of a DQT

[Figure: an example DQT. Legend: B = bias node, T = test node (Boolean choice, YES/NO), A = action node; edges carry genetic biases (e.g. 0.2, 0.5). Test nodes shown: VISUAL:FRONTFOODREACHABLE and BAG:FOOD; action nodes choose among TURNLEFT, MOVE and TURNRIGHT with biases 0.6/0.2/0.2, or lead to PICKUP and EAT with bias 1.0; the root bias node splits 0.5/0.5.]
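Based on the figure above, the following is a minimal sketch of how such a controller could be represented and traversed. The node, test and action names come from the figure; the class names, the perceives callback, and the exact mapping of the 0.6/0.2/0.2 biases to the wandering actions are assumptions, not the NEW TIES implementation.

    import random

    class ActionNode:
        def __init__(self, actions):            # actions: {action_name: bias}
            self.actions = actions

    class TestNode:                             # Boolean test on a perceived concept
        def __init__(self, concept, yes, no):
            self.concept, self.yes, self.no = concept, yes, no

    class BiasNode:                             # stochastic choice among subtrees
        def __init__(self, children):           # children: [(bias, subtree), ...]
            self.children = children

    def decide(node, perceives):
        """Walk the DQT for the perceived situation and return an action name."""
        if isinstance(node, ActionNode):
            names, biases = zip(*node.actions.items())
            return random.choices(names, weights=biases)[0]
        if isinstance(node, TestNode):
            return decide(node.yes if perceives(node.concept) else node.no, perceives)
        biases, subtrees = zip(*node.children)
        return decide(random.choices(subtrees, weights=biases)[0], perceives)

    # The tree from the figure; which of the 0.6/0.2/0.2 biases belongs to which
    # wandering action is a guess from the extracted figure.
    wander = {"TURNLEFT": 0.6, "MOVE": 0.2, "TURNRIGHT": 0.2}
    dqt = BiasNode([
        (0.5, TestNode("VISUAL:FRONTFOODREACHABLE",
                       yes=ActionNode({"PICKUP": 1.0}), no=ActionNode(dict(wander)))),
        (0.5, TestNode("BAG:FOOD",
                       yes=ActionNode({"EAT": 1.0}), no=ActionNode(dict(wander)))),
    ])

    # Prints "EAT" or a wandering action, depending on the root's 0.5/0.5 choice.
    print(decide(dqt, perceives=lambda concept: concept == "BAG:FOOD"))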

Interaction evolution & individual learning

Bias node with n children, each child i with bias b_i
Bias ≠ probability
Bias b_i is learned and changing (name: learned bias)
Genetic bias g_i is inherited, part of the genome, and constant
Actual probability of choosing child x: p(b, g) = b + (1 - b) ∙ g
Learned and inherited behaviour are linked through this formula
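A small sketch of how the learned and genetic biases could be combined when picking a child; the per-child formula is the one from the slide, while the normalisation into a proper probability distribution is an assumption.

    import random

    def child_probabilities(learned, genetic):
        """Combine learned biases b_i and genetic biases g_i with
        p(b, g) = b + (1 - b) * g, then normalise so the values sum to 1
        (the normalisation step is assumed, not stated on the slide)."""
        raw = [b + (1.0 - b) * g for b, g in zip(learned, genetic)]
        total = sum(raw)
        return [p / total for p in raw]

    def choose_child(children, learned, genetic):
        return random.choices(children, weights=child_probabilities(learned, genetic))[0]

    # Learned biases change during the agent's lifetime; genetic biases do not.
    print(child_probabilities([0.6, 0.2, 0.2], [0.1, 0.1, 0.8]))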

DQT nodes & parameters cont’d

Test node language: native concepts + emerging concepts

Native: see_agent, see_mother, see_food, have_food, see_mate, …

New concepts can emerge by categorisation (discrimination game)
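The discrimination game itself is not detailed on these slides; as a rough, generic sketch (in the style of discrimination games from the language-game literature, with a hypothetical "size" feature), a new category can emerge when no existing one separates a topic object from the rest of the context:

    def discriminate(topic, context, categories):
        """Generic discrimination-game sketch (assumed mechanism): succeed if an
        existing category is true for the topic and false for every other context
        object; otherwise create a new category around one of the topic's features."""
        others = [obj for obj in context if obj is not topic]
        for cat in categories:
            if cat(topic) and not any(cat(obj) for obj in others):
                return cat, categories                  # discrimination succeeded
        feature, value = "size", topic["size"]          # hypothetical feature
        def new_cat(obj, f=feature, v=value):
            return abs(obj[f] - v) < 0.1
        return new_cat, categories + [new_cat]          # repertoire grows

    # Objects as feature dicts, categories as predicates over them.
    cat, cats = discriminate({"size": 0.9}, [{"size": 0.9}, {"size": 0.3}], [])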

Learning: the heart of the emergence engine

Evolutionary learning: not within an agent (not during its lifetime), but over generations, by variation + selection

Individual learning: within one agent, during its lifetime, by reinforcement learning

Social learning: during lifetime, in interacting agents, by sending/receiving + adopting knowledge pieces

Types of learning: properties

Evolutionary learning:
Agent does not create new knowledge during its lifetime
Basic DQT + genetic biases are inheritable
"Knowledge creator" = crossover and mutation

Individual learning:
Agent does create new knowledge during its lifetime
DQT + learned biases are modified
"Knowledge creator" = reinforcement learning, driven by rewards (see the sketch after this list)
Individually learnt knowledge dies with its host agent

Social learning:
Agent imports knowledge already created elsewhere (new? not new?)
Adoption of imported knowledge ≈ crossover
Importing knowledge pieces can save effort for the recipient and can create novel combinations
Exporting knowledge helps its preservation after the death of its host
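A minimal sketch of what reward-driven updating of a learned bias could look like; the slides only say that learned biases are modified by reinforcement learning, so the concrete update rule and its parameters below are assumptions.

    def reinforce(learned_biases, chosen, reward, rate=0.1):
        """Hypothetical reinforcement update of one bias node's learned biases:
        strengthen the bias of the chosen child after a positive reward and
        weaken it after a negative one, clamped to [0, 1]."""
        b = learned_biases[chosen] + rate * reward
        learned_biases[chosen] = min(1.0, max(0.0, b))
        return learned_biases

    # e.g. child 0 led to eating an edible plant (reward +1):
    print(reinforce([0.6, 0.2, 0.2], chosen=0, reward=+1.0))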

Present status of types of learning

Evolutionary learning:
Demonstrated in 2 NT scenarios
Autonomous selection/reproduction causes problems with population stability (implosion/explosion)

Individual learning:
Coded, but never demonstrated in NT scenarios

Social learning:
Under construction/design, based on the "telepathy" approach
Communication protocols + adoption mechanisms needed

Evolution: variation operators

Operators for the DQT:
Crossover = subtree swap
Mutation = substitute a subtree with a random subtree, change concepts in test nodes, or change the bias on an edge

Operators for the attitude genes (sketched below):
Crossover = full arithmetic crossover
Mutation = add Gaussian noise, or replace with a random value
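A minimal sketch of the attitude-gene operators; the mixing weight, noise scale, mutation probability and the assumption that genes lie in [0, 1] are illustrative choices, not values from the slides.

    import random

    def arithmetic_crossover(mum, dad, alpha=0.5):
        """Full arithmetic crossover: each child gene is a weighted average of the
        parents' genes (alpha = 0.5 gives the plain average)."""
        return [alpha * m + (1 - alpha) * d for m, d in zip(mum, dad)]

    def mutate(genes, p=0.1, sigma=0.05):
        """Mutation: with probability p either add Gaussian noise to a gene or
        replace it with a fresh random value (both variants named on the slide);
        results are clamped to the assumed gene range [0, 1]."""
        out = []
        for g in genes:
            if random.random() < p:
                g = random.gauss(g, sigma) if random.random() < 0.5 else random.random()
            out.append(min(1.0, max(0.0, g)))
        return out

    child = mutate(arithmetic_crossover([0.2, 0.8, 0.5], [0.6, 0.4, 0.9]))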

Evolution: selection operators

Mate selection:
Mate action chosen by the DQT
Propose – accept proposal
Adulthood OK

Survivor selection:
Dead if too old (≥ 80 years)
Dead if zero energy
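The survivor-selection rule is simple enough to state directly in code; the Agent fields below are assumed names.

    from collections import namedtuple

    Agent = namedtuple("Agent", "age energy")    # assumed fields, for illustration
    MAX_AGE = 80                                 # dead if too old (>= 80 years)

    def survives(agent):
        """Survivor selection from the slide: dead if too old or at zero energy."""
        return agent.age < MAX_AGE and agent.energy > 0

    population = [Agent(30, 12.0), Agent(85, 5.0), Agent(40, 0.0)]
    population = [a for a in population if survives(a)]   # only the first agent survives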

Experiment: Simple world

Setup: Environment

World size: 200 x 200 grid cells
Agents and food only (no tokens, roads, etc.); both are variable in number
Initial distribution of agents (500): in the upper left corner
Initial distribution of food (10000): 5000 each in the upper left and lower right corners
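This environment setup could be captured as a small configuration, for example as below; the key names are invented for illustration, and only the values come from the slides.

    # Hypothetical configuration for the simple-world experiment.
    SIMPLE_WORLD = {
        "world_size": (200, 200),           # grid cells
        "initial_agents": 500,              # all start in the upper left corner
        "agent_placement": "upper_left",
        "initial_food": 10000,              # 5000 upper left + 5000 lower right
        "food_placement": ["upper_left", "lower_right"],
        "tokens": False,                    # plain world: agents and food only
        "roads": False,
    }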

Experiment: Simple world

Setup: Agents

Native knowledge (concepts and DQT subtrees):
Navigating (random walk)
Eating (identify, pick up and eat plants)
Mating (identify mates, propose/agree)

Random DQT branches:
Differ per agent
Based on the "pool" of native concepts

Experiment: Simple world

Simulation ran for 3 months of real time to test stability

Experiment: Poisonous Food

Setup: Environment

Two types of food: poisonous (decreases energy) and edible (increases energy)
World size: 200 x 200 grid cells
Agents and food only (no tokens, roads, etc.); both are variable in number
Initial distribution of agents (500): uniform random over the grid
Initial distribution of food (10000): 5000 of each type, uniform random over the same grid space as the agents
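The same kind of configuration sketch for the poisonous-food variant (again, only the values come from the slides; the key names are invented):

    # Hypothetical configuration for the poisonous-food experiment.
    POISONOUS_FOOD = {
        "world_size": (200, 200),
        "initial_agents": 500,
        "agent_placement": "uniform_random",
        "food_types": {"edible": 5000, "poisonous": 5000},   # +/- energy when eaten
        "food_placement": "uniform_random",                  # same area as the agents
        "tokens": False,
        "roads": False,
    }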

Experiment: Poisonous Food

Setup: Agent

Native knowledge: identical to the simple-world experiment

Additional native knowledge:
Can distinguish poisonous from edible plants
The relation with eating/picking up is not present

No random DQT branches

Experiment: Poisonous Food

Measures

Population size
Welfare (energy)
Number of poisonous and edible plants
Complexity of the controller (number of nodes)
Age
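A small sketch of how these measures could be collected each timestep; the world and agent attribute names are assumptions.

    def collect_measures(world):
        """Per-timestep measures from the slide; the attribute names (world.agents,
        world.plants, agent.energy, agent.age, agent.dqt_node_count, plant.poisonous)
        are assumed for illustration."""
        agents, n = world.agents, max(1, len(world.agents))
        return {
            "population_size": len(agents),
            "welfare": sum(a.energy for a in agents) / n,
            "edible_plants": sum(1 for p in world.plants if not p.poisonous),
            "poisonous_plants": sum(1 for p in world.plants if p.poisonous),
            "controller_size": sum(a.dqt_node_count for a in agents) / n,
            "average_age": sum(a.age for a in agents) / n,
        }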

Experiment: Poisonous Food

Demo

Experiment: Poisonous Food Results

[Figure: results plot over timesteps 1250 to 15000 (y-axis 0 to 2500); series: population size, healthy plants (x10), poisonous plants (x10), average agent energy (x100).]