Hybrid Architecture for Cognitive Agents (Oscar Romero López)


Upload: eunice-walton

Post on 18-Jan-2018



TRANSCRIPT

Hybrid Architecture for Cognitive Agents
Oscar Romero López

Cognitive Architecture Big Picture

Perception & WM
Sensors
Raw Data: , , , …
Interpreter: Dim 1, Dim 2, Dim 3 (values: Val 1, Val 2; Val 1, Val 2; Val 1, Val 2, Val 3)
Working Memory: p1, p2, p3
- Time constraint (7 secs.)
- Base Level Activation (Interpreter)
- Dimension-Value pairs from LT memory
Percepts: dimension 2: observed-object-position; values: right, left, in-front
OM

Simple Decision Making Cycle
Perception Module
Motor Module
Behavioural Module (Basal Ganglia):
- Avoid-Obstacles
- Look-For-Box
- Pick-Up-Box
- Store-Box
- Charge-Battery
Percept sets:
- perp1, perp3, perp11,..
- perp2, perp9, perp10..
- perp3, perp4, perp8..
How can the robot decide which behaviour must be activated?

Behaviour Modulation
Bio-inspired Bottom-Up Approach
- Implemented Hybrid Behaviours
- Context-Dependent Behaviours
- Task Modules driven by Behaviour Networks and evolved by GEP
- High-order cognitive skills: plan extraction, problem solving
- Task refinement through both GEP and a co-evolutionary mechanism
Epigenesis, Ontogenesis, Phylogenesis

Behaviors
- Avoid-Obstacles
- Look-For-A-Storage
- Look-For-B-Storage
- Look-For-Battery-Charger
- Look-For-A-Box
- Look-For-B-Box
- Pick-Up-A-Box
- Pick-Up-B-Box
- Store-A-Box
- Store-B-Box
- Charge-Battery

- Avoid-Obstacles
- Look-For-Object
- Pick-Up-Object
- Store-Object
- Charge-Battery

Implemented Behaviors
Context-Dependent Behaviours
- Same sensory input
- Same action output
- Same feedback signal pattern
- Different internal/external state

The Epigenetic Approach: (Learning and Adaptation)
- Backpropagation Neural Networks
- Artificial Immune Systems
- Production Rules:
- Expert Rules (ER)
- Sub-symbolic extraction rules
Cell differentiation (Biol.).
EA defines the mechanisms which allow an individual (agent) to modify some aspects of its internal/external structure as a result of the interaction with the surrounding environment.
Behaviour Specialization.

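The Perception & WM slide lists a time constraint of 7 seconds on working-memory items stored as dimension-value pairs. The following is a minimal sketch of that idea only, not the author's implementation: the class names, the `now` argument, the "battery" dimension, and the choice to overwrite an older percept on the same dimension are all assumptions made for illustration. Base-level activation is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Percept:
    dimension: str     # e.g. "observed-object-position"
    value: str         # e.g. "left"
    created_at: float  # timestamp in seconds (hypothetical clock)


@dataclass
class WorkingMemory:
    time_limit: float = 7.0  # the 7-second constraint from the slide
    items: list = field(default_factory=list)

    def add(self, dimension: str, value: str, now: float) -> None:
        # Assumption: a newer percept replaces an older one on the same dimension.
        self.items = [p for p in self.items if p.dimension != dimension]
        self.items.append(Percept(dimension, value, now))

    def active(self, now: float) -> dict:
        # Drop percepts older than the time limit, then return what is left.
        self.items = [p for p in self.items if now - p.created_at <= self.time_limit]
        return {p.dimension: p.value for p in self.items}


wm = WorkingMemory()
wm.add("observed-object-position", "left", now=0.0)
wm.add("battery", "charged", now=5.0)  # "battery" is a made-up dimension
snapshot_early = wm.active(now=6.0)    # both percepts are still within 7 s
snapshot_late = wm.active(now=8.0)     # the first percept has expired
```

Passing the clock in as `now` keeps the sketch deterministic; a real agent would more likely read a monotonic clock.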
Connectionist Module
Backpropagation Neural Networks
- Straight BP: supervised learning algorithm (off-line)
- Reinforcement BP: Q-learning for reinforcement learning (on-line)

Connectionist Module (Charge-Battery Behavior)

Artificial Immune System
- Sensory Input (percepts) - Antigen
- Actuators
- Antibodies Repertoire
- Credit distribution system
- Genetic Algorithm
- Meta-dynamics
- Matching Antigen-Antibody

Artificial Immune System (B-C Behaviour)
01# encoding: condition, action
- Condition: Battery charged; Charger is far; Charger Posi: left; Carrying box?
- Action: Speed: Go forward; Turn Rate: Turn-right; Gripper St: close
Initial Antibody, Evolved Antibody
Meta-dynamics + Genetic Algorithm
Gripper open, Turn-left

Sub-symbolic Extraction Rules (SER)
NN, SER, AIS
(dim1: val2) (dim3: val4)
extract, generalize, specialize

Rule n: (dim1: val1, val2) (dim3: val3, val4, val6)
Rule n1: (dim1: val2) (dim3: val4)
Rule n2: (dim1: val1) (dim3: val4)
Rule n3: (dim1: val1) (dim3: val3)
Rule n4: (dim1: val2) (dim3: val6)
if IG(C, all) > threshold1

Rule n: (dim1: val1, val2) (dim3: val3)
Rule n1: (dim1: val1) (dim3: val3)
Rule n2: (dim1: val2) (dim3: val3)
if IG(C, all) < threshold2

Integrating the action recommendations
Behavior n: ER, SER, AIS, NN
Weights: w_ER, w_SER, w_AIS, w_NN
Action recommendations
Integrated action
- p: probability
- x: current state
- i, j: matching rules (j is a range)
- temperature
- U: Boltzmann distribution
Act1011 BLA 1101 0100 1110

Ontogenetic Approach: (Development)
This approach permits the development of a given functionality from the information stored in the agent's genome.

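The "Integrating the action recommendations" slide names four recommendation sources (ER, SER, AIS, NN), one weight per source, a temperature, and a Boltzmann distribution, but the transcript does not preserve the formula. The sketch below shows one common way such pieces fit together, under stated assumptions: each source scores candidate actions, scores are combined as a weighted sum, and the action is sampled from a Boltzmann (softmax) distribution over the combined scores. The function name, the score dictionaries, and the numeric values are hypothetical.

```python
import math
import random


def integrate_recommendations(recommendations, weights, temperature=0.5, rng=None):
    """Combine per-source action scores and sample an action.

    recommendations: {source: {action: score}} (hypothetical layout)
    weights:         {source: weight}, e.g. w_ER, w_SER, w_AIS, w_NN
    """
    rng = rng or random.Random()
    combined = {}
    for source, scores in recommendations.items():
        for action, score in scores.items():
            combined[action] = combined.get(action, 0.0) + weights[source] * score

    # Boltzmann distribution: p(a) is proportional to exp(value / temperature).
    # Subtracting the maximum keeps exp() numerically stable.
    top = max(combined.values())
    exps = {a: math.exp((v - top) / temperature) for a, v in combined.items()}
    total = sum(exps.values())
    probs = {a: e / total for a, e in exps.items()}

    actions = list(probs)
    chosen = rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
    return chosen, probs


recs = {
    "ER": {"turn-left": 1.0},
    "SER": {"turn-left": 0.5, "turn-right": 0.5},
    "AIS": {"turn-right": 1.0},
    "NN": {"turn-left": 0.8, "go-forward": 0.2},
}
weights = {"ER": 0.4, "SER": 0.2, "AIS": 0.2, "NN": 0.2}
chosen, probs = integrate_recommendations(recs, weights, rng=random.Random(0))
```

A high temperature flattens the distribution (more exploration), while a temperature close to zero makes the highest-scoring action almost certain.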
TASK
Focus Manager
Behavior:
- Precondition List: Precond 1, Precond 2, Precond 3
- Add List: state 1, state 2
- Del List: state 1, state 2
Goal
Activation Link
Inhibition Link
Behavior Network

Ontogenetic Approach: (Development)
Task 1: Box-Collecting
Task 2: Box-Piling-Up
Behaviour nodes: A, B, C
Link weights: 1, 0.45, 0.25, 0.5, 0.3, 0.8, 0.6, 0.45, 0.8
- Modify Parameters
- Create new Behaviours
- Modify existing Links

Ontogenetic Approach using GEP
Behaviours: B1, B3, B5, B6, B2, B6, B2, B1, B3, B1, B5, B6
Behaviour networks: BN 1, BN 2, ... BN n
ADF: Gene G1, Gene G2, ... Gene Gn
Homeotic Gene HG1
Hierarchical Task Network
Chromosome 1
- Goal 1, Params: a1, b1, ..
- Goal 2, Params: a2, b2, ..
- Goal 3, Params: a3, b3, ..
Links between genes G1 to G5: enables, facilitates, limits
ADF: Gk, Gk+1, ... Gkn
HGk
Chromosome n

Ontogenetic Approach using GEP
Plan n
Task 1, Task 2
HTN
Goal 1: Beh. 3, Beh. 5
Goal 2: Beh. 6, Beh. 1
Beh. 2, Beh. 3: AND, params, addlist
Beh. 1, Beh. 4: addlist, AND
Task 1 enables Task 1, Task 3

Ontogenetic Approach - Fitness
Purpose: combining multiple fitness functions so as to produce an aggregate scalar fitness function.
- R: the selection range, used as a limit for selection to operate (100)
- P(ij): the value predicted by the individual program i for fitness case j (out of n fitness cases)
- neg. feedback: number of structures that are not well formed
- neg. feedback: number of contradictory links (addList vs deleteList)
- pos. feedback: inverse number of activation cycles needed to activate the next behavior at the current task
- pos. feedback: inverse number of steps for a BN to achieve a goal (1 / steps)
- T_j: the target value for fitness case j (precision of 0.01)

Ontogenetic Approach - Flow
- Generate pseudo-random population of chromosomes (plan)
- For each chromosome i at iteration j:
- Calculate the fitness function
- Validate whether the chromosome fits the current and past goals.
- Integrate the behavior activation (Borda voting method)
- Apply local genetic operators: selection and replication, mutation, gene transposition, and gene recombination

Behavior Co-evolution
Agent 1: B1, B2, B3
Agent 2: B1, B2, B3
Agent 3: B1, B2, B3
Behavior Repository B1
Behavior Repository B2
Behavior Repository B3
Memetic Algorithm
Phylogenetic Approach: Co-evolution

Q-learning
Updating of Q(x, a)
- discount factor: favors reinforcement received sooner relative to that received later
- a_i: an action that can be performed at step i (with a_0 = a)
- r_i: the reinforcement received at step i (positive, negative, or zero)
- e(y): max Q(x, b)
- x, y: sensory input (internal and external), wm items, current goal

Comparisons

Straight BP Neural Network
SENSOR, BITS, DESC
- Sensor 0, 1 bit: Is the box within the observation range detected by the fiducial? no/yes
- Sensor 1, 1 bit: Approach distance? no/yes (appDistance)
- Sensor 2, 1 bit: Is the box to the left or to the right? (sign of the angle beta)
- Sensor 3, 1 bit: Align from the left or from the right? (umbAngAlign)
- Sensor 4, 1 bit: Alignment angle with the axis (vert/horiz) below threshold? no/yes (umbAngAlign)
- Sensor 5, 1 bit: Is the distance less than the minimum forward-advance distance? (umbPosPerpndl, umbPosFrente)
- Sensor 6, 1 bit: Is the robot oriented perpendicular to / facing the box? (in front)
- Sensor 7, 2 bits: Alignment angle below umbAngAlineacionFinal (0 = below threshold, 1 = negative angle, 2 = positive angle, 3 = fully aligned)
- Sensor 8, 2 bits: Number of beams (0 = 0, ... 3 = 3)
- Sensor 9, 2 bits: Gripper state (0 = open, 1 = closed, 2 = moving)
ACTION
- Action 0, 2 bits: speed, avoidspeed, aproxspeed
- Action 1, 3 bits: noturn, turnleft, turnright, alignleft, alignright
- Action 2, 1 bit: opengripper, closegripper

Patterns
Sensory Input, Output
- Approaching
- Aligning with axis (vert/horizon)
- Go straightforward
- Rotate
- Correct position
- Grasp object

AIS Mathematical Model
Strength: $S_i(t+1) = S_i(t) - B_i(t) - T_i(t) + R_i(t)$
Bid: $B_i(t) = C_{apu} \cdot S_i(t) + (k_1 \cdot \mathrm{BidRadio}^{\mathrm{BRPow}})$
Noise: $AE_i = B_i + N(\sigma_{apu})$
Tax 1 (Impuesto1): $C_{impuesto} = 1 - (1/2)^{1/n}$
Tax 2 (Impuesto2): $T_i = C_{impuesto} \cdot S_i$
Final strength: $S(t+1) = S(t) - C_{apu} S(t) - C_{impuesto} S(t) + R(t)$
Term labels: Imp (tax), apu (bid): $C_{apu} S(t)$

Parameters

Reinforcement Functions
Obstacles Avoidance Behaviour:
- In case of collision: r = -1
- Otherwise: r = 0.3 (d_c - d_p) (dc_c - dc_p)
Where:
- d: distance between the object and the robot
- dc: distance of collision (sensors in the front) between the object and the robot
- c: current value
- p: prior value
Pick up object Behaviour:
- In case of achieving the goal: r = 1
- Otherwise: r = 0.5 1/(d1 - d2) 1/(a1 - a2)
Where:
- d: distance between the object and the robot
- a: angle between the object and the robot

AIS architecture

BN Mathematical Model
Behavior Definition
- Successor Link (add list)
- Predecessor Link (precondition)
- Conflicter Link (delete list)

BN Mathematical Model

Hierarchical Task Networks

Chunk Table (Mesa): [legs, 4] [color, brown] [shape, square]
Chunk Chair (Silla): [legs, 4] [color, black] [material, metal]
Network nodes: legs, color, shape, 4, brown, square, table; legs, color, material, 4, black, metal, chair
Associative Memory
World State
Emergent Level
Cognitivist Level
Associative Rule
Has-a

Procedural Integration

Ontogenetic Approach: (Development)
Task: CollectA-Boxes
- LookFor Battery Charger: NoChargId, NoObst
- LookFor A-Box: StorgId, WithBatt, NoObst, BoxNoId, NoObst
- Charge Battery: ChargerId, NoBattery, NoObst
- Avoid Obstacles
- PickUp A-Box: BoxId, WithBatt, StorgId,
- LookFor A-Storage: NoStorgId, NoObst
- Store A-Box: WithBatt, BoxGrasp, StorgId, NoObst
Network parameters:
- the mean level of activation
- the threshold for becoming active
- the amount of energy for propositions
- the amount of energy for goals
- the amount of energy for protected goals
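The AIS Mathematical Model slides give a strength update in which each antibody pays a bid and a tax and receives a reinforcement. A minimal sketch of that bookkeeping follows, assuming the simplified bid $B = C_{apu} S$ used in the final strength equation (the extra $k_1$ term is left out) and reading n in the tax coefficient as a half-life in cycles, which the slides do not state. Function and parameter names are hypothetical, and for simplicity every antibody is charged its bid, whereas a full system would likely charge only the winner.

```python
import random


def tax_coefficient(n: float) -> float:
    # C_impuesto = 1 - (1/2)^(1/n); n is assumed to be a half-life in cycles.
    return 1.0 - 0.5 ** (1.0 / n)


def bid(strength: float, c_apu: float) -> float:
    # Simplified bid B = C_apu * S (the k_1 term from the slide is omitted).
    return c_apu * strength


def effective_bid(strength: float, c_apu: float, sigma_apu: float, rng: random.Random) -> float:
    # AE = B + N(sigma_apu): Gaussian noise added to the bid for the auction.
    return bid(strength, c_apu) + rng.gauss(0.0, sigma_apu)


def update_strength(strength: float, reward: float, c_apu: float, c_impuesto: float) -> float:
    # S(t+1) = S(t) - C_apu * S(t) - C_impuesto * S(t) + R(t)
    return strength - c_apu * strength - c_impuesto * strength + reward


# With no bid and no reward, taxing for n cycles halves the strength.
c_tax = tax_coefficient(4)
s = 8.0
for _ in range(4):
    s = update_strength(s, reward=0.0, c_apu=0.0, c_impuesto=c_tax)
```

The half-life reading makes the tax easy to tune: choosing n fixes how many idle cycles it takes for an unused antibody to lose half of its strength.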