educating attention in cognitive robots - coneural.orgconeural.org/goga/papers/phdplan_2005.pdf ·...



Thesis research Plan

Educating attention in cognitive robots

Ioana Goga Autonomous Systems Lab, ASL3-I2S-STI

Swiss Institute of Technology in Lausanne - EPFL CH-1015 Lausanne, Switzerland

[email protected]

1. Introduction

Early in the history of robotics it was generally believed that pre-programming robots with specific problem-solving capabilities would be sufficient to guarantee intelligent behavior. Research over the last decades has shown that classical AI systems are usually unable to adapt to situations unforeseen by the programmer and, most importantly, that they exhibit a fundamental grounding problem. That is, the agent's behavior, as well as its mechanisms and representations, should be intrinsic and meaningful to itself, rather than dependent on an external designer (Searle, 1980; Harnad, 1990).

One obvious source of inspiration for building integrated, artificial systems that can scale both the size and complexity of their behavioral repertoire is the process of human cognitive development. Among the attractive features of a developmental approach are its open-endedness, its biological plausibility, and its principled, incremental increase in behavioral complexity (Zlatev and Balkenius, 2001). The rapprochement of developmental and social psychology, robotics, and neuroscience has given birth to several research fields: epigenetic robotics (Zlatev and Balkenius, 2001), cognitive robotics (Weng et al., 2000), and social robotics (Billard and Dautenhahn, 2000). The new fields are multidisciplinary in nature, and are intended to provide a unified framework for the development of a range of cognitive capabilities (e.g., vision, audition, language, decision-making).

In this thesis, we will follow a developmental approach to study the mechanisms by which human infants acquire goal-directed behavior and language, and to investigate how to develop similar competences in robots. The focus will be on investigating how interactive teaching, based on tutoring and demonstration, can be implemented in robots.

2. An integrative perspective on cognitive development

Development in human infants is supported on the one hand by the gradual increase of learning resources and abilities through biological maturation, and on the other hand by the parental scaffolding of the inputs (Bruner, 1977). Scaffolding refers to the caregiver's structuring of an interaction by building on what she knows a child can already do. Similarly, an integrative approach to cognitive development in robots should consider the effects of at least two processes: a) the evolution of the neural architecture and of the computational mechanisms and b) the structuring of the task by providing contextual support at the learner's level.

The vast majority of works in cognitive robotics consider one or the other of these processes in isolation. The recent field of epigenetic robotics is primarily concerned with the investigation of incremental development in embodied systems (Scassellati, 1998; Metta et al., 1999; Weng et al., 2000; Kozima and Yano, 2001). A common approach is to start with a basic set of behavioral repertoires that gradually develop to accommodate more complex behaviors. Several successes have been achieved in developing joint attention mechanisms (Nagai et al., 2003), imitation skills (Kuniyoshi et al., 2000; Kozima and Yano, 2001), and sensorimotor coordination (Scassellati, 1998; Metta et al., 1999; Wermter and Elshaw, 2003).

The social robotics strand emphasizes the key role of social interaction in building intelligent robots capable of understanding and predicting human behavior (Billard and Dautenhahn, 2000; Breazeal, 2003; Treister-Goren and Hutchens, 2004). The process of robot tutoring shares many similarities with the parental scaffolding of inputs to human infants (i.e., providing feedback, structuring experience, and regulating the complexity of information). There is recent interest in the field in collaborative learning based on joint intention and progressive tutoring of the learner (Breazeal et al., 2004).

This thesis proposes an integrative approach to cognitive development in robots, based on the modeling of computational processes on three axes: maturation and learning, scaffolding, and re-use (see Figure 1). The re-use hypothesis is inspired by research studies on the computational and neural bases of language development. It states that dynamical neural processes from the sensory-motor areas of the brain provide the computational building blocks for higher level functions, including those involved in cognition and language (Greenfield, 1991; Rizzolatti and Arbib, 1998; Reilly, 2002; Dominey et al., 2002; Pulvermüller, 2002).

The primary objectives of this thesis are methodological and they investigate:

• The computational mechanisms that support gradual increase of behavioral complexity in the domains of competence investigated (i.e., sensorimotor and language). The outcome is represented by a cognitive model whose ability to replicate the staged performances of human infants will be tested through simulations.

• The bootstrapping effects on the development of skilled behavior in one domain of competence induced by the operation of the other domain of competence. First, we will study the effect of providing the learner with a number of sensorimotor and social competences on the speed and efficiency with which the agent acquires a language. Second, we will investigate the benefits on replicating goal-directed behavior from language understanding and production.

In the following section we will discuss in more detail the behaviors and the learning means investigated in the chosen domains of competence.


Figure 1. Three-dimensional epigenetic system for the incremental development of cognitive structures. Two domains of competence are investigated: sensorimotor and linguistic. Development along the Maturation and Scaffolding axes is within-domain, and development along the Re-use axis is inter-domain. On each axis, the system can start from a set of pre-wired abilities and reach different levels of competence. Traditionally in cognitive robotics, a model integrates development along only one of the axes. In contrast, cognition and language in humans emerge from the interaction of maturation, learning, scaffolding and re-use processes.

3. Towards integrating learning by demonstration and learning by instruction

Learning by imitation has many of the desirable characteristics of animal-like, online learning, and endows the robot with the capacity to continually learn new behaviors while observing and interacting with other agents (Demiris and Hayes, 2001). Among current successes on imitation learning are systems capable of immediate and deferred imitation of head movements (Demiris and Hayes, 2001) or arm movements (Schaal, 2000; Billard and Mataric, 2001), and systems capable of generalization and reproduction of arbitrary tasks (Calinon and Billard, 2005).

Learning by imitation requires the capacity to recognize goals, understand how individual actions are embedded in a hierarchy of sub-goals, and extract and recompose recursive structures (Byrne and Russon, 1998). Evidence from psychology, neuroscience and ethology (for a review see Byrne and Russon, 1998; Arbib, 2004) suggests that the ability to imitate, together with other social abilities, is fundamental for the development of language and human cognition. This thesis will bring computational evidence for the hypothesis that common mechanisms underlie hierarchical compositionality in imitation, object manipulation and language development.

In order to assist the imitator in recognizing the goal of the demonstration and in structuring the imitation behavior, the teacher can use instruction. In learning by instruction the agent is given, on-line, information about the environment, domain knowledge, or how to accomplish a particular task. The learner selects and transforms the knowledge from the input language to an internally usable representation and integrates it with prior knowledge for effective retrieval and use (Michalski, 1987). Most human tutoring is interactive: the teacher is prepared to instruct the agent when the teacher notices errors, or when the agent lacks knowledge and requests it. Representative examples of classic AI systems that learn from instruction are Instructo-Soar (Huffman et al., 1993) and Prodigy (Carbonell et al., 2001). In Instructo-Soar the learner follows instructions from the expert as it is performing its task, and then learns from them, thereby avoiding future instruction.

The main drawback of classical instruction-based systems is that they require extensive prior knowledge that is wired into the rules of pre-programmed grammars. The developmental approach taken by us aims at grounding the agent's knowledge through its interaction with the environment, thus avoiding the fundamental grounding problem that disembodied systems have to face. In doing this, we follow a recent trend in language modeling that has obtained important results in grounding the semantics of spatial terms (Regier, 1995; Roy, 2002), verbs (Bailey, 1997; Siskind, 2001), and perceptual features of objects (i.e., size, shape) (Wermter et al., 2003) on non-linguistic structures, such as the parameters of the motor system (Bailey, 1995; Wermter et al., 2003), the visual features of the scene (Roy, 2002), and the visual primitives which encode notions of support, contact and attachment (Siskind, 1995). Grounded artificial systems are capable of describing a visual scene (DESCRIBER, Roy, 2002), of understanding and using grammatical constructions (Dominey, 2006), of developing a meaningful lexicon and communicating with other agents (Steels, 1996), or of simulating the linguistic behavior of a 15-month-old infant (Ai, Treister-Goren and Hutchens, 2004).

A disadvantage of previous models of language and cognitive development in robotics is the artificial data used for training and testing. One strand of research concentrates on grounding lexical items, such as spatial terms (Regier, 1995), names of body parts (Wermter et al., 2003) or actions (Bailey, 1997), on non-linguistic knowledge. This approach over-simplifies the task of understanding and generating language. The other strand strives to illustrate that artificial systems are capable of understanding and reproducing complex sentences such as "The moon gave the block to the cylinder" (Dominey, 2006) or "the green rectangle to the right of the big pink rectangle" (Roy, 2002). These systems are usually based on statistical processing means, and lack the ability to generalize to new domains of knowledge and to increase the complexity of their behavior in a principled way.

In contrast to previous modeling works, our approach to language grounding starts from a collection of real data transcripts, which characterize the sensorimotor and linguistic patterns of interaction between human caregivers and infants. By doing this, we consider the role that imitation, joint attention, and parental scaffolding play in the maturation of linguistic and cognitive skills. The role of social interactions has been documented at length in the psycholinguistic literature (Tomasello, 1988; Nadel and Butterworth, 1998; Bruner, 1977), but it has been overlooked by computational models of early language acquisition.

The novelty of our approach also results from the investigation of the imitation and tutoring mechanisms from an integrative perspective (see Figure 1 and Section 2). Both can be seen as building blocks of a general bootstrapping mechanism, which uses all available means to focus attention, extract a bit of knowledge, and use this knowledge to perform a little more analysis on future inputs, thereby reducing the uncertainty. The agent starts with a set of pre-programmed behaviors (i.e., gaze following, skin color preference, visuomotor coordination, grasping abilities) and develops in an incremental manner goal-directed behavior, intentionality and language. The more knowledge the agent acquires through verbal and sensorimotor imitation, the more it can understand instruction and focus its attention. The more it understands, the more it can learn and develop better imitation and cognitive strategies. We refer to this bootstrapping process based on demonstration and tutoring as a process of educating attention in cognitive robots.

4. Work conducted during the 1st year

4.1 Design of the cognitive architecture

Part of the work carried out during the first year of my PhD concerned the design of a cognitive architecture which can support learning by imitation and instruction. Goal-directed imitation and language processing have some common computational needs: a) an ability to learn from examples and to represent categorical information in a sub-symbolic manner; b) the operation of a mechanism for grounding internal representations on sensorimotor processes; c) the ability to learn from and to represent time-ordered sequences; d) the capacity to process and satisfy multiple constraints in a parallel manner. We also consider the implementation of a computational mechanism that supports multi-modal associations and the re-use of cortical programs.

Figure 2. Schematic of the basic component of the neural network model implemented. Each cell assembly receives from the attention module (not shown) a saliency signal. The feed-forward flow stands for the set of feature weights, which learn through a Hebbian-like adaptation process the sub-symbolic representation of the external concepts. The precedence flow indicates the learning of temporal dependencies between co-activated cell assemblies. The relational connections learn invariant relationships between the feature representation of objects. The inter-domain connections are used to apply patterns of activity from a well developed to a less developed domain. The external feature layers can stand for visual object features or for semantic word features.


The strength of the architecture proposed by us resides in the computational building block, inspired by research studies in the neuroscience of language (Pulvermüller, 2002), cognitive development (Frezza-Buet and Alexandre, 2002) and information processing theories (Adaptive Resonance Theory, Carpenter and Grossberg, 1987). The computational primitive, referred to as a cell assembly, is a concept that stands for a neuronal set, that is, a selection of neurons that are strongly connected to each other, act as a functional unit, and can be primed and can ignite. Each cell assembly can transit an increased number of activation states and its weights are subject to temporal learning as a function of the level of activation. Each cell assembly receives input from external sensorial units and can have an activating effect on other cell assemblies directly connected to it (see Figure 2).
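As an illustration only, the feed-forward learning of a cell assembly can be sketched as follows. The plan does not give the update equations, so the class name, the graded activation formula, the learning rate and the exact form of the saliency-gated, Hebbian-like rule below are all assumptions made for this sketch and are not the model of Goga and Billard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellAssembly:
    """Toy cell assembly: one weight per external feature unit (hypothetical)."""
    weights: List[float]
    learning_rate: float = 0.5  # assumed value, not taken from the thesis plan
    activation: float = 0.0

    def activate(self, features: List[float]) -> float:
        # Graded activation in [0, 1]: overlap between weights and input,
        # normalised by the length of the input vector.
        overlap = sum(w * x for w, x in zip(self.weights, features))
        norm = sum(x * x for x in features) ** 0.5
        self.activation = 0.0 if norm == 0 else max(0.0, min(1.0, overlap / norm))
        return self.activation

    def learn(self, features: List[float], saliency: float) -> None:
        # Hebbian-like adaptation gated by the saliency signal from the
        # attention module: weights move toward the attended feature pattern,
        # and nothing is learned when saliency is zero.
        gain = self.learning_rate * saliency
        self.weights = [w + gain * (x - w) for w, x in zip(self.weights, features)]
```

With this rule, repeated presentation of an attended pattern pulls the weights toward it, so the assembly responds more strongly to that pattern later, while unattended input leaves the weights unchanged.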

Figure 3. Operation of the constraints satisfaction attention module. The output of the attention module is computed as a result of the satisfaction of bottom-up (e.g., color and motion contrasts) and top-down constraints (e.g., gaze and skin color). The focus of attention is deployed to the most salient location in the scene, which is detected using a winner-take-all strategy. Once the most salient location is focused, the system uses a mechanism of inhibition of return to inhibit the attended location and to allow the network to shift to the next most salient object.

A cell assembly can participate in four types of learning (see Figure 2): a) learn a sub-symbolic feature representation of an external concept or event; b) participate in the formation of sequence detectors; c) store information concerning systematic relations among sensorial or semantic features that constitute specific categories; d) apply a pattern of activation to another domain of activity. In our model, learning of temporal dependencies is supported by the graded activation of the cell assemblies and is facilitated by the layered architecture. For a detailed description of the cell assembly model we refer the reader to Goga and Billard (in press).

An essential component of the cognitive architecture is represented by the attention module, whose function is to direct gaze towards objects of interest in the environment. A two-component framework for attention deployment has been implemented, inspired by recent research in modeling of visual attention (Itti and Koch, 2001). Bottom-up attention is computed in a pre-attentive manner across the entire visual image, based on the linear integration of the contrast of two features: color and motion. Top-down attention is deliberate and more powerful in directing attention. The weights of the bottom-up and top-down constraints are set to simulate a number of human attention biases, such as: skin color preference, preference for moving stimuli, and gaze following. The operation of the attention module is illustrated in Figure 3.
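The two-component scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the maps are small nested lists, the weight values and function names are hypothetical, and the top-down bias is simply added to the linearly integrated bottom-up contrasts, which is one possible reading of the description and not necessarily the implemented module.

```python
def saliency_map(color, motion, top_down, w_color=0.5, w_motion=0.5, w_top=1.0):
    """Linear integration of color and motion contrast plus a top-down bias.

    All three inputs are 2-D lists of equal shape; the weights are assumed values.
    """
    return [
        [w_color * c + w_motion * m + w_top * t for c, m, t in zip(c_row, m_row, t_row)]
        for c_row, m_row, t_row in zip(color, motion, top_down)
    ]


def winner_take_all(saliency):
    """Return the (row, col) of the most salient location."""
    locations = ((r, c) for r, row in enumerate(saliency) for c in range(len(row)))
    return max(locations, key=lambda rc: saliency[rc[0]][rc[1]])


def scan(saliency, n_fixations):
    """Visit locations by decreasing saliency, using inhibition of return."""
    remaining = [row[:] for row in saliency]  # work on a copy of the map
    fixations = []
    for _ in range(n_fixations):
        r, c = winner_take_all(remaining)
        fixations.append((r, c))
        remaining[r][c] = float("-inf")  # inhibit the attended location
    return fixations
```

For example, a location with weak color and motion contrast can still win the competition if a top-down cue such as gaze direction or skin color marks it, which is how the module can reproduce biases like gaze following.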

4.2 Modeling investigations on the development of seriation strategies

An important tenet of this thesis research is that it builds on real data from psychological, developmental and neurobiological studies. The starting point is the definition of a developmental benchmark, against which modeling can be compared, and which should meet several criteria: a) it must be grounded in the observation of infants' behavior in a complex social scenario, which involves social interaction, imitation, object manipulation, language understanding and production; b) it must permit the characterization of developmental stages in the acquisition of the skills under observation; c) it should be sufficiently realistic, so that it could be replicated experimentally; d) the infants' behavior under observation must be such that it can be modeled and implemented in artificial systems. A good candidate is the seriated nesting cups task (Greenfield et al., 1972).

Figure 4. Sub-assembly strategy used to embed three cups (after Greenfield et al., 1972). According to Greenfield and colleagues, the series of actions (shown at the top of the figure) corresponds to a recursive linguistic structure (shown at the bottom of the figure).

The seriated nesting cups task consists of a demonstration of how five cups are seriated using an advanced strategy, followed by a spontaneous imitation phase, during which the child is left free to play with the cups. Greenfield and colleagues reported the existence of three strategies for combining cups of different sizes in infants aged between 11 and 36 months: (1) the pairing method, when a single cup is placed in/on a second cup; (2) the pot method, when two or more cups are placed in/on another cup; (3) the subassembly method, when a previously constructed structure consisting of two or more cups is moved as a unit in/on another cup or cup structure. Greenfield et al. (1972) put forward the hypothesis of a formal homology between strategies for the construction of cup assemblies and certain grammatical constructions. When a cup "acts upon" another cup to form a new structure, there is a relation of actor-action-acted upon. Such a relation is realized in sentence structures like subject-verb-object. The second and third strategies allow the formation of multiple actor-action-acted-upon sequences, and, as such, would correspond to the usage of more complex sentences.
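The three strategies can be told apart from a record of manipulation steps. The sketch below assumes a simplified, hypothetical encoding in which each move is a pair (cups moved together, target cup); it labels a whole sequence with the most advanced strategy it contains and is meant only to make the definitions concrete, not to reproduce the coding used by Greenfield and colleagues.

```python
from collections import Counter
from typing import List, Tuple

# One move: (cups moved together as a unit, cup that receives them).
Move = Tuple[Tuple[int, ...], int]


def classify_strategy(moves: List[Move]) -> str:
    """Label a sequence of moves as 'pairing', 'pot' or 'subassembly'."""
    if not moves:
        return "none"
    # Subassembly: a structure of two or more cups is moved as a unit.
    if any(len(moved) >= 2 for moved, _ in moves):
        return "subassembly"
    # Pot: two or more single cups are placed in/on the same cup.
    per_target = Counter(target for _, target in moves)
    if max(per_target.values()) >= 2:
        return "pot"
    # Pairing: each single cup is placed in/on its own second cup.
    return "pairing"
```

Under this encoding, placing cup 1 into cup 2 and then moving cups 1 and 2 together into cup 3 is written `[((1,), 2), ((1, 2), 3)]` and is labeled as a subassembly.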

The first year of my PhD focused on investigating, with modeling and experimental methods, the various types of constraints (i.e., social, computational and maturational) involved in the performance of the seriated nesting cups task. Part of my work consisted in putting forward a developmental model which can account for the systematic differences observed in the infants' strategies for combining objects. In particular, we considered the effects of basic categorization, joint attention, memory capacities, and spatiotemporal object representation on the robot's ability to compose manipulation steps. The outcome of these computer simulation investigations is a developmentally constrained model able to reproduce the limitations observed in the human infants' usage of pairing and pot strategies (see Figure 5).

Figure 5. Simulation of the pairing and pot seriation strategies with a pair of child-caretaker agents. a) The demonstration of the seriated nesting cups task, on the left, and the imitation setting, on the right. b) The imitator forms a pair of non-seriated cups, starting with the most color-salient cup. c) The imitator forms a pair of seriated cups, starting with the smallest cup. d) The imitator forms a pot with 3 cups. Various behaviors were simulated based on different settings of the system's constraints.

Figure 5a shows the dynamic simulation of a child-caretaker pair of humanoid robots using the simulation environment Xanim (Schaal, 2001). During the demonstration of the task, the learner builds an internal model of the task, based on its capacity to detect the goal of the seriation activity. For the reproduction of the task, the internal model activates the imitation goal and the actions taken by the agent result from a process of multiple constraints satisfaction. A variety of imitative behaviors was replicated through the probabilistic satisfaction of three types of constraints: conservation, size and saliency (see Figure 5b-d).

4.3 Behavioral studies on educating attention with human infants

Human parents communicate with their baby from the minute the baby is born. There are a number of specific things that adults do in order to structure the information and to adapt the complexity of the task to the infant's developmental level: engage in sensorimotor and vocal imitation episodes (Nadel and Butterworth, 1999), follow the baby's lead or direct her attention during joint attention episodes (Tomasello, 1988), use specific patterns of infant-addressed speech, known as motherese (ref), and provide perceptual structure synchronously with the referring words (Zukow-Goldring, 2000). Moreover, it has been shown that the parental interaction style is predictive of the success of early language acquisition. When parents follow the infant's focus of attention when providing language, infants' vocabulary acquisition has been shown to proceed at a faster pace (Tomasello and Farrar, 1986). Embedding gestures and speech in the same synchronous, dynamic message contributes more robustly to word comprehension and to the reaching of consensus than a static communication (Zukow-Goldring, 2000).

We were interested in investigating which attention strategies are used by parents and infants during the seriated nesting cups task. A set of observational studies has been carried out by us on 10 infants aged between 14 and 38 months, all Romanian native speakers. Children have been observed together with a caregiver during the demonstration and imitation of the seriated nesting cups task. The original experimental setting has been modified as follows. During the imitation condition, the parent could interact with the child in a limited manner (i.e., provide feedback, but could not explain the task). During a guiding condition, the parent was instructed to choose any means by which she could assist the child in imitating her, by using the most advanced strategy. Preliminary analysis of the data indicates that in the presence of the parent's perceptual and linguistic guidance, children can bootstrap to a more advanced strategy. This data is being further analyzed to identify the attention education strategies used by parents to support the child in identifying the goal of the imitation and performing the task. The preliminary classification of the strategies identified is given in Figure 6.

An important outcome of this study is represented by the collection and transcription of the linguistic interaction between the caregivers and infants. Relevant linguistic sequences from the child-caretaker interaction will be selected and used as training input for the action-language computational model.


Figure 6. Child-caregiver interaction during the demonstration and the guided imitation of the seriated nesting cups task. The child is 16 months old. The video and audio records have been transcribed using the Codes for the Human Analysis of Transcripts system (CHAT, MacWhinney, 1994). The coding scheme was developed based on the Inventory of Communicative Acts – Abridged (Ninio et al., 1994).

5. Future work

If we are to build a model that can produce complex combinations of action and linguistic steps, we need a computational framework capable of translating a hierarchical structure into a temporally ordered structure of actions. Our approach follows a recent trend in language processing that applies parallel constraint satisfaction (i.e., phonological, syntactic, semantic, and contextual constraints) to the understanding and generation of meaningful speech in humans and robots (Seidenberg and MacDonald, 1999; Siskind, 1998; Roy, 2002).

Preliminary results showed that the constraints satisfaction framework can be used to model the re-assembling of goal-directed behavior during the imitation of the seriated nesting cups task. What is still missing is the formal means for the integration of symbolic constraints into the sub-symbolic cognitive architecture. The envisaged solution is a hybrid model, which can preserve the advantages of the decompositional, grounded approach, within the global constraints satisfaction framework (part 1 in the time planning).
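To make the idea of probabilistic constraint satisfaction concrete, the choice of the next cup to grasp can be sketched as a weighted soft competition between candidates. Everything below is an assumption made for illustration: the scoring terms, the softmax form, the weight names and the dictionary keys are hypothetical. The plan names conservation, size and saliency as constraint types but does not define how they are computed, so the conservation term is left as an optional score supplied by the caller.

```python
import math


def choice_probabilities(cups, w_size=1.0, w_saliency=1.0,
                         w_conservation=1.0, temperature=1.0):
    """Softmax over weighted constraint scores; returns {cup id: probability}.

    Each cup is a dict with 'id', 'size', 'saliency' and, optionally,
    'conservation' (a caller-defined score, undefined in the source).
    """
    max_size = max(c["size"] for c in cups)
    scores = []
    for c in cups:
        size_term = 1.0 - c["size"] / max_size  # assumed: smaller cups score higher
        score = (w_size * size_term
                 + w_saliency * c["saliency"]
                 + w_conservation * c.get("conservation", 0.0))
        scores.append(score / temperature)
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c["id"]: e / total for c, e in zip(cups, exps)}
```

Changing the weights shifts the behavior: a dominant size weight favors starting with the smallest cup, while a dominant saliency weight favors the most color-salient cup, in the spirit of the variants shown in Figure 5b-c. An action could then be sampled from the returned distribution, for instance with `random.choices`.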

In order to integrate the sub-symbolic architecture within the hybrid system, the model will be revised and eventually simplified to support scaling of both the size and complexity of the behavioral repertoire. In designing the integrated action-language model, we shall consider the following objectives: a) the agent should be capable of learning simultaneously from demonstration and instruction; b) the agent should be capable of behaving autonomously; c) the agent should be able to integrate the feedback provided by the teacher and to modify its internal model accordingly. Part 2 of the time planning is allocated to the research of these issues and the formulation of a conceptual framework.

We intend to extend the action system currently developed for the imitation of the seriated nesting cups task towards integration with a language system capable of learning from instruction. We envisage the existence of two stages in the development of the linguistic function. During the first acquisition stage, the model will be trained with simple descriptive sentences, provided by a teacher who will synchronize the presentation of the perceptual structure (i.e., pointing to the referent) with the referring sentence. The agent will develop a basic lexicon grounded on the sensorimotor features of its perceptors and actuators (part 3 of time planning). The implementation is planned to be tested on a real robot (part 4 of time planning).

Our approach, strongly grounded in developmental and experimental data, has the major advantage that, by following a developmental path similar to that of the human infant, the robot can understand and predict human behavior. Its disadvantage is the difficulty of modeling even a short transcript of human social interaction. To illustrate the complexity of the guided interaction process, we provide a short transcript of a typical child-caretaker interaction between a mother and her 14-month-old boy, both native speakers of Romanian. The transcript uses the CHAT system, where MOT and CHI stand for the mother and child speech tiers, and %act is a dependent tier describing sensorimotor actions.

*MOT: start with the small one.
%act: MOT hands cup 1 to CHI.
*CHI: 0 [%no speech].
%act: CHI places cup 1 into cup 5.
*MOT: no, we put it into the green one [%cup 2].
%act: MOT points to cup 2. MOT embodies the CHI gesture to place the cup 1 into 2.
*MOT: after that you take these two.
*CHI: 0.
%act: CHI looks at cup 4 and tries to grasp cup 4.
*MOT: no, take those I told you.
*MOT: take these ones.
%act: MOT hands the assembly of cups 1 and 2 to CHI.
*MOT: and you place them in the next one.
%act: MOT points to the cup 3.
*MOT: in this one.
%act: MOT pushes cup 3 in front of CHI.
*CHI: 0.
%act: CHI places cup 1 and 2 into cup 3.
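To use such transcripts as training data, the speech tiers have to be paired with the actions that accompany them. The following Python fragment is a minimal sketch of that pairing, not part of the planned architecture. It assumes only the tier markers used in the transcript above (`*MOT:`, `*CHI:`, `%act:`); the names `Utterance`, `parse_chat` and `SAMPLE` are hypothetical, and the full CHAT format has many more conventions that are not handled here.

```python
import re
from dataclasses import dataclass, field

# "*XXX:" opens a main (speech) tier, "%xxx:" opens a dependent tier.
# Bracketed comments such as "[%no speech]" have no colon right after
# the tag, so they are not mistaken for tier markers.
TIER = re.compile(r"(\*[A-Z]{3}|%[a-z]+):\s*")


@dataclass
class Utterance:
    speaker: str                                   # e.g. "MOT" or "CHI"
    speech: str                                    # content of the speech tier
    actions: list = field(default_factory=list)    # attached %act tiers


def parse_chat(text):
    """Split a transcript into utterances with their %act descriptions."""
    parts = TIER.split(text.strip())
    # parts looks like ['', '*MOT', 'speech ', '%act', 'action ', ...]
    utterances = []
    for marker, body in zip(parts[1::2], parts[2::2]):
        body = body.strip()
        if marker.startswith("*"):
            utterances.append(Utterance(marker[1:], body))
        elif marker == "%act" and utterances:
            # A dependent tier belongs to the preceding speech tier.
            utterances[-1].actions.append(body)
    return utterances


# First two turns of the transcript, written on a single line.
SAMPLE = (
    "*MOT: start with the small one. %act: MOT hands cup 1 to CHI. "
    "*CHI: 0 [%no speech]. %act: CHI places cup 1 into cup 5."
)

for utt in parse_chat(SAMPLE):
    print(utt.speaker, "|", utt.speech, "|", utt.actions)
```

Because the split is driven by the tier markers rather than by line breaks, the same function accepts the transcript with one tier per line or as running text.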

A solution is to gradually increase the complexity of the linguistic input and of the perceptual structures offered by the teacher to the infant robot. Scaffolding of the inputs can be implemented by training the model sequentially, starting with data containing speech addressed to the youngest infants and ending with speech addressed to older infants (part 5 of the time planning). The capacity of the model to replicate the developmental stages of language acquisition in human infants will be put to the test (part 6 of the time planning).
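The sequential training schedule can be illustrated with a short Python sketch. It assumes that every training sample is tagged with the age, in months, of the infant to whom the speech was addressed. The names `Sample`, `curriculum_stages` and `train_incrementally`, the stage boundaries, and the ages in the usage example are all hypothetical placeholders; the actual model update is left abstract as a callback.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    infant_age_months: int   # age of the child the speech was addressed to
    utterance: str           # the linguistic input (perceptual input omitted)


def curriculum_stages(samples, boundaries):
    """Group samples into age stages.

    `boundaries` is an ascending list of ages in months; a sample falls
    into stage k if its age is >= exactly k of the boundaries.
    """
    stages = [[] for _ in range(len(boundaries) + 1)]
    for s in sorted(samples, key=lambda s: s.infant_age_months):
        k = sum(s.infant_age_months >= b for b in boundaries)
        stages[k].append(s)
    return stages


def train_incrementally(model_update, samples, boundaries):
    """Present the stages to the model in order, youngest infants first."""
    for stage in curriculum_stages(samples, boundaries):
        for s in stage:
            model_update(s)


# Usage with made-up ages; a list stands in for the model so that the
# presentation order can be inspected.
corpus = [
    Sample(20, "and you place them in the next one."),
    Sample(12, "in this one."),
    Sample(16, "take these ones."),
    Sample(14, "start with the small one."),
]
presented = []
train_incrementally(presented.append, corpus, boundaries=[14, 18])
print([s.infant_age_months for s in presented])
```

Keeping the stages explicit, rather than only sorting by age, makes it possible to evaluate the model after each stage against the corresponding developmental stage in infants.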

Once the agent possesses a basic knowledge of word meanings, language can be employed to speed up learning and to facilitate understanding of a situation. By using instruction along with providing perceptual structure, the teacher can direct the learner's attention to the important pieces of information, those that are meaningful for understanding the intentions of the demonstrator in a given situation. Relevant attention-directing sequences from the human caretaker-child interaction will be used by the demonstrator agent to guide the learner's attention and to provide feedback (part 7 of the time planning). The attention education model will be implemented on Robota, and its capacity to orient attention and understand the teacher's instructions will be tested (part 8 of the time planning).

Finally, we aim to provide a computational framework for how previously developed structures for goal-directed imitation and object manipulation can support the acquisition of language syntax. We intend to investigate dynamic computational means for cooperation and competition between the neural patterns established in the action network and those developing in the language network (part 9 of time planning).

6. Research time planning

[Gantt chart of tasks 1-10 on a timeline from Oct 2005 to Aug 2008]

1. Design of the hybrid model: Oct05-Dec05 (3 months)
2. Design of the interactive learning framework: Feb06-Mar06; Oct06 (3 months)
3. Lexical acquisition modeling on simulated agents: Apr06-Jun06 (3 months)
4. Grounding of words meaning on Robota: Jul06-Sep06 (3 months), EPFL
5. Incremental training of the model for the extraction of syntactical structures: Nov06-Feb07 (4 months)
6. Reproduction of the developmental stages of seriation and linguistic skills: Mar07-Jun07 (4 months)
7. Training of the model using complex sensorimotor linguistic sequences from human child-caretaker interactions: Jul07-Oct07 (4 months)
8. Implementation of the education attention model on Robota: Aug07-Sep07 (2 months), EPFL
9. Formalization of re-usage means: Nov07-Feb08 (3 months)
10. Assessment of the attention education model and thesis report: Mar08-Aug08 (6 months)


References

Arbib, M.A., (2004), From Monkey-like Action Recognition to Human Language: An Evolutionary Framework for Neurolinguistics. To be published in Behavioral and Brain Sciences (in press), Cambridge University Press.

Baldwin, D. A. (1995). Understanding the link between joint attention and language. In C. Moore and P. J. Dunham (Eds.), Joint Attention: Its origins and role in development. Lawrence Erlbaum Associates.

Billard, A., and Hayes, G. (1999) DRAMA, a connectionist architecture for control and learning in autonomous robots, Adaptive Behavior Journal, 7(1): 35-64, January.

Billard, A., and Dautenhahn, K., (2000) Experiments in social robotics: grounding and use of communication in autonomous agents, Adaptive Behavior, vol. 7: 3/4.

Billard, A., and Mataric, M., (2001), Learning human arm movements by imitation: Evaluation of a biologically-inspired connectionist architecture, Robotics & Autonomous Systems, 941: 1-16.

Billard, A., (2002) Imitation: a means to enhance learning of a synthetic proto-language in an autonomous robot, in Imitation in Animals and Artifacts, (K. Dautenhahn and C. L. Nehaniv, Eds.), MIT Press, pp. 281-311.

Bailey, D. R., (1997), When Push Comes to Shove: A Computational Model of the Role of Motor Control in the Acquisition of Action Verbs, Ph.D. thesis, University of California at Berkeley, Berkeley, CA.

Byrne, R. W., Russon, A. (1998). Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, vol. 21, pp. 667-721.

Breazeal, C., (2003), Towards sociable robots, T. Fong (ed), Robotics and Autonomous Systems, 42(3-4), pp. 167-175.

Breazeal, C., Hoffman, G., and Lockerd, A., (2004), Teaching and Working with Robots as a Collaboration, submitted to Autonomous Agents and Multi-Agent Systems.

Bruner, J. (1977), Early social interaction and language acquisition. In R. Schaffer (Ed.), Studies in mother-infant interaction (pp. 271-289). NY: Academic.

Calinon, S. and Billard, A. (2005), Recognition and Reproduction of Gestures using a Probabilistic Framework combining PCA, ICA and HMM. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005.

Carbonell, J. G., Knoblock, C. A., Minton, S. (1991). PRODIGY: An Integrated Architecture for Planning and Learning. In K. VanLehn (ed.), Architectures for Intelligence, pp. 241-278, Lawrence Erlbaum Associates, Hillsdale, N.J.

Demiris, J., Hayes, G.M. (2001), Imitation as a dual-route process featuring predictive and learning components: a biologically-plausible computational model, Imitation in Animals and Artifacts, K. Dautenhahn, C. Nehaniv (Eds.), MIT Press.

Dominey, P.F, (2006) Learning To Talk About Events From Narrated Video in the Construction Grammar Framework, In Artificial Intelligence Special Issue on Connecting Language to the World.

Dominey, P.F., Hoen M., Blanc J.M., Lelekov-Boissard (2002) Neurological basis of language and sequential cognition: Evidence from simulation, aphasia and erp studies, Brain and Language.

Frezza-Buet, H., and Alexandre, F., (2002), From a biological to a computational model for the autonomous behavior of an animat, Information Sciences, Vol.144, 1-4, pp. 1-43.


Goga, I., Billard, A. (2005), Development of goal-directed imitation, object manipulation and language in humans and robots. In M. A. Arbib (ed.), Action to Language via the Mirror Neuron System, Cambridge University Press (in press).

Greenfield, P., Nelson, K., and Saltzman, E., 1972, The development of rulebound strategies for manipulating seriated cups: a parallel between action and grammar, Cognitive Psychology, 3:291-310.

Greenfield, P. (1991), Language, tool and brain: The ontogeny and phylogeny of hierarchically organized sequential behavior, Behav. Brain Sci., 14:531-595.

Harnad, S. (1990) The Symbol Grounding Problem. Physica D: Nonlinear Phenomena, 42:335--346.

Huffman, S., Miller, C., Laird, J. (1993). Learning from instruction: a knowledge-level capability within a unified theory of cognition. Proceedings of the 15th Annual Meeting of the Cognitive Science Society. Boulder, Colorado.

Kozima, H., and Yano, H., (2001) A robot that learns to communicate with human caregivers, Proc. First Workshop on Epigenetic Robotics, Lund, Sweden, 2001.

Kuniyoshi, Y., Nagakubo, A., Berthouze, L., Cheng, G., (2000) Embodiment, Emergence and Intentionality - A Robotic Point of View, Proc. 6th Symp. on Sohatsu System, Toyama, Japan, 2000.

Itti, L., and Koch, C., 2001, Computational modeling of visual attention, Nature Reviews Neuroscience, 2(3), 194-203.

MacWhinney, B., 2000, The CHILDES Project (3rd ed.). Volume I: Tools for Analyzing Talk: Transcription Format and Programs, Mahwah, NJ: Lawrence Erlbaum.

Michalski, R. (1987) Learning strategies and automated knowledge acquisition: an overview, in Computational models of learning, Ed. Leonard Bolc., Springer-Verlag.

Metta, G., Sandini, G., and Konczak, J. (1999), A developmental approach to visually-guided reaching in artificial systems, Neural Networks, 12.

Nagai, Y., Hosoda, K., Asada, M. (2003). How does an infant acquire the ability of joint attention?: A constructive approach. 3rd Int. Workshop on Epigenetic Robotics (EpiRob '03), pp. 91-98.

Nadel, J. and Butterworth, G., (Eds.) (1999). Imitation in infancy. Cambridge: Cambridge University Press.

Ninio, A., Snow, C. E., Pan, B. A., and Rollins, P. R. (1994). Classifying communicative acts in children's interactions. Journal of Communication Disorders, 27, 158-187.

Regier, T., (1995) A Model of the Human Capacity for Categorizing Spatial Relations, Cognitive Linguistics, 6(1):63-88.

Reilly, R.G. (2001) Collaborative cell assemblies: Building blocks of cortical computation, in Emergent neural computational architectures based on neuroscience, S. Wermter, J. Austin, D. Willshaw (Eds.), Heidelberg, Germany: Springer.

Reilly, R.G. (2002). The relationship between object manipulation and language development in Broca's area: A connectionist simulation of Greenfield's hypothesis. Behavioral and Brain Sciences, 25, 145-153.

Roy, R (2002). Learning Words and Syntax for a Visual Description Task. Computer Speech and Language.


Scassellati, B., (1998) Building behaviors developmentally: a new formalism, Proc. AAAI Spring Symp. on Integrating Robotics Research, 1998.

Schaal, S., (2001), The SL simulation and real-time control software package, Technical Report Computer Science Tech Report, University of Southern California.

Searle, J. R. (1980), Minds, Brains, and programs, The Behavioral and Brain Sciences 3, 417-457.

Seidenberg, M.S., and MacDonald, M.C., (1999), A probabilistic constraints approach to language acquisition and processing, Cognitive Science, 23(4): 569-588.

Siskind, J. M. (1993) Lexical acquisition as constraint satisfaction. IRCS-93-41, University of Pennsylvania.

Siskind, J., (2001), Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic, Artificial Intelligence Review, 15:31-90.

Steels, L. (1996) Emergent adaptive lexicons. Fourth International Conference on Simulation of Adaptive Behavior, pp. 562-567, MIT.

Tomasello, M. (1988). The role of joint attentional processes in early language development. Language-Sciences, 10, 69-88.

Tomasello, M. & Farrar, J. (1986). Joint attention and early language. Child Development, 57, 1454-1463.

Tomasello, M. & Todd, J. (1983). Joint attention and lexical acquisition style. First Language, 4, 197-212.

Weng, J., McClelland, J., Pentland, A., Sporns, O., Stockman, I., Sur, M., Thelen, E., (2000) Autonomous mental development by robots and animals, Science, 291:5504, 599-600.

Treister-Goren, A., and Hutchens, J.L. (2004) Creating AI: A unique interplay between the development of learning algorithms and their education.

Wermter S., Elshaw M. (2003), Learning Robot Actions Based on Self-organising Language Memory. Neural Networks, Vol. 16, No. 5-6, pp. 691-699.

Zlatev, J., and Balkenius, C., (2001) Introduction: Why epigenetic robotics?, Proc. First Workshop on Epigenetic Robotics, Lund, Cognitive Studies 85, 2001.

Zukow-Goldring, P. (2000). Perceiving referring actions: Latino and Euro-American caregivers and infants comprehending speech. In K. L. Nelson, A. Aksu-Koc, & C. Johnson (Eds.), Children's Language, Vol. 10. Hillsdale NJ: Erlbaum.