Sensor based physical interaction for embodied playful learning games

Master Thesis Project
Nikolaos Poulios
MSc. Computer Science / Multimedia
Vrije Universiteit Amsterdam
Creative Learning Lab – Waag Society
n.poulios@student.vu.nl, Stud. No.: 2001527

July 2012, Amsterdam

THESIS SUPERVISORS
Prof. Dr. Anton Eliens (Dept. of Computer Science, VU Amsterdam)
Dr. Lora Aroyo (Dept. of Computer Science, VU Amsterdam)
Keimpe de Heer (Internship supervisor, Creative Learning Lab, Waag Society)
Sensor based physical interaction for embodied playful learning games
Nikolaos Poulios
Master Thesis Project for the degree of MSc Computer Science, specialty Multimedia, at the Vrije Universiteit Amsterdam.
July 2012
Abstract
This thesis explores the use of modern sensor technologies for physical interaction in educational games and interactive spaces. More specifically, the thesis studies the potential effect of motion capture and wearable body sensors on educational interactive games in two respects: i) the involvement of the human body and motion in the process of learning and the recall of knowledge (embodied learning), and ii) the development of basic social-emotional competencies through the enhanced social affordances of embodied games. After building a theoretical framework on these two aspects, based on previous research, the thesis presents a range of state-of-the-art commercial and research technologies for real-time motion capture and biofeedback/emotion recognition that could be utilized in educational interactive spaces, along with insights from selected technologies that were tested, and tools/platforms for the development of sensor-based interactive systems. The thesis continues by proposing a generic distributed system architecture for such spaces, separating a sensor device level from an application level. This architecture is then put to the test in the implementation of a prototype virtual board game featuring a motion capture sensor (Microsoft Kinect) and two body sensors (EEG, ECG). The thesis concludes with results from the evaluation of the prototype, followed by general conclusions and ideas for further development.
Thesis Supervisors
Prof. Dr. Anton Eliens (Department of Computer Science, VU Amsterdam)
Dr. Lora Aroyo (Department of Computer Science, VU Amsterdam)
Keimpe de Heer (Internship supervisor, Creative Learning Lab, Waag Society)
Preface
This thesis project is the product of almost one year of personal work, driven by a strong interest, developed during my master studies, in sensor-based interactive systems and media art installations. Inspired by the Embodied Playful Learning Theater project description of the Waag Society institute, focusing on a multi-sensor installation for educational games that would assist children in developing their social-emotional competences, I started a research internship there. The Waag Society offered a very good environment for field experimentation, and the project provided an interesting background for academic research. During my internship, I carried out a literature study on educational games and on how sensor-based physical interaction could contribute to the enhancement of an embodied gaming experience, also studying available technologies in motion capture and biofeedback sensors. During this six-month period I had the chance to test selected sensor technologies and write small code prototypes using them. After the end of my internship, I focused on putting those pieces together in a prototype game that would make use of multiple sensors and blend virtual game mechanics with traditional forms of children's games, the result of which is the NumHop game presented in this document.
This study tries to balance theoretical and applied research, attempting a first systematic approach towards multi-sensor-based physical interaction games, and such systems in general. I think that the results of this research provide very encouraging signs, and point towards the considerable room left for further study.
N. Poulios
Acknowledgments
This document is the result of research carried out for my master thesis project and an internship at the Creative Learning Lab, the educational department of the Waag Society institute for art, science and technology in Amsterdam, between June and December of 2011. The Creative Learning Lab is focused on innovative ways of learning through the use of creative technology. The aim of the internship was to conduct preliminary research for the Embodied Playful Learning Theatre (EPLT) project. Based on a physical interactive installation, EPLT is meant to be an open platform for the research, development, exploitation, testing and support of a range of embodied and multisensory technologies, such as motion tracking and other body-sensing technologies, focusing on the development of serious educational games for training social-emotional competencies by safely exposing learners to virtually simulated conflict situations in which wellbeing plays a crucial role. EPLT is part of the institute's involvement in the COMMIT P4 "virtual worlds for well-being" project*. The COMMIT program is the largest Dutch public-private research programme, bringing together leading researchers in search engines, parallel computing, databases, interaction in context, embedded systems and knowledge technology, aiming to broaden and reinforce the Dutch knowledge infrastructure in ICT and to better position Dutch companies in international competition by connecting the best scientists to high-tech companies.
I would like to thank, first of all, my supervisors, Prof. Dr. Anton Eliens and Dr. Lora Aroyo, for their guidance and support; my internship supervisor Keimpe de Heer, director of the Creative Learning Lab, for his inspiration, help, and support, along with all the people of the Waag Society institute; and Heidi J. Boisvert, artist in residence at the Waag Society during the period I started my internship, for her help. I would also like to thank Marco Otte, technical director of CAMeRA@VU, the center for advanced media research of the VU Amsterdam, for his help and consultancy, and Tobias Ruf, from the Electronic Imaging Department of the Fraunhofer Institute, for very kindly providing their SHORE (Sophisticated High-speed Object Recognition Engine) SDK for our tests on facial expression analysis.
* http://waag.org/nl/project/commit, http://www.commit-nl.nl/projects/virtual-worlds-for-well-being
Table of Contents
Abstract
Preface
Acknowledgments
Chapter 1: Introduction
    1.1 Motivations
    1.2 Problem, Research Questions and approach
    1.3 Summary of Contributions
    1.4 Outline
Chapter 2: Interactive Games and Embodied Learning
    2.1 Games and Conceptual Engagement
    2.2 Social-emotional skills and interactions
    2.3 The role of physical motion interaction
    2.4 The role of bio-feedback sensors
Chapter 3: Sensing motions
    3.1 Basic motion sensors
        3.1.1 Sensing forces
        3.1.2 Detecting motion
        3.1.3 Measuring distance
    3.2 Motion Capture and tracking systems
        3.2.1 Optical Systems
        3.2.2 Non-optical systems
        3.2.3 Motion capture libraries
    3.3 Motion sense in interaction (Hand Tracking; Head/Face Tracking; Eye Tracking; Nintendo Wii Remote; Blobo; Floor boards; Sony PlayStation Move; Microsoft Kinect; Panasonic D-Imager)
    3.4 Comparison of motion capture systems for the EPLT installation
Chapter 4: Sensing emotions
    4.1 Speech analysis
    4.2 Facial expressions
    4.3 Body movement/postures
    4.4 Pupil size
    4.5 Bio-feedback sensors
    4.6 Brain Computer Interfaces (BCI)
    4.7 Developing Tools for Multimodal Biofeedback
    4.8 Data representation of emotions
    4.9 Biofeedback Interactions. Thoughts and insights
Chapter 5: Hardware and software platforms for multi-sensor interactive spaces
    5.1 Sensor Hardware Platforms (Arduino; .Net Gadgeteer; Phidgets; Shimmer; I-CubeX)
    5.2 Interactive software development platforms (Visual Programming Languages; Working with sensors)
Chapter 6: A generic architecture for multi-sensory interactive systems
    6.1 Architecture Description
    6.2 Use Case
Chapter 7: Prototyping a virtual board game with physical interaction
    7.1 Introduction
    7.2 Preliminary studies
    7.3 NumHopII - The Game
    7.4 Architecture Overview
    7.5 Main components overview (SensorOSCTransmitter (C++ - cinder); NumHopII (C# - Unity3D))
Chapter 8: Evaluation results and conclusions
    8.1 Prototype Game Evaluation (Microsoft Kinect; Neurosky Mindwave; Zephyr HxM)
    8.2 General conclusions and further development
    8.3 Summary of research results
Appendix I
    NumHop Game Evaluation Questionnaire
Bibliography
    References
Chapter 1: Introduction
The use of computer games in education has been an active field of academic research for the last twenty years, providing considerable evidence to support the positive effects of games on the learning outcomes of students: increasing their motivation, stimulating their engagement, and helping them to understand complex concepts by applying them to problem-solving tasks in an explorative environment. Nowadays, computer laboratories are an established presence in schools, aiming to familiarize children with the use of modern technology and providing an alternative medium for educational material.
On the other hand, in the consumer and home entertainment industries, recent innovations in sensor and software technology have led to the development of a constantly growing range of products focused on entertainment and personal wellbeing. Major game consoles like the Nintendo Wii, Microsoft Xbox, and Sony PlayStation have introduced peripherals featuring motion sensors and computer vision algorithms, designed for games in which the gamer interacts in a physical way, using body motion instead of sitting on a couch with a conventional controller, and which successfully promote physical exercise in an entertaining way. Many other products have been introduced based on wearable sensors that monitor body signals during sports training or daily life activities, to be used by athletes or general consumers to monitor and analyze their physical state.
Based on the idea of embodied cognition, and on the assumption that the participation of the body in the educational process plays an important role in the learning outcomes and personal development of children, this thesis is a study of state-of-the-art sensor technologies and their application to the design of interactive spaces, focused on a broader sense of educational games. The Embodied Playful Learning Theater (EPLT) is a project of the Waag Society institute in Amsterdam for the implementation of such an interactive space. Combining advanced computer vision, motion and wearable body sensor technology with real-time computer graphics projection and active sound and lighting systems, the EPLT is meant to be a highly immersive environment, providing an open platform for the research and development of applications and games designed to interact based on the body motion and the physical and mental state of the user. The target of these applications is to offer a playful learning experience that will help children develop their personal, social-emotional skills.
As preliminary research for the EPLT project, this thesis aims to build a theoretical and technological background for the development of the platform, by reviewing findings of previous research on educational games, interactive storytelling, and physical human-computer interaction, by reviewing modern hardware and software technologies relevant to the project, and by presenting a set of insights gained in the design and evaluation of interaction concept prototypes.
1.1 Motivations
Motivating young students in the educational process is a continuous, perennial challenge for educators, who try to make learning more active and engaging. At the same time, modern pedagogy places the development of social-emotional competences at its core, creating a growing need in education for tools and instruments to assess and support these skills. Central to the development of learners' social-emotional competence is the individual in relation to his or her social environment, using all possible expressive forms. For learners this includes social, pedagogical, didactical and psychological guidance in the development of competencies for societal participation and group dynamics.
Although current computer-based educational games may feature all the qualities needed to be engaging, such as rich computer graphics and a story-driven plot, recent technological innovations allow us to consider a dimension previously ignored in gaming: that of the human body. The human body can be seen as part of human cognition, and as a medium of self-expression and of interaction with the environment and other people; thus its participation is of key importance to the development of learners' social-emotional competences. At the same time, body sensors provide us with a way to notice and measure, in real time, how our actions, or stimuli from the surrounding environment, are reflected in our body and in our physical and mental condition, helping us to understand ourselves and others better.
The conjunction of multi-sensor technologies with virtual worlds and the dynamics of game and play provides excellent possibilities for playful learning, and for training and assessing social-emotional competences in an immersive environment in which learners are engaged in problem-based simulations and conflict situations, within the bounds of a safe and confined space.
Apart from educational applications, which are the main research topic of this study, this thesis is motivated by a general interest in fusing art, electronic technology, and immersive interactive environments, and by the design of the EPLT as an open platform for research, experimentation and support of multi-sensor technologies in the fields of art and science. Since the work of pioneering composers like Karlheinz Stockhausen, Iannis Xenakis and John Cage, electronic and computer technology has become an essential part of music and of what is commonly known today as computer music. Music has many effects related to wellbeing (e.g. relaxation, stimulation, expressing emotion and defining identity), because music is inherently meaningful to human beings. When we listen to music, our mind manipulates musical structures (non-consciously extracted), and this manipulation generates an emotional response. Sensor technology can thus contribute to the creation of computer music and to music cognition research. Similarly, technology is increasingly being used in the visual and performing arts: from the deus ex machina of the Greek tragedies, to modern computerized theater stages, to dance performances like the works of Merce Cunningham, where dancing bodies blend on stage with real-time computer graphics, and to numerous interactive installations emerging as a new form of art. (See "Digital Performance" in the bibliography for a history of technology in the performing arts.)
1.2 Problem, Research Questions and approach
The Embodied Playful Learning Theater will be built on top of a physical installation. After defining the interaction technologies suitable for the context of the project, their integration must be examined with respect to their specifications and those of the physical installation. The additional use of sensors for monitoring the player's condition requires the development of a common framework for the real-time collection and processing of input data.
The two main research questions of the thesis are:
• Can physical interactions be combined with a virtual environment to enhance a playful gaming experience inside a gaming installation? The research answers this question by exploring existing sensor-based human-computer interaction technologies and applications, and available software development tools for such gaming installations, and by examining their advantages and limitations.
• What sensor technologies are most applicable for enhancing a playful gaming experience inside the EPLT installation? The research answers this question by examining which sensor technologies are most suitable for the context of the games, and which of their characteristics and constraints need to be considered for their application to the EPLT installation.
The research is based on an exploration of the physical interaction domain, through a literature and technology study of existing sensor technologies and software, followed by short cycles of design, development and testing of prototype applications using selected sensor technologies.
1.3 Summary of Contributions
The contributions of this thesis are:
• A theoretical background on educational games, based on multi-sensor interactive installations, for the development of learners' social-emotional competences
• A review of state-of-the-art motion capture and biofeedback sensor technologies, as well as a review of software to support the development of interactive applications featuring these sensors
• An architectural proposal for multi-sensor interactive spaces
• Prototype applications for selected sensor technologies, developed as components of the proposed architecture
• Insights from the evaluation of prototypes, and concept ideas for further development based on the evaluated technologies
1.4 Outline
Chapter 2 aims to build a theoretical background on embodied learning games for the development of social-emotional competences, by studying and discussing findings and ideas from previous academic research, combining the topics of games in education, the effect of motion-based interaction in games, and the topic of "affective computing" and emotion recognition technologies in physical human-computer interaction.
Chapter 3 focuses on motion capture technologies; starting from basic motion sensors, the chapter continues with a review of currently available commercial sensor systems and research innovations. Chapter 4 reviews the second technological area under research, that of wearable biofeedback sensors and emotion recognition systems, including multimodal emotion recognition software platforms developed at research institutes, as well as standards for the representation, annotation, storage and transmission of human emotions across emotion-aware applications. Chapters 3 and 4 aim to present the range of technologies that could be applied as inputs to an interactive system, summarizing the features and constraints by which one can judge their suitability during the design process of a specific application built on top of the installation.
Chapter 5 presents some examples of hardware sensor platforms, including commercial ready-made solutions as well as platforms for building custom sensors. The use of these platforms provides a common device level for larger-scale applications using a large number of sensors, increasing functionality and simplifying the implementation of a network of devices and software. The chapter continues with a presentation of software platforms providing the tools and libraries necessary for the development of interactive applications featuring sensors, multimedia inputs and outputs, and network connectivity.
Chapter 6 provides an architectural proposal for multi-sensor interactive spaces like the EPLT, featuring an application level that is independent from the input and output device level, and a communication layer between the levels and their components.
Chapter 7 documents prototype code implementing components of the proposed architecture's device level for selected motion capture and biofeedback sensors, and a proof-of-concept game demonstrating a version of the proposed architectural design.
The thesis ends with Chapter 8, presenting insights gathered during the evaluation process of the prototype, ideas for further development of the prototype and other applications, and final conclusions.
Chapter 2: Interactive Games and Embodied Learning
This chapter discusses the question of how physical interactions and biofeedback mechanisms can be combined with a virtual environment to enhance a playful gaming experience in an interactive game space. Combining findings of previous research from the topics of game theory, education, social psychology, and human-computer interaction (HCI), this chapter aims to build a theoretical framework for physical games designed for interactive spaces, by analyzing how games utilizing motion-sensing controllers and body sensors can contribute to engagement, motivation, and social interaction between children, and what their effect is on learning outcomes.
2.1 Games and Conceptual Engagement
Educators continuously face the problem of motivating and engaging their students to learn. The main reasons for this problem are believed to be the passive form of tuition in class, and the gap that exists between learning a theory and understanding its practical value. At the same time, ever younger children are becoming immersed in the consumption of media and the early adoption of technology in their homes. According to studies conducted by the Kaiser Family Foundation (Rideout, Foehr, & Roberts, 2010), Sesame Workshop, and others, recently synthesized in the Cooney Center's report Always Connected: The New Digital Media Habits of Young Children, preschool and primary-grade children typically consume between 4 (for preschoolers) and 7.5 hours (for 8-year-olds) of media on a typical day. More than half of all children under 5 use some type of electronic learning toy, and watch an average of 3.5 hours of television per day. By the time they are 8, more than 70% of all children play video games, and 67% use the Internet on a daily basis (Gutnick, Robb, Takeuchi, & Kotler, 2011) [1].
Following this tendency, a large number of academic studies in recent years have focused on the design of interactive games for educational purposes, providing considerable evidence to support the positive effects of educational games on a broad range of learning outcomes. Piaget, through his child development theory, believed in the development of cognitive structures through action and spontaneous play [2]. According to Piaget, constructivist learning is rooted in experimentation, discovery and play, among other factors. Malone and Lepper consider games intrinsic motivators for learning [3]. Games provide an alternative, more active and experiential method of learning which, supplementing traditional textbooks, can help students better understand complex concepts and engage with content within contexts of use [4]. Gee [5] argues that schools provide the manual but not the game, and that any gamer will tell you that reading a manual without playing the game is confusing and unproductive; while one is playing the game, however, the manual can provide an important sense of direction and serves to deepen emergent claims. Game dynamics motivate students to compete in achieving better results and immerse them in problem-based simulations where learning becomes a more personal experience. Games featuring rich narrative invite players to inhabit roles and assume identities as they adopt conceptually relevant intentions in a virtual world in which they make choices, develop skills, and experience the impact of their actions as part of a legitimate game role, allowing students to move beyond their classroom identity and become legitimate participants in the game narrative (Barab, Sadler, Heiselt, Hickey and Zuiker, 2007) [6]. Balasubramanian and Wilson (2006) analyzed the findings of numerous studies and found that well-designed educational digital games and simulations can help students obtain the critical problem-solving and decision-making skills necessary for everyday living [7], and Hake [8] examined pre- and post-test data for over 6,000 students in introductory physics courses and found significantly improved performance for students in classes making substantial use of interactive-engagement methods.
2.2 Social-emotional skills and interactions
Besides the conceptual engagement analyzed in the previous section, games can assist in social and emotional learning, offering excellent opportunities for social interactions through which children learn to subordinate desires to social rules, cooperate with others willingly, and engage in socially appropriate behavior, all of which are vital to adjusting well to the demands of school.
Social and emotional learning can be defined as the process of acquiring the skills to recognize and manage emotions, develop caring and concern for others, establish positive relationships, make responsible decisions, and handle challenging situations effectively. Social and emotional learning is of key importance in the pedagogical role of schools, preparing young children to become active parts of society. Socially and emotionally balanced children have increased confidence, express themselves and communicate better, form better relationships, take on and persist at challenging tasks, and have an increased capacity for learning. Dr. Maurice Elias, a leading child psychologist, researcher and expert on social-emotional learning from Rutgers University, explains the dangers of omitting social-emotional development and programs from our children's classrooms. He states: "Many of the problems in our schools are the result of social and emotional malfunction and debilitation from which too many children have suffered and continue to bear the consequences. Children in class who are beset by an array of confused or hurtful feelings cannot and will not learn effectively. In the process of civilizing and humanizing our children, the missing piece is, without doubt, social and emotional learning." [9]
Social and emotional skills can be learned and enhanced at any age, but the earlier a person begins social-emotional learning, the greater the advantages. During the pre- and early school years, children begin to understand themselves both as individuals and as part of a social world; they are becoming more autonomous, and their cognitive abilities permit them to see how they fit into their family and group of friends. According to Raver, "from the last two decades of research, it is unequivocally clear that children's emotional and behavioral adjustment is important for their chances of early school success" [10].
Goleman [11] outlines five crucial emotional competencies basic to social and emotional learning:
1. Self and other awareness: understanding and identifying feelings; knowing when one's feelings shift; understanding the difference between thinking, feeling and acting; and understanding that one's actions have consequences in terms of others' feelings.
2. Mood management: handling and managing difficult feelings; controlling impulses; and handling anger constructively.
3. Self-motivation: being able to set goals and persevere towards them with optimism and hope, even in the face of setbacks.
4. Empathy: being able to put yourself "in someone else's shoes" both cognitively and affectively; being able to take someone's perspective; being able to show that you care.
5. Management of relationships: making friends, handling friendships; resolving conflicts; cooperating; collaborative learning and other social skills.
Free and guided play are important for fostering social competence and confidence in children, as well as the self-regulation necessary for managing their own behavior and emotions. In play, children learn how to collaborate and negotiate with others, to take turns, and to manage themselves and others. Barnett and Storm [12] also find that play serves as a means of coping with distress. Interactive games in school give us the ability to create a playful environment for social interactions, and to simulate challenging or conflict situations.
Despite the image of social isolation that electronic gaming has for many people, and the concerns and criticism raised by teachers, parents, researchers and policymakers, the literature does not provide convincing evidence to this effect. On the contrary, a number of studies demonstrate that games often have beneficial effects not only on cognitive skills, but also in affective and social terms (Calvert 2005 [13], Gunter [14]). De Kort and IJsselsteijn [15], inspired by the realization that gaming is often as much about social interaction as it is about interaction with the game content, review findings of previous research on the psychological experience of social context effects while playing; they discuss contingencies between player, co-player(s) and audience, how these are shaped by the physical and media context in which they reside, and the 'sociality characteristics' of game settings in terms of co-located, mediated, and virtual others. In this paper we find several studies reporting on the opportunities electronic games offer for social interaction (e.g. Lazzaro, 2007 [16]), in settings ranging from public interaction (arcades), to semi-public (LAN events), to private (the living room at home). It has been found that people enjoy playing together or watching others play, sharing comments and enjoying the spectacle and the enhancement of emotional experience that comes from a crowd (Jansz & Martens, 2005 [17]), and some even argue that it is social interaction and participation that, to a large extent, explain game enjoyment (Bryce & Rutter, 2003 [18]; Carr et al., 2004 [19]).
When people play together, their need to belong is nourished in multiple ways. First, through involvement in a common activity they interact socially, and both the number and quality of social interactions contribute to a person's sense of belonging (Baumeister & Leary, 1995 [20]), resulting in a positive affective state. Second, spending time together makes people aware of being part of each other's social network or group, which generally also brings about positive emotions. Moreover, (unconscious) processes of empathy and mimicry result in a phenomenon called 'emotional contagion', where one person's affective state spreads to that of a second person who is able to perceive his or her facial expressions (Ramanathan & McGill, 2008 [21]). Hence, when one player is visibly enjoying a game, this emotion potentially crosses over to the other. Lastly, the resulting congruence of feelings engenders an even stronger sense of belonging, through reinforcement and confirmation (Raghunathan & Corfman, 2006 [22]).
Continuing from the same study by De Kort and IJsselsteijn, the paper notes the social context effects on a player's performance caused by the presence of others. The emotional effects include increased arousal, evaluation apprehension, increased self-awareness, self-evaluation, and increased goal relevance. The effects on performance are moderated by the 'sociality characteristics' [23] of the game setting, by the other person's role (co-actor vs. spectator), relationship and expertise, by performance requirements, and by personal differences. Sociality characteristics are the social affordances of the game content, the gaming interface, and the physical environment in which the game is played. Social affordances include the player's ability to monitor other players' actions, performance and emotions, and the opportunities for verbal and non-verbal communication.
Finally, as the paper suggests, social settings naturally allow not only for experiences of pride and sociability, but also for their negatively toned counterparts: shame, crowding, and social pressure. In a learning environment such as the EPLT, where children play under the guidance of educators, even these negative emotions are important for learning basic social-emotional competences such as self- and other-awareness, mood management, and empathy, as discussed above.
2.3 The role of physical motion interaction
As discussed in previous sections, children need to be involved in a variety of activities to learn and develop well cognitively, physically, emotionally and socially. These activities include interacting with each other and with adults, moving and exploring, manipulating objects, reading and creating representations, listening to (and later reading) books, engaging in pretend play, conversing, and building relationships. These needs are the basic reason that early childhood teachers often believe that computers and "screen time" have little place in the early childhood setting; they are correct that technology should not replace these vital experiences of childhood. Rather, technology is most productive in young children's lives when it enhances children's engagement in these activities, as well as their reflection on their actions and experiences. The currently prevalent model for educational games in schools is for a single student, or a very small group of students, to work on one computer. This model leaves limited margins for self-expression and socio-collaborative interactions. The use of modern physical interaction interfaces in hybrid-reality spaces for learning can have a great impact on both the cognitive and the social-emotional engagement of children.
Modern motion capture sensors allow the player to interact with a game using physical movement, map the player's body movement to that of a virtual character, and also create interactions between virtual and physical objects, by embedding sensors in the latter. Motion controllers give us the ability to design interactive spaces where physical exercise and social interaction, characteristics of traditional outdoor children's games like hopscotch or jump rope, merge with those of modern video games, like rich computer graphics and audio, virtual environments, game dynamics, and interactive storytelling.
Murray (Murray 1998) proposes three characteristic values of interactive story experiences: immersion, transformation, and agency. Immersion is the feeling of being present in another place and engaged in the action therein. Transformation is the game experience that allows players to transform themselves into someone else for the duration of the experience. Agency is the satisfying power to take meaningful action and see the results of our decisions and choices. Motion interaction combined with large projections in an interactive space increases immersion, because it gives the player the feeling of standing and moving inside the virtual world. In addition to the feeling of simply being present, the player has to follow the action using his body, performing all the actions that the virtual character has to perform and arriving at the physical state that the character would have in real action, leading to a more experiential, kinesthetic experience with an increased feeling of transformation and agency. Previous research comparing motion controllers with conventional interaction controllers supports this claim, finding higher levels of engagement when the controller supports natural movement (Lindley, Le Couteur & Bianchi-Berthouze 2008) [24]. Another study (Bianchi-Berthouze, Kim & Patel) [25] suggests that body movements appear not only to increase players' engagement but also to modify the way they become engaged. By inducing body movement, the device produced a higher sense of engagement in the players and mediated a feeling of presence in the digital world. The players appeared to quickly enter the role suggested by the game, and started to perform task-related motions that were not required or recorded by the game itself. Gaming was no longer only a question of challenge; it was the experience itself that rewarded the players. This supports another factor of engagement, that of fantasy, present in the descriptions of engagement by Malone [26] and Lazzaro [27].
Whereas analytical aesthetics is preoccupied with separating humans into mind and body, a part for thinking and a part for sensing, pragmatist aesthetics insists on their interdependence in the aesthetic experience. In a pragmatist perspective, aesthetic experience is linked neither only to the analytic mind nor solely to the bodily experience; aesthetic experience speaks to both. The role of art and design is "to give a satisfyingly integrated expression to both our bodily and intellectual dimensions" [28]. The sensed is without meaning if de-contextualized from the intellectual, and vice versa [29]. Multiple research areas support the embodiment of human cognition: that nearly all cognitive processes are deeply rooted in, and derived from, the body's interaction with its physical environment (Dourish 2001 [30], Wilson 2002 [31]). Several theorists (Barsalou, 2008 [32]; Glenberg & Kaschak, 2002 [33]) base this premise on research regarding mirror neurons (Rizzolatti & Craighero, 2004 [34]). Located in the premotor cortex, mirror neurons are activated both when perceiving another's actions and when producing actions oneself. These neurons are hypothesized to be integral to understanding and imitating the actions of others. The fact that the very same cells are involved in both action and perception suggests that activating potential actions may be an automatic consequence of perception. Starting by highlighting the importance of the coupling of motor and perceptual processes for interaction with the environment, and arguing that this might also be important for mental representation of the world, Hostetter and Alibali [35] study how people use their bodies (i.e. gestures) to express knowledge, supporting one of the claims of embodied cognition's proponents, according to which offline cognition (i.e., cognition that occurs in the absence of relevant environmental input) is perceptually and mechanically based. From this perspective, the ability to represent and manipulate information that is not currently perceptually present is accomplished through the activation of sensorimotor processes. Based on the theory of embodied cognition, Johnson-Glenberg, Birchfield et al. (2010) [36] suggest the idea of embodied learning, according to which learning via movement activates additional modalities (and sensorimotor systems) for crisper and more stable representations of information. These crisper representations, with more modal associative overlap, will be more easily recalled; better retrieval leads to better performance on assessment measures. Findings from studies conducted in SMALLab, a learning environment based on educational interactive material and physical interaction, support the view that learning in an embodied interactive environment results in greater knowledge gains over time compared to regular classroom instruction.
Another important point, highlighted by all studies on motion-based controllers, is that controllers that allow natural movement have the potential to offer greater affordances for social interaction [15][24][25][36]. Going back to social-emotional learning, the previous section discussed how the social context effect on a player's performance depends on the social affordances of the game setting, including the game controller, the opportunities for verbal and non-verbal communication, and the ability of spectators to monitor the player's actions, emotions, and performance. Aligned with the view of embodied cognition, emotions cannot be seen solely as a mental state, but also as a physical, bodily state. Emotions can be generated through imagination without physical interaction, but they can also be generated from body movements (Ekman 1972) [37]. On this basis, body postures and motion designed to interact with the game become another modality for stimulating the player's emotions through the game. Additionally, the player has the freedom to express her emotions and communicate with others using her whole body and motion. At the same time, all the in-game action becomes visible to spectators, who can monitor the player's performance, physical effort invested, and emotions, modifying the player's behaviour and increasing her evaluation apprehension. Body interaction between the game and the player, and between the player and co-players or spectators, increases self- and other-awareness, and provides the opportunity to train mood management.
2.4 The role of bio-feedback sensors
Apart from visible expressions of emotions like speech, facial expressions, and body posture and motion, researchers have been studying the physical responses of the human body during the generation of emotions. Tiny electrical charges, sweat, heat flux, and heartbeat have been measured and studied using wearable body sensors and have been related to emotions (for a review see Lisetti 2004 [38]). Emotions have also been a research topic in HCI. Picard (1995) [39] first coined the term "affective computing", describing interactive systems that have the ability to interpret the emotional state of users and adapt their behaviour to them, simulating human empathy. Although much research has followed since then, there has not yet been any commercial application taking advantage of emotion recognition capabilities. Given the complexity of the processes behind the generation and expression of emotions, which makes them very difficult to classify, many researchers avoid the term "emotion recognition", preferring "biofeedback mechanisms", criticizing what Boehner et al. [40] call the "informational approach to emotions", in which emotions are represented as clear, discrete states in a machine-readable format. Boehner suggests an alternative view, called the "interactional approach": instead of trying to develop systems that recognize human emotions, it focuses on helping humans understand, experience, and express their emotions through technology.
Considering both the informational and the interactional approach to emotions, the application of biofeedback sensors can contribute to the experience of the EPLT in various ways, and provide the infrastructure for further research on the relationship between bio-signals and emotions, and on affective interaction. First of all, body sensors can provide information about the physical and mental state of the player during the game, contributing to self-awareness. Biofeedback also gives spectators a measurable way to monitor the player's performance and physical effort, giving them an augmented view of the player's state that combines externally visible and internal modalities. Making the invested physical effort transparent, for example by using heart rate as a game score factor, can entice participants to compare their energy expenditure over time and with others, fostering competition that motivates them to invest even more effort. The same information can be used to adjust the game challenge based on the player's state, which will contribute to engagement. [41]
The application of biofeedback mechanisms in games can assist in learning mood management techniques similar to those used in professional sports training. Biathlon, for example, which combines cross-country skiing with rifle shooting, requires special techniques from the athletes to calm down and control their breathing when they arrive at the shooting range after a very demanding physical effort, with a very high heart rate. Crews and Landers (1993) [42] identified electroencephalographic (EEG) measures of attentional patterns prior to successful golf putts. Pope and Stephens (2011) [43] describe how the concept of physiological modulation of operator input evolved from a physiologically adaptive simulator system that was developed in National Aeronautics and Space Administration (NASA) flight deck research. In this system, the EEG signals of pilots controlled the level of automation in a simulated flight deck. This "closed-loop" testing setup was used to determine what level of automation kept pilots best engaged in the flight task. It was soon realized that, given enough practice, pilots could probably turn the testing system into a training system; that is, they would learn to control their EEG to set the level of automation where they preferred. This becomes, essentially, an EEG biofeedback training situation. In a similar way that games based on motion-sensing controllers reward players for imitating a skilled performer's overt motor behavior, biofeedback mechanisms can additionally challenge the player to reproduce the expert performer's emotional and cognitive state, by setting as a target the psycho-physiological responses exhibited by the expert in the real-world situation.
Biofeedback sensors can also be used to develop virtual actors demonstrating basic artificial emotional intelligence. These virtual actors can, for example, motivate players and reward physical effort, or help them to calm down. In story-driven interactive games, intelligent virtual actors can enable emotion recognition mechanisms at certain points of the story, asking players to act out emotions or behaviours, or perceiving players' reactions to game stimuli and triggering the virtual actor's behaviours accordingly, e.g. simulating empathy. Elements like these would increase engagement and enhance the experience of an interactive story in which the player finds herself in an immersive world, inhabited by personality-rich, robustly interactive characters.
Chapter 3: Sensing motions
This chapter is a review of motion sensing systems. Starting from fundamental motion sensors, the chapter continues by presenting state-of-the-art commercial and research systems for motion capture, and the motion-sensing game controllers introduced in recent years for game consoles. The chapter reviews the main characteristics of, and functional principles behind, those systems, in order to reach conclusions about their suitability for an interactive learning space.
3.1 Basic motion sensors
This part of the document is a short presentation of the fundamental motion sensors used in applications that are studied further in the document. These sensors are basic electronic components with a very particular function: translating changes in one form of energy into changes in electrical energy. All the presented sensors have been around us for quite a while now, in everyday systems such as automatic sliding doors and lights, alarm systems, cars, and various industrial control systems. In recent years, technological progress has reduced their size and cost, allowing their application in a variety of devices such as mobile phones and game controllers, while certain projects have developed frameworks that facilitate and simplify their use in multi-purpose applications made by the wider range of people involved in designing and programming interactive systems.
3.1.1 Sensing forces
Piezoelectric sensors are a category of sensors that use the piezoelectric effect to measure pressure, acceleration, strain or force by converting them to an electrical charge. Piezoelectricity is the ability of some materials, notably crystals and certain ceramics, to generate an electric potential in response to physical stress.
Force-sensing resistors are materials whose resistance changes when force is applied to them. Flexible force sensors are ultra-thin, flexible printed circuits, consisting of two laminated layers of conductive material and pressure-sensitive ink. The resistance of a flexible sensor in a circuit decreases under pressure. Flexible sensors are used to measure forces in a higher range than that of a piezoelectric sensor.
Capacitance sensors are very sensitive sensors, detecting anything that is conductive or has a dielectric constant different from that of air. Nowadays they are usually found in touch screens, though there are capacitance sensors that can detect the body's charge from distances of up to a meter (such sensors are used by the theremin musical instrument).
An accelerometer is a sensor that measures the change in speed of movement, or acceleration. Conceptually, an accelerometer behaves as a damped mass on a spring. When the accelerometer experiences acceleration, the mass is displaced to the point that the spring is able to accelerate the mass at the same rate as the casing. The displacement is then measured to give the acceleration. An accelerometer thus measures weight per unit of (test) mass, a quantity also known as specific force, or g-force. Another way of stating this is that, by measuring weight, an accelerometer measures the acceleration of the free-fall reference frame relative to itself. Accelerometers typically have two or sometimes three axes of measurement.
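To make the output of such a sensor concrete, here is a minimal Arduino-style C++ sketch converting the raw ADC reading of a hypothetical analog accelerometer axis into g-force. The pin and the zero-g offset and sensitivity constants are illustrative values of the kind found on a 3.3 V part's datasheet, not those of any sensor used in this project.

    const float ZERO_G_MV = 1650.0;  // assumed output at 0 g, in millivolts
    const float MV_PER_G  = 330.0;   // assumed sensitivity, millivolts per g

    // Convert one axis of a 10-bit ADC reading to acceleration in g.
    float readAxisInG(int pin) {
      int raw = analogRead(pin);            // 0..1023
      float mv = raw * (3300.0 / 1023.0);   // ADC counts -> millivolts
      return (mv - ZERO_G_MV) / MV_PER_G;   // millivolts -> g-force
    }

    void setup() { Serial.begin(9600); }

    void loop() {
      Serial.println(readAxisInG(A0));      // one axis, in g
      delay(100);
    }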
Gyroscopes are sensors that measure angular velocity. They are similar to accelerometers, except that they measure how fast the angle of rotation is changing, rather than measuring acceleration in a straight line. Gyroscopes work based on the principle of conservation of angular momentum. Mechanical gyroscopes consist of a high-rate spinning disk whose axle is free to take any orientation, mounted on a set of two gimbals with orthogonal pivot axes, allowing the gyroscope to cancel out external torque and preserve its orientation, regardless of any motion of the platform on which it is mounted.
3.1.2 Detecting motion
Photoelectric switches use a light beam hitting a photosensitive target sensor. When a body passes between the light source and the sensor, breaking the beam, the switch is activated.
Passive infrared sensors measure infrared light radiating from objects in their field of view. Apparent motion is detected when an infrared source with one temperature, such as a human, passes in front of an infrared source with another temperature, such as a wall.
Magnetic switches consist of a very thin pair of contacts in a protective housing. When exposed to a magnet they are drawn together closing the switch.
Hall effect sensors are transducers that vary their output voltage in response to changes in the magnetic field around them.
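Because all of these devices ultimately behave as switches, reading them requires no special processing. The short Arduino-style C++ sketch below, with pin assignments assumed purely for illustration, polls a PIR sensor and a magnetic (reed) switch as plain digital inputs.

    const int PIR_PIN  = 2;  // assumed wiring
    const int REED_PIN = 3;  // assumed wiring; switch closes to ground

    void setup() {
      pinMode(PIR_PIN, INPUT);
      pinMode(REED_PIN, INPUT_PULLUP);
      Serial.begin(9600);
    }

    void loop() {
      bool motion = (digitalRead(PIR_PIN) == HIGH);   // PIR triggered
      bool closed = (digitalRead(REED_PIN) == LOW);   // magnet present
      Serial.print(motion ? "motion " : "still ");
      Serial.println(closed ? "closed" : "open");
      delay(200);
    }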
3.1.3 Measuring distance
Most distance sensors use an energy source transmitting a reference signal, and a sensor measuring the signal reflected by the target back to the source, to calculate the distance of the target. Most applications use (near-)infrared light sensors, sending an infrared beam and reading the reflection of the beam off a target. For longer ranges, ultrasonic sensors are used, sending a ping of ultrasonic sound and then timing how long it takes to bounce back. Alternative implementations of distance sensors are based on combinations of magnetic or Hall effect sensors (for very short distances), measuring variations in a reference magnetic field.
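The time-of-flight principle behind ultrasonic ranging is easy to illustrate: sound travels at roughly 343 m/s (about 29 microseconds per centimeter), and the echo covers the distance twice. The following Arduino-style sketch, written against a hypothetical HC-SR04-style module with assumed pin wiring, turns the measured round-trip time into centimeters.

    const int TRIG_PIN = 9;   // assumed wiring
    const int ECHO_PIN = 10;  // assumed wiring

    void setup() {
      pinMode(TRIG_PIN, OUTPUT);
      pinMode(ECHO_PIN, INPUT);
      Serial.begin(9600);
    }

    void loop() {
      // Send a 10-microsecond trigger pulse.
      digitalWrite(TRIG_PIN, LOW);  delayMicroseconds(2);
      digitalWrite(TRIG_PIN, HIGH); delayMicroseconds(10);
      digitalWrite(TRIG_PIN, LOW);

      // Time the echo; divide by 29 us/cm and by 2 for the round trip.
      long roundTripUs = pulseIn(ECHO_PIN, HIGH, 30000);  // 30 ms timeout
      float distanceCm = roundTripUs / 29.0 / 2.0;
      Serial.println(distanceCm);
      delay(100);
    }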
3.2 Motion Capture and tracking systems
Motion capture (mocap), or tracking, is the process of recording or tracking body movement and mapping it onto the movement of a digital model. The mechanics of human body movement has been a topic of scientific interest since ancient times, and today many different disciplines use motion analysis systems to capture the movement and posture of the human body. In clinical research, motion capture has been used to analyze the walking patterns of impaired patients so that they receive the right orthopedic treatment, to monitor the progress of a treatment, and to aid the design of prosthetics. Motion analysis is also widely used in sports to analyze and optimize athletes' movement in order to achieve better performance.
In recent years, motion capture systems have been used extensively in cinematography and video games to animate computer-generated characters with natural human movement, following the recorded moves of an actor inside special studios, replacing the traditional animation method of rotoscoping, in which animators trace over live-action film movement frame by frame. Despite the high cost of the special equipment, space and setup required for a motion capture system, some productions prefer it over traditional animation techniques for its ability to give more realistic results in a shorter time, or even in real time.
Motion capture is a very active field of research; today there are many alternative types of systems using different technologies, with differences in accuracy, functional requirements and cost, and their suitability depends on the nature of the project. The range of applications utilizing motion capture is becoming wider, following the progress made in processors, memory chips and sensors regarding their speed, accuracy, size and cost, as well as the progress in algorithms developed for data processing. The two major categories of motion capture systems are optical and non-optical.
3.2.1 Optical Systems
Optical systems work with data captured from a single image sensor or from multiple image sensors calibrated to provide overlapping projections, using algorithms to triangulate the 3D position of a subject in space. Most optical systems utilize markers that the cameras can distinguish from the rest of the captured image, in order to determine their position more easily and accurately. The process of motion capture begins with the calibration of the system, in which markers are placed at known positions and every camera's position and lens distortion are calculated accordingly. If two calibrated cameras see a marker, its 3D position can be determined. After calibration of the system, a performer wears markers near each joint of her body, and the motion is identified by the positions of, or the angles between, the markers. The number of cameras required for an optical system depends on the size of the space we need to cover, the desired accuracy, and the number of subjects we need to track at the same time. Typically such a system consists of 6 to 24 high-speed cameras, while there are systems using hundreds of cameras to achieve better accuracy. Optical systems are characterized by the captured image resolution in pixels, the sampling frequency in hertz, and the frame rate, which is balanced against the image resolution and sampling frequency. Different types of markers exist among optical systems.
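The triangulation step can be sketched in a few lines: given each calibrated camera's center and the ray towards a marker (derived from the marker's image coordinates and the calibration), the marker's 3D position can be estimated as the midpoint of the closest points of the two rays. The minimal, self-contained C++ below is a simplification for two cameras; production systems solve a least-squares problem over many views.

    #include <cstdio>

    struct Vec3 { double x, y, z; };
    static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    static Vec3 mul(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
    static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    // Rays p = o1 + t*d1 and p = o2 + s*d2; returns the midpoint of their
    // closest points, the usual estimate of the marker position.
    Vec3 triangulate(Vec3 o1, Vec3 d1, Vec3 o2, Vec3 d2) {
        Vec3 r = sub(o1, o2);
        double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
        double d = dot(d1, r),  e = dot(d2, r);
        double denom = a * c - b * b;          // ~0 when rays are parallel
        double t = (b * e - c * d) / denom;
        double s = (a * e - b * d) / denom;
        return mul(add(add(o1, mul(d1, t)), add(o2, mul(d2, s))), 0.5);
    }

    int main() {
        // Two cameras one meter apart, both seeing a marker at (0.5, 0, 2).
        Vec3 p = triangulate({0, 0, 0}, {0.5, 0, 2}, {1, 0, 0}, {-0.5, 0, 2});
        std::printf("%.2f %.2f %.2f\n", p.x, p.y, p.z);  // 0.50 0.00 2.00
    }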
Passive markers are the simplest type of marker, featuring retro-reflective material to reflect light generated near the camera lens. The camera's threshold is adjusted to sample only the bright reflective markers, ignoring the rest of the captured image. The major advantage of passive markers is that the subject does not need to wear any electronics that might limit her freedom to move. Passive markers are attached directly to the skin or to a specially designed spandex/lycra full-body suit. Their major disadvantage is what is called marker swapping: because all markers are identical, the system might mismatch a marker with the corresponding joint, requiring a larger number of cameras to avoid the problem.
Figure 1: Active marker motion capture system
Active markers are another type of marker. Instead of reflecting light, active markers use LEDs to emit light [Figure 1], increasing the maximum distances and volume for capture. Optical systems using active markers triangulate positions by illuminating one LED at a time very quickly, or multiple LEDs at once with software identifying them by their relative positions. Refined versions of active markers exist, using time modulation over the amplitude or pulse of the LEDs to provide a marker ID, in order to eliminate marker swapping. Computer processing of modulated IDs offers cleaner data and less filtered results. This higher accuracy and resolution requires more processing than passive technologies, but the additional processing is done at the camera, improving resolution via subpixel or centroid processing and providing both high resolution and high speed.
Both technologies mentioned above are mainly used indoors, in special motion capture studios. Passive systems are usually less expensive than active ones and easier to set up, while active systems are more accurate and, after the initial setup, require less time to get results from. Commercial active and passive systems are available from companies like Vicon, NaturalPoint, Qualisys and PhaseSpace, and usually cost between tens and hundreds of thousands of euros.
Semi-passive, photosensitive markers. Prakash [44] is a motion capture system developed at MIT's Media Lab as an inexpensive alternative (the overall cost is less than 1,000 euros), suitable also for outdoor use and real-time motion capture. Instead of using expensive high-speed cameras, Prakash uses multi-LED high-speed projectors with passive binary films (masks) set in front of them. The light intensity sequencing provides a temporal modulation, and the masks provide a spatial modulation. Each beamer projects invisible (near-infrared) binary patterns thousands of times per second. Tags with photo sensors attached to the scene determine their location by decoding the transmitted space-dependent labels. Apart from their position, tags can compute their own orientation, incident illumination, and reflectance. These tracking tags work in natural lighting conditions and can be imperceptibly embedded in attire or other objects. The system supports an unlimited number of tags in a scene, with each tag uniquely identified to eliminate marker-swapping issues. Since the system eliminates the high-speed camera and the corresponding high-speed image stream, it requires significantly lower data bandwidth. The tags also provide incident illumination data, which can be used to match scene lighting when inserting synthetic elements.
Markerless Motion Capture. Motion capture and computer vision have been very active fields of research during the last 15 years, and many studies have sought to develop markerless motion capture systems, based on a single or multiple cameras and optimized image analysis algorithms, with performance comparable to that of the more expensive commercial systems mentioned previously.
Recently, a team from Carnegie Mellon University, working with Disney Research, presented a system that uses small body-mounted cameras to reconstruct the motion of a subject [45]. Outward-looking cameras are attached to the limbs of the subject, and the joint angles and root pose are estimated through non-linear optimization. The optimization objective function incorporates terms for image matching error and temporal continuity of motion. Structure-from-motion is used to estimate the skeleton structure and to initialize the non-linear optimization procedure. Global motion is estimated, and drift is controlled, by matching the captured set of videos to a 3D reconstruction of the scene built from reference imagery. By estimating the camera poses, the global and relative motion of an actor can be captured outdoors under a wide variety of lighting conditions, or in extended indoor regions, without any additional equipment.
Several other techniques and algorithms have been proposed for markerless motion capture of single or multiple subjects. Most of them use footage from multiple cameras to make a volumetric reconstruction of the body, using background removal, skin color detection, "shape from silhouette" (SFS) and structure-from-motion methods. The formalism of SFS was introduced by A. Laurentini [46]. By definition, an object lies inside the volume generated by back-projecting its silhouette through the camera center (the silhouette's cone). With multiple views of the same object at the same time, the intersection of all the silhouette cones builds a volume called the "visual hull", which is guaranteed to contain the real object. After the visual hull has been constructed, body pose is estimated by fitting shape models of specific body parts to the volume, or by applying heuristic assumptions about position-related features and establishing the correspondence of joints between successive frames. Markerless motion capture systems based on these methods have been developed by various academic research laboratories, like the BioMotion Lab of Stanford University [47], the University of Utrecht [48] and the Max Planck Institute [49], and commercially in Organic Motion's solutions.
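To make the visual hull construction concrete, the sketch below carves a voxel grid using a set of binary silhouette masks and their 3x4 camera projection matrices: a voxel survives only if it projects inside every silhouette. It is a minimal Python/NumPy illustration, assuming calibrated projection matrices and clean silhouettes are already available; real systems add octree acceleration and surface extraction on top.

import numpy as np

def carve_visual_hull(voxels, silhouettes, projections):
    """Keep only the voxels whose projection falls inside every silhouette.
    voxels:      (N, 3) array of 3D points sampled on a grid
    silhouettes: list of binary (H, W) masks, one per camera
    projections: list of 3x4 camera projection matrices
    """
    homogeneous = np.hstack([voxels, np.ones((len(voxels), 1))])
    inside = np.ones(len(voxels), dtype=bool)
    for mask, P in zip(silhouettes, projections):
        h, w = mask.shape
        proj = homogeneous @ P.T                      # project to image plane
        u = (proj[:, 0] / proj[:, 2]).astype(int)     # pixel column
        v = (proj[:, 1] / proj[:, 2]).astype(int)     # pixel row
        hit = np.zeros(len(voxels), dtype=bool)
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)  # inside the image?
        hit[ok] = mask[v[ok], u[ok]] > 0              # inside the silhouette?
        inside &= hit                                 # intersect the cones
    return voxels[inside]                             # points in the visual hull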
3.2.2 Non-optical systems
This category includes motion capture systems that use alternative types of sensors instead of image sensors. These systems collect data from wearable sensors attached to the subject's body and translate them into motion in space. Their main advantage is that, because they are not based on cameras, they don't require a studio setup, they are more portable, and they can be used outdoors, capturing motion in large areas independently of light conditions. Their main disadvantages are that they are usually less accurate than optical systems and that they might limit the subject's freedom to move and perform.
Inertial systems use miniature inertial sensors attached to the joints of the body, together with biomechanical models and sensor fusion algorithms, to translate data into motion. Starting from a known position, inertial systems use wireless accelerometers and gyroscopes, sending data to a computer that continuously calculates the position, orientation and velocity of the subject with full six degrees of freedom. Their accuracy depends on the number of sensors used. Commercial inertial motion capture systems are available from companies like Xsens and Animazoo.
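The position estimate of an inertial system follows from dead reckoning: orientation comes from integrating gyroscope rates, and gravity-compensated acceleration is integrated twice for position. The fragment below is a deliberately simplified 1D sketch with assumed sample data and no sensor fusion, only to show why a known starting position is needed and why errors accumulate over time.

import numpy as np

dt = 0.01                                  # 100 Hz sampling, in seconds
accel = np.full(200, 0.5)                  # assumed accelerometer samples (m/s^2)

velocity, position = 0.0, 0.0              # known initial state (dead reckoning)
for a in accel:
    velocity += a * dt                     # first integration: velocity
    position += velocity * dt              # second integration: position
print(f"position after {len(accel) * dt:.1f} s: {position:.3f} m")

# Any constant bias in `accel` grows quadratically in `position`, which is
# why commercial systems fuse gyroscopes, magnetometers and biomechanical
# models instead of integrating raw samples alone.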
Mechanical or exo-skeleton systems use a skeletal-like structure worn by the subject, consisting either of straight metal or plastic rods linked together with potentiometers that articulate the joints, or of flexible sensors that measure joint angles during motion. Mechanical systems are real-time and low-cost, but they capture only the relative movement of the subject, requiring an external absolute positioning system, and they may not be comfortable for a performer to wear. Commercial systems like the Gypsy 7 by Animazoo combine gyroscopes with an exo-skeleton to capture both absolute and relative motion.
Magnetic systems utilize sensors placed on the body to measure the low-frequency magnetic field generated by a transmitter source. Position and orientation are calculated from the relative magnetic flux of three orthogonal coils on both the transmitter and each receiver. The relative intensity of the voltage or current of the three coils allows these systems to calculate both range and orientation by meticulously mapping the tracking volume. Each sensor captures 6 degrees of freedom, which provides useful results with two-thirds the number of markers required by optical systems: one sensor on the upper arm and one on the lower arm suffice for elbow position and angle. Magnetic systems are low-cost but nowadays rarely used, because of their major disadvantages. Since each sensor requires its own (fairly thick) shielded cable, the tether used by magnetic systems can be quite cumbersome. Magnetic systems also have issues with azimuth: if an actor adopts a push-up type posture, the system gets confused. Setups with two or more actors in close proximity are problematic as well, since sensors from different actors interfere with each other, producing distorted results. Finally, magnetic systems react very badly to metal or magnetic fields in the environment, caused by metallic construction materials in buildings or by other electrical appliances in use.
3.2.3 Motion capture libraries
As mentioned before, motion capture is an easier technique for giving realistic motion to virtual characters, and although most motion capture systems require expensive equipment and special studios, independent developers can take advantage of free or commercial libraries available online, which include motion capture data of various human activities, in file formats that can be imported into 3D animation software and mapped to any character model. A quick search for motion capture libraries will return a long list of resources, among them Carnegie Mellon University, which has published a very large motion capture database, freely available at http://mocap.cs.cmu.edu/; http://www.mocapclub.com/, which includes a library from the Motion Capture Society association; and http://mocapdata.com, which is also a large resource of both free and commercial animation files.
3.3 Motion sense in interaction
During the last years, sensors and principles used in motion capture systems have been applied, at a smaller scale, in low-cost consumer input devices, providing physical interaction interfaces. Over the last five years, all major companies in the video game industry have developed technologies for games and controllers with motion-based interaction. Although sports have always been a popular theme in video games, and game companies started to explore sensor-based physical interfaces in the mid-1980s, it was not until recently that technology allowed them to produce wireless and lightweight devices practical to use as game controllers. That fact, along with the popularity of large TV screens in today's average living room, has created the basis for games offering more immersion and encouraging gamers' physical activity. Today "exertion games" or "exergames" are a growing market, also attracting people who were not traditionally drawn to video games and considered them a rather passive activity.
This part presents current techniques and example devices for physical input interfaces and game controllers based on motion sensors.
Hand Tracking
Designing wearable input interfaces, usually called "data gloves", that allow a user to use her hands and fingers to navigate a virtual world, make hand gestures, and interact with objects in a more natural way, was one of the first examples of natural user interfaces. The first data glove was created in 1977, and since then several companies and laboratories have come up with their own implementations. Data gloves use sensors such as accelerometers or gyroscopes to capture hand movement, and flexible sensors for the bending of the fingers. Some data gloves use optical fibers attached to the fingers and a photocell to measure bending, since some light escapes the fiber when it is bent. Some data gloves also provide haptic feedback, applying small forces and vibrations to give users a sense of touch.
Data gloves are also used in body motion capture systems, because marker-based solutions are not able to capture such detail in finger movement. This technique is called hand-over.
Head/Face Tracking
Facial expressions and the movement of small facial muscles are also difficult to capture during body motion capture. For that reason, facial motion capture is done in a separate recording, by attaching many small markers to the actor's face.
In the field of interaction and the gaming industry, head tracking devices exist that allow the computer to set a camera's viewpoint according to the position of the player in space. Commercial systems, like NaturalPoint's TrackIR, use an infrared sensor and active markers attached to the player's head. Other systems, like many head-mounted displays for virtual reality, use tilt sensors to track head movement. There are also applications that use a plain camera and automatic face detection algorithms to track the user's position, but because of the plain camera they are less accurate for movement along the depth axis.
Eye Tracking
Eye tracking is the process of measuring either the point of gaze of a viewer or the motion of an eye relative to the head. Eye trackers are mostly used in research on the visual system, in psychology and cognitive linguistics, and also in marketing research, product design and usability testing, to spot elements that attract viewers' gaze and those that do not.
Eye trackers measure rotations of the eye and principally fall into three categories. The first category uses an attachment to the eye, like a contact lens with an embedded mirror or magnetic field sensor. Measurements with tight-fitting contact lenses have provided extremely sensitive recordings of eye movement, and magnetic search coils are the method of choice for researchers studying the dynamics and underlying physiology of eye movement. The second category uses electric potentials measured with electrodes placed around the eyes. The eyes are the origin of a steady electric potential field, which can be detected even in total darkness and with the eyes closed. It can be modeled as generated by a dipole with its positive pole at the cornea and its negative pole at the retina. The electric signal derived using two pairs of contact electrodes placed on the skin around one eye is called the electrooculogram (EOG). If the eyes move from the centre position towards the periphery, the retina approaches one electrode while the cornea approaches the opposing one. This change in the orientation of the dipole, and consequently in the electric potential field, results in a change in the measured EOG signal. Inversely, by analyzing these changes, eye movement can be tracked.
The last and most commonly used category comprises non-intrusive, optical systems using the Pupil Centre Corneal Reflection (PCCR) technique. This technique uses a light source to illuminate the eye, causing highly visible reflections, and a camera to capture an image of the eye showing these reflections. Image processing algorithms are then used to identify the reflection of the light source on the cornea and the pupil. Calculating the angle between the two reflections, combined with other geometrical characteristics of the reflections, allows the gaze direction to be determined.
There are two illumination setups that can be used with the PCCR technique: bright pupil tracking, where the illuminator is placed close to the optical axis of the imaging device, causing the pupil to appear lit up; and dark pupil tracking, where the illuminator is placed away from the optical axis, causing the pupil to appear darker than the iris. Factors such as the subject's age, light conditions and ethnicity affect pupil detection differently under each technique. Some commercial systems, like Tobii eye trackers, can use both techniques, determining the better one during the calibration procedure, in which the viewer is asked to gaze at certain points on screen.
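In practice, the gaze estimate of a PCCR tracker is obtained by mapping the pupil-centre-to-reflection vector onto screen coordinates through such a calibration. The sketch below fits a simple affine mapping from a handful of calibration samples with least squares; the numbers are illustrative assumptions, and commercial trackers use richer polynomial models and per-eye geometry.

import numpy as np

# Assumed calibration data: pupil-minus-glint vectors (pixels) recorded while
# the viewer fixated known on-screen targets (pixels).
pg_vectors = np.array([[-8, -5], [9, -6], [-7, 6], [10, 7], [1, 0]], float)
targets    = np.array([[100, 100], [900, 100], [100, 700], [900, 700], [500, 400]], float)

# Fit an affine map: screen = [vx, vy, 1] @ A, solved with least squares.
X = np.hstack([pg_vectors, np.ones((len(pg_vectors), 1))])
A, *_ = np.linalg.lstsq(X, targets, rcond=None)

def gaze_point(pg_vector):
    """Estimate the on-screen gaze point from a pupil-glint vector."""
    return np.array([*pg_vector, 1.0]) @ A

print(gaze_point([0, 0]))   # roughly the screen centre for this data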
Eye trackers can also be used as an interaction input interface, replacing a mouse for example, allowing the user to control the cursor with her eyes. EyeWriter [50] is a collaborative research project for building an eye tracker from inexpensive materials, along with open source software, developed to empower people suffering from ALS and other physical disabilities with creative technologies.
Nintendo Wii Remote
In 2006, Nintendo released its now popular Wii video game console. The major innovation of the Wii was its remote game controller, the Wii Remote (Wiimote). The Wiimote features an infrared sensor and an accelerometer, allowing it to calculate its position in space and track hand movement. Using the Wiimote, the player is able to aim at items on screen and interact using gestures and natural movement.
Upon its release, the Wiimote gained much attention thanks to its advanced features and quickly became very popular among programming enthusiasts, who wrote software that allowed the device to be used beyond the game console. Since then, the Wiimote has been used in numerous projects, either as a controller or as an infrared sensor tracking infrared LEDs attached to other items, for example in a head tracking system like the one previously mentioned.
Blobo
The Blobo sensor is manufactured by a small company in Finland that targets games designed for the sensor on PC and Mac platforms. Blobos have the shape of a small ball, packing an accelerometer and a gyroscope as well as an air pressure sensor, which can measure how hard the player squeezes the ball. Its unique ball shape makes it ideal for children's games, and the ability to measure pressure and speed could act as another indicator of the player's emotional state.
Floor boards
Floorboards equipped with pressure sensors were the first attempt to make an input interface with which a player would use her whole body in game interaction. The first controller of this kind, called the Joyboard, was created by Atari in 1982. In 2007, Nintendo released a modern, wireless version, called the Balance Board, along with a series of fitness games utilizing it, called Wii Fit, for the Wii game console.
Sony PlayStation Move
Sony's motion sensing platform for the PlayStation console includes the PlayStation Eye camera, capable of capturing standard video at 60 Hz and 640x480 pixel resolution, or at 120 Hz and 320x240 pixels, along with computer vision and gesture recognition software, and a microphone array for voice location tracking and voice command recognition.
The PlayStation Move motion controller features an orb at its head, which can glow in any of a full range of RGB colors using LEDs. Based on the colors of the user environment captured by the PlayStation Eye camera, the system dynamically selects an orb color that can be distinguished from the rest of the scene. The colored light serves as an active marker whose position can be tracked by the camera. The uniform spherical shape and known size of the light also allow the system to accurately determine the controller's distance from the camera through the light's size in the image. The controller also features an accelerometer and a gyroscope, used to track rotation as well as overall motion. An internal magnetometer calibrates the controller's orientation against the earth's magnetic field, helping to correct the cumulative error (drift) of the inertial sensors. The inertial sensors can be used to calculate position in cases where camera tracking is insufficient, such as when the controller is obscured behind the player's back.
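The distance estimate from the orb's image size follows directly from the pinhole camera model: a sphere of known physical radius R, appearing with pixel radius r at focal length f (expressed in pixels), lies at distance roughly d = f * R / r. A small sketch, with the focal length and orb radius as assumed illustrative values rather than Sony's specifications:

# Pinhole-model distance estimate from the tracked sphere's size.
FOCAL_LENGTH_PX = 600.0   # camera focal length expressed in pixels (assumed)
ORB_RADIUS_M = 0.023      # physical radius of the glowing orb in metres (assumed)

def orb_distance(radius_px: float) -> float:
    """Distance of the orb from the camera, from its apparent radius."""
    return FOCAL_LENGTH_PX * ORB_RADIUS_M / radius_px

print(orb_distance(20.0))  # orb drawn with a 20 px radius -> ~0.69 m away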
Microsoft Kinect
Kinect was Microsoft's answer in the video game consoles' motion sensing competition. Initially released as an accessory for the Xbox 360 game console, Kinect was the first consumer device to allow real-time, markerless, full-body 3D motion capture in a room environment. Kinect features a normal RGB camera and a depth sensor, consisting of an infrared laser projector and an infrared camera, capable of capturing 3D video data at 30 Hz and 640x480 pixels. The sensor also includes a 3-axis accelerometer to determine its orientation, and a four-microphone array that lets it receive voice commands, reduce ambient noise, and determine the source location of a sound. The most innovative part of the Kinect, though, is a microprocessor running an algorithm trained using machine learning on a large set of images, which allows it to track the motion of multiple bodies, based on 20 joints per body [Figure 2].
Kinect uses a single depth image [51], which is segmented into a dense probabilistic body part labeling, with the parts defined to be spatially localized near skeletal joints of interest. Reprojecting the inferred parts into world space, the spatial modes of each part distribution are localized, generating confidence-weighted proposals for the 3D locations of each skeletal joint. The segmentation into body parts is treated as a per-pixel classification task. A very large collection of realistic depth images of humans of many shapes and sizes, in highly varied poses sampled from a large motion capture database, was used to train a deep randomized decision forest classifier that avoids over-fitting. Simple, discriminative depth comparison image features yield 3D translation invariance while maintaining high computational efficiency. Finally, the spatial modes of the inferred per-pixel distributions are computed using mean shift, resulting in the 3D joint proposals.
Figure 2: Kinect tracking joints
Kinect's real-time full-body motion capture capabilities allow the creation of games and other applications featuring full-body physical interaction, based on position determination, collision detection between virtual objects and individual body parts, and recognition of motion gestures and postures. Kinect truly revolutionized the field of natural user interfaces for gaming and became, upon its release, the fastest selling consumer electronics device ever. As with the release of the Wiimote, it quickly attracted the attention of a large community of programming enthusiasts who wrote open source software allowing the Kinect to be used in independent computer applications, followed by a large number of projects found on the internet, including interactive applications, games, installations and robotics utilizing the sensor. After a large number of impressive examples of Kinect uses appeared on the internet, companies involved in its development, like PrimeSense and Microsoft, decided to support these efforts by releasing software to facilitate independent project development.
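With per-joint 3D positions arriving at 30 Hz, simple postures can be recognized with plain geometric rules before resorting to machine-learned gesture classifiers. The sketch below checks for a "hands above head" posture from a dictionary of joint positions; the joint names and the skeleton-reading layer are assumptions standing in for whichever Kinect SDK or wrapper is used.

from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]   # (x, y, z) in metres, y pointing up

def hands_above_head(joints: Dict[str, Vec3], margin: float = 0.05) -> bool:
    """True when both hands are at least `margin` metres above the head."""
    head_y = joints["head"][1]
    return (joints["hand_left"][1] > head_y + margin and
            joints["hand_right"][1] > head_y + margin)

# Example frame, as it might arrive from a skeleton-tracking wrapper:
frame = {"head": (0.0, 1.60, 2.1),
         "hand_left": (-0.3, 1.75, 2.0),
         "hand_right": (0.3, 1.78, 2.0)}
print(hands_above_head(frame))   # True: a candidate "raise hands" game event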
Panasonic D-Imager
The D-Imager was introduced to the market by Panasonic in 2011, targeting commercial businesses rather than game console end customers. It works in a similar way to the Kinect, but uses an array of near-infrared LED emitters instead of a single laser beam, and the delay between light emission and its reflection from the target is measured on a pixel-by-pixel basis (time-of-flight principle). Using internal processing similar to the Kinect's, the sensor can track 20 joints of a human body in front of it. The comparative advantages of the D-Imager over the Kinect are its increased tracking range (1,2 to 9 meters versus 1,2 to 3,5 meters) and the ability to fully track up to 5 people simultaneously, versus the 2 that Kinect supports. The D-Imager lacks the Kinect's VGA camera and microphone array, though. It comes with the Omek Beckon Development Suite, which supports multiple development environments, packs of ready-to-use gestures, and an authoring tool that allows developers to easily record custom gestures and feed them automatically to a machine learning algorithm. These advantages, of course, come at a much higher cost than the Kinect sensor.
3.4 Comparison of motion capture systems for the EPLT installation
Commercial marker-based optical motion capture systems have the highest sampling rate performance and have proven very robust over the years they have been available. It should be noted, though, that these systems were designed for high-detail motion capture for animation and film, which have higher performance requirements than game interaction. The disadvantages of marker-based systems are: i) the use of markers which, along with the body sensors that have to be placed on the player's body and calibrated, require a lot of time to prepare a player for the game; ii) the very high cost of such systems compared to game console controllers; and iii) the need for a number of cameras to be set up and calibrated above the stage [Figure 3], requiring a rather permanent setup, while solutions like the Kinect and D-Imager need no special calibration and can be used flexibly, from a permanent installation down to a smaller-scale classroom setup.
Figure 3: Optical tracking setup using 12 cameras
The advantage of depth sensors like the Kinect and the D-Imager is that the user does not have to wear anything in order to be tracked. Markers, on the other hand, can also be used to track objects on stage, besides bodies. If a small number of objects need to be tracked, inertial sensors (accelerometer, gyroscope) can be attached to each object to track its motion from a known initial position on stage. An alternative is to use the Wii Remote's infrared camera the other way around, with the camera set up in a fixed position and infrared LEDs attached to the objects, although these might interfere with the depth sensors used for body capture. Another disadvantage of the Kinect specifically is its limited range of less than 4 meters. Additionally, because it is based on the viewing angle of a single camera, the active tracking area forms a triangle that narrows towards the position of the camera, leaving part of the area in front of the sensor out of sight. Multiple Kinect sensors can be used to cover a larger area, but they have to be positioned very carefully so that they don't interfere with each other, and they add complexity to the development of the application. The 9-meter range of the D-Imager makes it more suitable for larger stages.
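The narrowing tracking area is easy to quantify: with the horizontal field of view of roughly 57 degrees commonly cited for the Kinect, the usable width at distance d is w = 2 d tan(fov/2). A quick check of the numbers:

import math

def coverage_width(distance_m: float, fov_deg: float = 57.0) -> float:
    """Width of the tracked area at a given distance from the sensor."""
    return 2 * distance_m * math.tan(math.radians(fov_deg / 2))

for d in (1.2, 2.0, 3.5):
    print(f"{d} m from sensor: {coverage_width(d):.2f} m wide")
# At the 1.2 m near limit only ~1.3 m is covered, versus ~3.8 m at 3.5 m,
# which is why players close to the camera easily step out of sight.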
Non-optical motion capture systems present the same difficulty as marker-based ones: a large number of sensors must be attached to the player's body and calibrated. Finally, although markerless optical systems seem appealing, the only commercial system found was Organic Motion's, which was not tested during this research; as a general note, experience has shown that markerless optical systems are not as accurate as the other categories, and their performance may depend strongly on lighting conditions.
Chapter 4: Sensing emotions
The vision of machines with emotional intelligence [52] has coexisted with that of artificial intelligence since the term was coined. It is a popular theme in science fiction literature, featuring androids that understand emotions and display human-like behavior, and aptly raising ethical questions about the use of such technologies. Although we are still quite far from this vision (or nightmare, for some), research laboratories around the world are developing emotion-sensing technology to support the study of human behavior, affective human-computer interaction, and communication between people. Automatic recognition of human affective states is an important research topic for a broad range of applications, including psychology research, computer-assisted therapeutic systems, safety monitoring, assessment and training systems, user experience studies, marketing research, and automatic affect-based indexing of digital material [53].
Emotion recognition can make social interaction more effective where there are difficulties communicating expressively: for example, for people on the autistic spectrum, who might outwardly appear calm and relaxed while experiencing a state of emotional or cognitive overload [54], and in everyday social networking applications, with their tendency towards text-based communication or communication through avatars in virtual worlds.
As with physical interaction interfaces, many studies experiment with the application of physiological sensors to video games and interactive storytelling [55]. Video games are an excellent application area for exploring the benefits and drawbacks of physiological sensor interaction, because the consequences of failure are less severe than in critical control systems, making games a field that bridges laboratory research and commercial systems. It has also been shown that video games can stimulate strong emotional reactions from players, making them an appropriate field for behavior studies; and as gaming has turned into a huge entertainment industry, companies are interested in using physiological feedback for game design evaluation. Explorations into "biofeedback" games, games that make users more aware of their physiological state and train them to control it using game dynamics, started in the early 1980s. In 1984, Thought Technology developed a racing game called CalmPrix [56], utilizing a modified galvanic skin response sensor, and other innovative game companies like Atari and Nintendo followed with their own biofeedback games using a variety of body sensors. Some of these games never made it to the market, while others did, but without the expected market success.
As we all know from personal experience, emotions are hard to define and recognize. Despite all our senses and the verbal and non-verbal communication skills we have as humans, it is often hard to immediately recognize someone's emotions: whether they are real or pretended, whether someone is talking seriously or joking, laughing or crying, and so on. The expression of emotions becomes even more complex when analyzed on a global, cross-cultural scale. It is easy to imagine, then, that emotion recognition is a very difficult task for a computer, especially in real-time applications where the system has to analyze the user's state and respond within a very narrow time frame. Classic psychological research claims the existence of six basic expressions of emotion that are universally displayed and recognized: happiness, anger, sadness, surprise, disgust and fear [57]; other studies on emotion recognition also include emotions like despair, interest, irritation and pride [58]. Many studies do not accept this categorization of emotions, suggesting that it is not emotions but certain components of emotions that are universally linked with certain communicative displays. Most theorists agree that the two dominant dimensions of emotion can be described as valence (pleasant vs. unpleasant) and arousal (activated vs. deactivated, or excited vs. calm) [54]. Mapping even basic emotions onto these two dimensions is challenging [Figure 4], and emotion recognition systems analyzing a single human modality, like voice or facial expressions, usually suffer from either poor accuracy or an oversimplified classification of emotions.
Figure 4: Emotions mapped on basic dimensions
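One way to make the dimensional model operational is to place the basic emotions at fixed coordinates on the valence-arousal plane and label an incoming (valence, arousal) estimate by its nearest neighbour. The coordinates below are illustrative assumptions, not values taken from the figure:

import math

# Illustrative valence/arousal coordinates, each in [-1, 1].
EMOTIONS = {
    "happiness": ( 0.8,  0.5), "surprise": ( 0.2,  0.8),
    "anger":     (-0.6,  0.7), "fear":     (-0.7,  0.6),
    "disgust":   (-0.7,  0.2), "sadness":  (-0.7, -0.5),
}

def classify(valence: float, arousal: float) -> str:
    """Label a (valence, arousal) estimate with the nearest basic emotion."""
    return min(EMOTIONS, key=lambda e: math.dist(EMOTIONS[e], (valence, arousal)))

print(classify(-0.22, -0.29))  # the values of the EmotionML example in
                               # section 4.8 map to "sadness" here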
The next part presents the various sensors used to capture physiological signals that can be associated with the emotional state of a person, along with emotion recognition software developed in previous research.
4.1 Speech analysis
Speech is the primary method of human communication. Analysis of features extracted from speech, such as intensity, pitch, phonetic features, voice segments, pause length and spectral characteristics, along with linguistic analysis based on the keywords used, can be used to draw conclusions about the emotional state of a person [60].
EmoVoice [61], developed by the Human Centered Multimedia laboratory of the University of Augsburg, is a framework for emotional speech corpus and classifier creation, and for offline as well as real-time online speech emotion recognition. The framework is meant to be usable by non-experts and therefore comes with an interface for creating one's own personal or application-specific emotion recognizer. EmoVoice is now integrated into the SSI framework (see section 4.7).
openEAR [62], developed by the Institute for Human-Machine Communication of the Technische Universität München, is an open source C++ library for speech processing and emotion recognition, combining features for audio recording, feature extraction and classification of results, along with pre-trained models.
4.2 Facial expressions
Facial expression analysis was the first method used for emotion recognition, has been used extensively in many studies since, and remains the preferred method for single-modality emotion recognition systems. Facial expressions are the main non-verbal communication tool, providing the most powerful, versatile and natural means of communicating motivational and affective state. Apart from expressing emotion, facial expressions provide important communicative cues during social interaction, such as our level of interest, our desire to take a speaking turn, and continuous feedback signaling understanding of the information conveyed. Facial expression constitutes 55 percent of the effect of a communicated message [63] and is hence a major modality in human communication. Several studies have also shown that ordinary people can detect the six emotional facial expressions with an accuracy ranging from 70% to 98%.
In facial expression analysis systems, the face is segmented into the facial areas of the eyes, eyebrows, mouth and nose. Each of these feature-candidate areas contains features whose boundaries are extracted and stored over time; the displacement of each feature is then compared to "neutral face" model images to infer the emotion expressed by the subject. Systems usually differ in the number of features tracked and the kind of classifier used.
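The displacement-from-neutral idea reduces to comparing tracked landmark positions against a stored neutral-face template. A minimal sketch, assuming the 2D landmarks have already been extracted (for example by a Haar-based detector) and normalized for face size; the threshold and the "surprise" rule are illustrative, not a real classifier:

import numpy as np

# Assumed landmark arrays (two eyebrow points, two mouth corners), already
# normalized by inter-ocular distance so scale and head distance cancel out.
neutral = np.array([[0.30, 0.40], [0.70, 0.40], [0.35, 0.75], [0.65, 0.75]])
current = np.array([[0.30, 0.37], [0.70, 0.37], [0.33, 0.72], [0.67, 0.72]])

displacement = current - neutral              # per-feature movement vectors
magnitude = np.linalg.norm(displacement, axis=1)

# Toy rule in the spirit of expression classifiers: eyebrows moving up
# (features 0-1, decreasing y) suggest surprise.
brow_raise = -displacement[:2, 1].mean()
print("surprise candidate" if brow_raise > 0.02 else "neutral")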
There are already quite a few facial expression analysis systems developed by research institutes, and some are available for research or commercially. Examples of such systems are: the SHORE system [64], developed by the Fraunhofer institute; eMotion [65], a project started at the University of Amsterdam, which also includes software to map captured facial expressions onto Second Life avatars; MindReader [66], developed initially at Cambridge University (based on the commercial system of Nevenvision, since acquired by Google); projects of the iBUG (intelligent behaviour understanding group) of Imperial College London [67]; and FaceAPI [68] from Seeing Machines. There are also some open source examples of facial feature tracking using the openCV [69] (Open Computer Vision) library and its included Haar classifier. openCV is a library for real-time image analysis; it has become one of the standard libraries for computer vision, with C, C++, Python and Java interfaces, is used in robotics and multimedia applications, and is included in many frameworks for the development of such applications.
4.3 Body movement/postures
Although a lot has been written about so-called "body language", body movement and posture have not been researched for emotion recognition as extensively as facial expressions and voice analysis. There are, though, some studies questioning the validity of facial expressions as a modality for recognizing affective states, because the face is involved in various functions and many of the famously recognized facial expressions represent only a small subset of the possible expressions; these studies suggest body posture as a very good indicator for certain categories of basic emotions. Most studies, however, have not been able to demonstrate recognition accuracy similar to that of facial expression classifiers, especially those studying emotion recognition from static body postures only. Coulson [70] considered how 6 joint rotations (head bend, chest bend, abdomen twist, shoulder forward/backward, shoulder swing, and elbow bend) could help recognize 6 emotions (anger, fear, happiness, sadness, surprise and disgust). Concordance rates for attributions of the 6 emotions ranged from zero for many disgust postures to over 90 percent for some anger and sadness postures. Kleinsmith and Bianchi-Berthouze [71] used four affective dimensions (valence, arousal, potency and avoidance) instead of discrete emotion categories. In their study the error rate was 12% for valence, 10% for both arousal and potency, and 11% for avoidance. In their conclusions they report that other types of body motion features may be necessary for achieving better recognition of some affective states, such as fear, and better performance of their model. Other studies that include body motion as a modality [72], tracking features like the quantity of motion and contraction index of the body, the velocity, acceleration and fluidity of the hand's barycenter, and the orientation and approach/avoidance behaviors of two participants towards their interlocutor in an interaction, suggest that body language reflects the level of activation and dominance, but is less informative about valence (positive vs. negative).
Another role of body posture should also be noted. Studies suggest that body posture can actually induce changes in affective states, or play a feedback role affecting motivation and emotion. A study by Riskind and Gotay [73], for example, revealed that "subjects who had been temporarily placed in a slumped, depressed physical posture later appeared to develop helplessness more readily, as assessed by their lack of persistence in a standard learned helplessness task, than did subjects who had been placed in an expansive, upright posture."
Furthermore, it was shown that posture also had an effect on verbally reported self-perceptions. Another study [74], examining postures as a modality for recognizing emotions, suggests that involving the body in the control of technology facilitates users' expression of their feelings, which in turn gives them an improved experience, i.e., being engaged.
An open source library for analyzing body motion extracted from video is the EyesWeb [75] Expressive Gesture Analysis Library. EyesWeb refers both to research projects of the InfoMus Lab of the University of Genova on multimodal interactive systems and expressive gesture, and to an open software platform supporting the development of real-time multimodal distributed interactive applications.
4.4 Pupil size
Studies have shown that the eye's pupil is significantly larger during both emotionally negative and positive stimuli than during neutral stimuli [76]. Although valence cannot be distinguished this way, pupil size can be used as an additional modality for arousal. Many eye tracker devices are able to measure pupil size.
4.5 Bio-feedback sensors
Emotion recognition systems based on external modalities like speech, facial expressions and body posture are more familiar to use, because they accept the same input that we as humans do in our everyday interactions with others. The performance of such systems, however, depends on environmental conditions and on the training models used by their machine learning algorithms. Although advanced processing algorithms have been developed to minimize the effect of environmental conditions, like illumination for facial expression analysis or noise cancellation for speech analysis, training classifiers can be a practically difficult and very time-consuming procedure. A speech analysis system, for example, has to be trained for every language, and as vocal characteristics change with age, a system trained on adult acoustic models would not be effective with children. Of the modalities mentioned in the previous sections, facial expression analysis has been researched the most and has proven the most accurate. This technique, though, also introduces some practical constraints, as the camera must have a clear, sufficiently illuminated image of the subject's face to be accurate. Additionally, it is easy for someone not to reveal his emotions to the camera; and, as mentioned earlier, autistic persons for example might have difficulty showing their emotions even when they want to express them. For these reasons scientists have also turned to wearable biophysical sensors, monitoring signals that can reveal valuable information not only about someone's physical state, but about their emotional and mental state as well.
The physiological signals usually monitored in behavior studies are:
Heart rate (ECG): Electrocardiography sensors determine heart rate by detecting and amplifying the tiny electrical changes on the skin caused when the heart muscle depolarizes, measuring the difference in voltage between two electrodes placed on either side of the heart. There are also optical heartbeat sensors, using an infrared LED and a phototransistor placed close to each other, usually with a fingertip or the ear lobe in between. These sensors work because each heartbeat sends a quick rush of blood into the tiny blood vessels close to the skin, making the tissue less transparent, so less light passes through it to the phototransistor. Changes in heart rate give us a clear index of arousal, but the sensors are prone to movement artifacts. An increase in heart rate has been related to fear, and a decrease to anger [38].
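Deriving beats per minute from either sensor type boils down to detecting beats in the waveform and averaging the intervals between them. The sketch below does simple threshold-crossing peak detection on an assumed, synthetically generated pulse signal; real signals need band-pass filtering and artifact rejection first.

import numpy as np

FS = 100                                        # sampling rate, Hz
t = np.arange(0, 10, 1 / FS)                    # 10 s of assumed data
signal = np.sin(2 * np.pi * 1.2 * t)            # synthetic 72 BPM pulse wave

# A beat starts where the signal crosses the threshold upwards.
threshold = 0.5
above = signal > threshold
beat_starts = np.flatnonzero(above[1:] & ~above[:-1]) / FS   # seconds

intervals = np.diff(beat_starts)                # inter-beat intervals
bpm = 60.0 / intervals.mean()
print(f"estimated heart rate: {bpm:.0f} BPM")   # ~72 for this signal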
Galvanic Skin Response (GSR) and Electro Dermal Activity (EDA) both refer to the electrical changes measured at the surface of the skin. EDA sensors usually work by passing a minuscule amount of direct current between two electrodes in contact with the skin. When a person experiences emotional arousal, increased cognitive workload or physical exertion, the brain signals the skin to increase the level of sweating. Sweat is a weak electrolyte and a good conductor, so the filling of the sweat ducts increases the conductance of the applied current. Changes in skin conductance at the surface thus provide a sensitive and convenient measure of sympathetic arousal changes associated with emotion, cognition and attention.
Skin temperature/heat flux is the amount of heat that the body emits. Studies have shown that heat flux is effective in detecting context switches, because context switches often involve physical movement, which causes the body to warm up and therefore emit heat. Heat flux has also been reported to increase during increased cognitive load [77].
There are many companies today producing commercial wireless, wearable biophysical sensors that transmit signals to software running on a smartphone or computer, aimed at sports enthusiasts who like to monitor and keep track of their exercise habits. Most of them do not offer an open API for application development, but in some cases it is possible to read the packets sent by the sensor with custom libraries.
4.6 Brain Computer Interfaces (BCI)
Brain-computer interfaces are sensors monitoring brain activity to translate a user's thoughts or mental state into actions on the computer. The brain's electrical charge is maintained by billions of neurons. Neurons are electrically charged by membrane transport proteins that pump ions across their membranes, and they constantly exchange ions with the extracellular milieu, for example to propagate action potentials.
Electroencephalography (EEG) is the recording of electrical activity using electrodes attached along the scalp, measuring voltage fluctuations that result from ionic current flows within neurons and are generated by the synchronous activity of thousands or millions of neurons with similar spatial orientation in the brain.
Since its discovery in 1924 by Hans Berger, EEG has been widely used in clinical research and in neurology, to diagnose epilepsy, coma, brain death and various encephalopathies. Scalp EEG activity shows oscillations at a variety of frequencies, and researchers have associated certain oscillation frequency ranges and spatial distributions with different states of brain functioning. Although EEG is not the most accurate method of monitoring brain activity, its ease of use, portability and low setup cost have made it the most studied one, and have led to its application in other research fields and all kinds of experiments where the mental state of the subject is of interest. Usually three frequency ranges are used for this purpose (see the sketch after the list):
• Theta (4-7 Hz): related to drowsiness
• Alpha (8-13 Hz): related to relaxation
• Beta (13-30 Hz): related to alertness
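The sketch below estimates the relative power in these three bands from a raw EEG trace, using a simple periodogram computed with NumPy's FFT; the input here is synthetic noise standing in for real electrode data.

import numpy as np

FS = 256                                    # EEG sampling rate, Hz
eeg = np.random.randn(FS * 4)               # assumed 4 s raw trace (noise here)

spectrum = np.abs(np.fft.rfft(eeg)) ** 2    # power spectrum
freqs = np.fft.rfftfreq(len(eeg), 1 / FS)

def band_power(lo: float, hi: float) -> float:
    """Total spectral power between lo and hi Hz."""
    return spectrum[(freqs >= lo) & (freqs < hi)].sum()

bands = {"theta": (4, 7), "alpha": (8, 13), "beta": (13, 30)}
total = sum(band_power(lo, hi) for lo, hi in bands.values())
for name, (lo, hi) in bands.items():
    print(f"{name}: {band_power(lo, hi) / total:.2f}")   # relative band power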
In recent years, EEG has made its way into human-computer interaction research and research towards machines with emotional intelligence, and a small number of companies are working on low-cost, non-invasive brain-computer interface products, like the Emotiv headset, NeuroSky's MindWave and Starlab's Enobio (which combines EEG, ECG and EOG sensors), while OpenEEG [78] is a community project created to support open hardware and software solutions. At the consumer level, these interfaces are currently used mainly in gaming and other entertainment applications, since they still prove too inaccurate and impractical for more critical applications.
Functional near-infrared spectroscopy (fNIRS) is an emerging technique for sensing brain activity, similar to the technique used by the optical heartbeat sensors mentioned earlier in this document. An fNIRS system is made up of probes that emit light at two wavelengths in the near-infrared range, to which biological tissues are relatively transparent. The main absorbers of this light are oxygenated and deoxygenated hemoglobin, which act as relevant markers of the hemodynamic and metabolic changes associated with neural activity in the brain. The reflected light is picked up by the detectors on the device; depending on the amount of light reflected, we can get a measure of brain activity in the area beneath the sensors.
Studies in fNIRS [79] report that the hemodynamic response being measured in the brain is a slow response, occurring over 5-8 seconds. This currently makes the technique impractical for interaction input interfaces, and for the moment there is no commercial brain-computer interface utilizing fNIRS.
4.7 Developing Tools for Multimodal Biofeedback
As mentioned in the introduction of this chapter, emotion recognition is a difficult task for a computer, and the performance of such systems can vary depending on the state of the interacting person as well as on environmental conditions. To increase the reliability of emotion sensing systems, and building on the experience gained from single-modality analysis systems, modern research examines multimodal systems [77][80], combining various sensors and data analyses that share a final decision level to determine the emotional or affective state of the subject. In this direction there have been a number of projects, with contributions from universities all over Europe, for the development of frameworks and middleware that make it easier for researchers to develop and use multimodal emotion recognition systems.
CALLAS [81] (Conveying Affectiveness in Leading-edge Living Adaptive System) is a project funded by the European Commission under the 6th Framework Programme, with the participation of many universities around Europe. CALLAS is a framework based on a plug-in multimodal architecture, containing a collection of components for feature extraction from text, audio, video and motion sensors, processing emotional aspects in real time for the easy development of applications for art and entertainment. The CALLAS framework also includes its own visual programming authoring tool, CAT.
SEMAINE [82] is also a project funded by the European Commission, under the 7th Framework Programme, aiming to build a Sensitive Artificial Listener: a multimodal dialogue system that can sustain an interaction with a user for some time and react appropriately to the user's non-verbal behavior. The system can take input from video and audio to analyze the user's emotional state. The SEMAINE API is available as open source, supporting C++ and Java; it features the Apache ActiveMQ message broker as an integration layer and can run as a distributed system.
The SSI [83] (Social Signal Interpretation) framework is developed by the Human Centered Multimedia research laboratory of the University of Augsburg. It is available as open source, written in C++, and contains tools to record, analyze and recognize human behavior in real time, such as gestures, mimics, head nods and emotional speech. It also follows a plug-in based design, with a growing collection of plug-ins including, among others, input from the Wiimote and the Kinect sensor (under development), and supports external libraries such as OpenCV, ARToolKit, SHORE, Torch, Speex and Watson. SSI supports the machine learning pipeline in its full length and offers a graphical interface that assists users in collecting their own training corpora and obtaining personalized models. It also features an XML-editor programming environment to draft and run pipelines without special programming skills.
4.8 Data representation of emotions
Apart from developing special software, many projects have focused on creating standard formats to represent human emotions and share them among emotion-aware applications. These formats can be used, for example, to annotate digital media in order to train models for affective indexing, to collect data to train virtual agents, or to share data between an emotion recognition system and an application, developed by another party, that will animate a virtual avatar of the user accordingly.
MPEG-4 (Part 2, "Visual") contains the MPEG-4 FAPs [84] (Facial Animation Parameters), a set of 68 parameters allowing the animation of synthetic face models, which can be used in facial expression analysis applications. MPEG-V [85] is a standard under development for a common middle-layer format for interaction and visualization among virtual world applications.
EMMA [86] (Extensible MultiModal Annotation language) is an XML markup language, recommended by the W3C, for containing and annotating the interpretation of user input. It is a wrapper language that can include various kinds of payloads representing the interpretation of user input. An interpretation element contains information about the modality upon which the interpretation is based, can indicate start and end timestamps of the interpretation, and supports many more attributes. EmotionML [87] is a "plug-in" language, also recommended by the W3C, which can be combined with EMMA to represent human emotions in user input. EmotionML recognizes the fact that there is no single agreed representation of affective states, or of vocabularies to use. Therefore, an emotional state <emotion> can be characterized using four types of descriptions: <category>, <dimensions>, <appraisals> and <action-tendencies>. An example of an EMMA document carrying EmotionML as its interpretation payload is given below:
<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
  <emma:interpretation emma:start="123456789">
    <emotion xmlns="http://www.w3.org/2005/Incubator/emotion">
      <dimensions set="valenceArousalPotency">
        <arousal value="-0.29"/>
        <valence value="-0.22"/>
      </dimensions>
    </emotion>
  </emma:interpretation>
</emma:emma>
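An application receiving such a document only needs standard XML tooling to extract the dimensional values. A minimal Python sketch using the standard library, with the namespace URIs as in the example above:

import xml.etree.ElementTree as ET

EMO = "http://www.w3.org/2005/Incubator/emotion"

doc = """<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
  <emma:interpretation emma:start="123456789">
    <emotion xmlns="http://www.w3.org/2005/Incubator/emotion">
      <dimensions set="valenceArousalPotency">
        <arousal value="-0.29"/>
        <valence value="-0.22"/>
      </dimensions>
    </emotion>
  </emma:interpretation>
</emma:emma>"""

root = ET.fromstring(doc)
dims = root.find(f".//{{{EMO}}}dimensions")
arousal = float(dims.find(f"{{{EMO}}}arousal").get("value"))
valence = float(dims.find(f"{{{EMO}}}valence").get("value"))
print(arousal, valence)   # -0.29 -0.22, ready to drive an avatar or a log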
HEO [88] (Human Emotion Ontology) is an effort to create an RDF/OWL ontology representing human emotions, with subclasses and attributes describing input modalities, dimensions (arousal, valence, dominance), action tendencies and more.
SAIBA [89] (Situation, Agent, Intention, Behavior, Animation) is an ongoing project focusing on the creation of a framework of languages for Embodied Conversational Agents, with three stages representing intent planning, behavior planning and behavior realization. A Function Markup Language (FML), describing intent without referring to physical behavior, mediates between the first two stages, and a Behavior Markup Language (BML), describing the desired physical realization, mediates between the last two. BML has behavior elements for the head, torso, face, gaze, body, legs, gestures, speech and lips, and defines attributes for animation, lip and gaze synchronization, gestures, etc.
More information, articles and tools can be found on the website of the HUMAINE Association [90], an international community for research on emotions and human-machine interaction.
4.9 Biofeedback interactions: thoughts and insights
The biofeedback mechanisms used in a game are defined by the interactions designed for it: what we want to measure, why we want to measure it, and how we use the measurement inside the game. Designing biofeedback interactions for a game installation that also features physical motion interaction, however, automatically sets some factors of the game setting that have to be considered. Rather than a statistical evaluation of individual emotion recognition techniques, this study focuses on the practical application of sensors in an interactive game space and the challenges presented by that setting. This part of the document discusses thoughts and ideas derived from the study of the sensor characteristics presented previously, together with insights from the technologies tested during the research.
As discussed in section 2.4, bio-signals collected from sensors can be used in game interaction in basically two ways: i) as a continuously monitored signal, correlated with running variables of the game such as difficulty or pace, or tracking the player's progress towards a desired state perceived as a goal; ii) as signals monitored in relatively small time frames at specific points of a story-driven game, acting as the sensing mechanism of virtual agents. Certain signals, like ECG, body temperature/heat flux and EEG, lend themselves more to continuous handling. Naturally, in a game with intensive physical motion interaction, heart rate, temperature and skin conductance values are expected to increase, which could make their use for emotion recognition purposes problematic.
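The first, continuous style of use can be as simple as normalizing the signal against a per-player resting baseline and mapping the result onto a game variable. A sketch, with the baseline and the rate range as assumed design parameters:

def game_pace(heart_rate: float, resting: float = 70.0,
              max_rate: float = 160.0) -> float:
    """Map the player's heart rate onto a 0..1 pace level.
    `resting` would be measured per player before the game starts;
    the numbers here are illustrative design parameters."""
    level = (heart_rate - resting) / (max_rate - resting)
    return min(1.0, max(0.0, level))        # clamp to the valid range

# Example: the game could slow down as the player's arousal climbs,
# steering her back towards a calm target state.
for hr in (68, 95, 150):
    print(hr, "->", round(game_pace(hr), 2))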
EEG signals are also interesting to monitor throughout a game. EEG sensors can give an indication of the player's cognitive load, making it interesting to study the correlation of physical and mental state during a game, seeking additional signs to support the idea of embodied learning. The main problem with EEG sensors is that the monitored signals are very weak and noisy, and the electrodes must be positioned very carefully on the scalp.
Three commercial wireless EEG sensors were tested during the research as possible solutions for the EPLT installation. The first was the Emotiv EPOC headset. The EPOC uses 14 gold-plated electrodes that need to be moistened and placed carefully on the user's scalp. The sensor is able to monitor 4 mental states, 13 conscious thoughts and facial expressions, and also includes 2 gyroscopes to track head movement. Although the EPOC is an interesting piece of hardware and software, it was found unsuitable for a public interactive space. Placing all the moistened electrodes in the right position can take significant time, and it is difficult for the electrodes to maintain their position during a game with physical motion. Additionally, the advanced function of recognizing conscious thoughts requires a lot of time for both user and machine training. After these findings the research turned to simpler solutions and tested NeuroSky's sensors. Besides raw EEG values in 6 frequency ranges, NeuroSky's sensors provide two values indicating attention and meditation levels, derived from what the company calls the eSense algorithm. The sensor amplifies the raw brainwave signal and removes ambient noise and muscle movement; the eSense algorithm is then applied to the remaining signal, resulting in the interpreted eSense attention/meditation meter values. Previous research papers support that the sensor successfully indicates changes in the user's mental state [91][92]. The MindSet was the first sensor tested, featuring a single dry electrode to capture EEG signals and a pair of headphones. During tests it proved difficult to get the perfect signal from the single electrode that is required for the eSense values to work, demanding a lot of time and patience. Besides the electrode placed on the user's forehead, the sensor has 3 more contacts in the left headphone that need to make good contact with the skin. Even when the sensor had a perfect signal, it was difficult to maintain it while moving and jumping. The last sensor tested was NeuroSky's latest, called the MindWave [Figure 5]. The MindWave improves on its predecessor's design, using a single electrode that is wider and more comfortable to wear, and an earlobe clip that ensures good contact with the skin. Testing showed that this sensor indeed acquires a good signal easily and can maintain it even during relatively intense motion.
Figure 5: NeuroSky MindWave single dry electrode EEG sensor
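On the software side, readings from the NeuroSky sensors can be obtained through the ThinkGear Connector service, which streams line-delimited JSON over a local TCP socket. The sketch below is written from memory of that protocol and should be treated as an assumption to verify against the current NeuroSky documentation; the host, port and field names may differ across versions.

import json
import socket

HOST, PORT = "127.0.0.1", 13854     # ThinkGear Connector's usual local endpoint

with socket.create_connection((HOST, PORT)) as sock:
    # Ask the service for JSON output without the raw high-rate samples.
    sock.sendall(json.dumps({"enableRawOutput": False, "format": "Json"}).encode())
    buffer = b""
    while True:
        buffer += sock.recv(4096)
        *lines, buffer = buffer.split(b"\r")       # messages are line-delimited
        for line in lines:
            if not line.strip():
                continue
            msg = json.loads(line)
            esense = msg.get("eSense")
            if esense:                             # 0-100 meter values
                print("attention:", esense["attention"],
                      "meditation:", esense["meditation"])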
Pupil size was found impractical to use as a biofeedback mechanism. An immersive environment with continuous visual stimuli and physical motion of the player is expected to affect eye movement and pupil size, and the requirement of a wearable camera at close distance from the player's eye would limit the sense of freedom to move.
As mentioned in section 4.5, the accuracy of emotion recognition systems based on facial expressions, speech analysis and body postures can be limited by environmental conditions. Thinking specifically of a game with motion interaction, we would expect the user to move a lot within the space, to be several meters away from the camera, and to adopt postures suggested by the game action. These factors, along with the fast transitions from one emotional state to another experienced during moments that are intense for the progress of the game, make these modalities practical to monitor as continuous signals only for later statistical analysis, not as signals correlated continuously with runtime properties of the gameplay.
During the research, tests on facial expression analysis were made using the SHORE SDK [64] provided by the Fraunhofer research organization. The SHORE engine is able to detect and analyze multiple faces in a frame, providing gender recognition, an age estimation, and indication values for 4 basic emotions: angry, happy, sad and surprised. Tests showed that SHORE's performance is very high even under low illumination and when the face covers a small area of the frame. Emotion classification, however, proved accurate mostly for the emotion of happiness, which is the most obvious one, derived from analyzing how much the subject is smiling.
Skin conductance values have been reported to vary a lot between persons in comparable states, while a sudden emotional context change shows up as a sudden increase of the value compared to previous ones. This makes skin conductance more suitable for monitoring over a short time frame, where we want to capture the player's reaction to specific game stimuli.
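A per-player baseline sidesteps the between-subjects variability: compare each new sample against a short rolling window and flag sudden rises. A minimal sketch of such an event detector, with the window length and threshold as illustrative tuning parameters:

from collections import deque

class SCRDetector:
    """Flags sudden skin-conductance rises against a rolling baseline."""
    def __init__(self, window: int = 50, rise: float = 0.15):
        self.history = deque(maxlen=window)   # recent samples (microsiemens)
        self.rise = rise                      # jump that counts as a response

    def update(self, sample: float) -> bool:
        baseline = (sum(self.history) / len(self.history)
                    if self.history else sample)
        self.history.append(sample)
        return sample - baseline > self.rise  # True on a sudden increase

detector = SCRDetector()
stream = [2.0] * 60 + [2.4, 2.5]              # calm, then a startle stimulus
events = [i for i, s in enumerate(stream) if detector.update(s)]
print(events)                                  # indices where a response fired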
The ultimate application of emotion recognition systems in story-driven games would be to develop virtual actors demonstrating signs of artificial emotional intelligence, reading input from sensors while interacting with the user. As an example, Self City [93] is a previous project of the Waag Society on gamifying the learning of social-emotional skills. Self City transferred the player into a virtual city in which she could train her social skills by interacting with other virtual avatars, in simulations of daily social life and conflict scenarios. In these scenarios the player was called, for example, to deal kindly and calmly with an aggressive doorman or with someone who took her place in the ticket queue, or to ask another person politely for something. The player was guided by another avatar, her personal social skills advisor. Self City was designed in Second Life, an online virtual world in which users interact with each other through avatars. Behind all the avatars of Self City there were educators interacting with the user.
Transferring game scenarios like those of Self City to a multi sensor interactive space, virtual actors could use emotion recognition systems to sense the player's emotions and trigger corresponding pre-programmed behaviours. A virtual actor could enable emotion recognition at the beginning of an interaction, for example when the player is in close range, sensing whether the player is kind, smiling (facial expressions), talking calmly or using "please" (speech analysis/keyword recognition), or whether the player looks angry or scared (skin conductance). By monitoring speech and/or facial expressions and recognizing keywords, the virtual actor could detect the end of a phrase or a pause, and use the output of the running emotion recognition algorithms to trigger behaviours based on the story script, rewarding the player, simulating empathy, or acting as if it had been insulted or upset. Although the current state of the above technologies may require the player to overact her emotions, and delayed responses would create an unnatural flow in the interaction, considering the progress made, for example, on speech and action recognition systems during the last years, implementations of intelligent virtual actors will become more appealing and easier for interactive storytelling. For more information on action recognition and intelligent virtual agents see: [94][95][96].
Closing this chapter, the table below [Table 1] presents a summary of all biofeedback mechanisms studied, with their main characteristics and constraints:
Speech
  Emotions elicited: anger, happiness, surprise, sadness, disgust, fear
  - Expensive
  - Performance depends on training
  - Subject to bias

Facial expressions
  Emotions elicited: anger, happiness, surprise, sadness, disgust, fear
  + Determines displeasure or pleasure
  - Requires a clear image of the subject's face
  - Performance depends on training
  - Subject to bias

Body postures
  Emotions elicited: anger, happiness, surprise, sadness, disgust, fear
  - Difficult to determine accurately
  - Performance depends on training
  - Subject to bias

Eye tracking
  Emotions elicited: attention (eye movement), arousal (pupil size)
  + Reliable
  - Difficult to measure in a dynamic environment
  - Difficult to determine displeasure or pleasure
  - Expensive

Heart rate/ECG
  Emotions elicited: arousal, fear, anger
  + Familiar, easy to measure
  + Cheap
  - Lag between stimulus and signal onset
  - Prone to movement artifacts

Skin conductance
  Emotions elicited: arousal, frustration, surprise, fear, anger
  + Minimal lag to stimuli
  + Robust to movement
  + Easy to measure
  + Cheap
  - Difficult to determine displeasure or pleasure
  - Variable range across subjects

Body heat
  Emotions elicited: arousal, anger, fear, cognitive load
  + Easy to measure
  + Cheap

Brain signals (EEG)
  Emotions elicited: drowsiness, relaxation, alertness, attention
  + Determines cognitive load
  - Noisy signals
  - Prone to movement artifacts
  - Raw values are hard to interpret

Table 1: Overview of emotion recognition modalities
Chapter 5: Hardware and software platforms for multi-sensor interactive spaces
5.1 Sensor Hardware Platforms
There is a very large number of companies producing sensors and offering specialized solutions for any kind of project. As final products designed for a specific use, however, these solutions often introduce restrictions when applied to custom setups or combined with custom-written software. The architectural design of a project featuring multiple sensors requires not only a sensor network that ensures all sensors work together without problems, but also a network that can be customized to fit the project's data flow design. Sensor platforms satisfy both requirements, offering a common standard base between sensors and the freedom to customize their function and connectivity. The following part presents some examples of sensor platforms used today, with different design approaches.
Arduino
Arduino is an open-source electronics platform. It is designed as a low cost, expandable, multi-purpose prototyping platform based on flexible, easy to use hardware and software. Since its introduction, Arduino has created a very large community sharing support and code; it is used for education in many laboratories around the world, and has become a standard for interaction designers, media artists, and hobbyists.
The basic Arduino platform consists of three parts. The first is the Arduino microcontroller board, which can be built by hand using the provided schematics or purchased preassembled, in different versions and sizes, including versions designed to implement wireless nodes, with an XBee* radio connector and circuitry for battery power and charging, or versions like the LilyPad, designed so it can be sewn onto fabric for wearable applications. The Arduino boards are based on the Atmel 8-bit AVR family of microcontrollers with RISC architecture.
The second part of the platform is the language and compiler. Arduino's language is based on C, and designed to simplify the creation of physical interaction applications, in combination with the third part, the IDE, which is built on Java. Together the three parts make a platform with a simplified programming language, used to create instructions for a controller, basic enough to be easily used for common programming tasks, yet powerful enough to support complex projects.
Arduino can be expanded with a great variety of add-ons, the Arduino shields as they are called, and a great variety of motion and environmental sensors, network devices and servomotors, and can implement wireless sensors, tangible interfaces and robots.
*XBee is a ZigBee-enabled radio device for Arduino. ZigBee is a wireless communication standard designed to be inexpensive and low in power consumption. Most importantly, ZigBee is particularly well suited for mesh networks, with peer-to-peer connections instead of a single-router network.
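To give a flavor of how little code an Arduino program needs, the sketch below reads an analog sensor and streams its values over the serial port. The pin and baud rate are illustrative choices, not requirements of any particular project.

```cpp
// Minimal Arduino sketch: stream an analog sensor reading over serial.
// Pin and baud rate are illustrative, not project requirements.

const int SENSOR_PIN = A0;     // analog input the sensor is wired to

void setup() {
  Serial.begin(9600);          // open the serial link to the host
}

void loop() {
  int value = analogRead(SENSOR_PIN);  // 10-bit reading, 0..1023
  Serial.println(value);               // one reading per line
  delay(50);                           // roughly 20 samples per second
}
```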
.Net Gadgeteer
Following Arduino's success, Microsoft Research recently launched the .NET Gadgeteer open source platform: a microcontroller based on the ARM7 processor, designed to be programmed through Microsoft's .NET Micro Framework and C#, and extended through solder-less connection modules. The idea of solder-less connection modules encourages people without any experience in building circuits to try to build their own gadget prototypes. Since Gadgeteer is a very new platform, and since it uses its own connection standard, the list of available modules/sensors is still limited.
Phidgets
Phidgets is another platform built on a concept similar to Arduino's, designed to be even simpler. Phidgets is a line of plug-and-play building blocks for physical computing that connect to a computer over USB and communicate with any application. The Phidgets API handles all USB communication with the devices, simplifying the communication between applications and hardware. Arduino supports the creation of more complex projects, but Phidgets allows building simpler prototypes faster, and supports programming in a large variety of programming languages, including high-level languages like C# and ActionScript 3, as well as visual programming frameworks like Max/PureData and LabVIEW.
Shimmer
Shimmer is an open source platform for small, wearable, wireless sensors.
Shimmer started as a project of Intel Research and is now a division of Realtime Technologies. Unlike the previously presented hardware platforms, which focus on multi-purpose prototype building, Shimmer produces preassembled, highly sophisticated sensors, focusing more on research around Body (or Personal) Area Networks (BAN/PAN). BAN research aims at the development of wireless distributed systems for the autonomous and remote monitoring of patients in health care.
The Shimmer platform consists of the main unit, a lightweight pack with an MSP430 processor, battery, Bluetooth and 802.15.4 connectivity, a micro SD memory slot for offline data storage, a tilt sensor and an accelerometer. A variety of motion, biophysical and ambient sensors can be connected to the unit. The firmware of the unit embeds TinyOS, a very light and highly customizable open-source operating system specially designed for low power embedded systems and sensor networks. Shimmer supports the development of applications in C# and also provides a LabVIEW library; every unit is an autonomous node, providing data in raw or semi-processed format, accessible to all applications via custom libraries.
I-CubeX
I-CubeX is a commercial platform producing a variety of sensors and providing multiple sensor kits for research and interactive projects. I-CubeX provides an API with support for various languages like C++, ActionScript and Max/Jitter, while the sensors can communicate directly with musical keyboard instruments using the MIDI interface. On the platform's website there are many application code examples, and sensor kits are suggested for a wide range of interactive application categories.
5.2 Interactive software development platforms
This part of the document is a short presentation of various useful frameworks and toolkits for programming interactive applications and data visualizations. Although many of the frameworks mentioned below share common elements, this list serves two purposes. The first is to cover frameworks written in different languages, so that readers can find one written in a language familiar to them, or one that better serves their project's requirements. The second is to encourage the reader to visit and explore the websites of the tools mentioned, where previous work of very talented programmers and artists is showcased, often with source code available, making them a great source of inspiration for anyone interested in multimedia programming and the visual arts.
Processing (Java based) is an open source programming language and environment focusing on graphics and interaction programming. Based on a very minimal environment, Processing was developed as a "software sketchbook" and a tool to teach fundamental computer programming for the visual arts. Processing was the first of a series of frameworks that appeared in recent years, wrapping a growing collection of standard libraries for graphics, image, video and audio manipulation, networking, physics engines and more, and offering simplified interfaces to all these libraries to make them easy to combine inside a program.
After the success of Processing, openFrameworks (C++) was released, following the same concept but using C++ to deliver applications with better performance than Processing and access to native C++ libraries, and offering the ability to develop native applications for the iOS and Android mobile platforms. openFrameworks has built a very large support community and has been used successfully in everything from mobile apps to large and complex interactive installations. Beyond the basic standard libraries wrapped by openFrameworks, users are constantly expanding the list of add-on libraries and components, including libraries for tangible interfaces and physical interaction, like the TUIO and TouchLib libraries, and the OpenNI framework, which has already produced a few very interesting projects using the Kinect sensor. Cinder (C++) and Polycode (C++/Lua) are two other open source toolkits similar to openFrameworks.
Visual Programming Languages
Visual programming languages combine traditional coding with tools that allow the user to handle all components as blocks on a canvas. Each block has some kind of input signal, and the code inside the block determines its output. In this way the user controls the flow of data inside a program by virtually wiring signals to the blocks' inputs and outputs. Apart from offering people with no programming background a clearer structure through this visual schematic, visual programming languages also focus more on live, or run-time, coding, allowing the behavior of a block to be changed without recompiling the whole program.
The most popular visual programming languages are Max, developed by Cycling '74, and PureData, its free open source equivalent, developed by one of the original developers of Max, Miller Puckette. Max and PureData were particularly popular with musicians, since electronic music was one of the first fields to utilize digital technology and programming, and the logic of dataflow programming, wiring together signals, effects and sensors, was something musicians were already familiar with from recording studios. Today both tools have a very large collection of patches and programming APIs to integrate different effects and sensors.
Isadora, developed by TroikaTronix, the software branch of Troika Ranch, a media-intensive dance company, is a visual programming language focused mainly on the manipulation of video and audio for live performances, supporting up to 6 independent outputs, and including a C++ SDK to develop custom filters and effects.
Field is a Python based open source toolkit developed by OpenEndedGroup, a team of artists with experience in interactive installations and in theatre and dance performances. Field includes a Processing plug-in which replaces the Processing IDE, and through which all Processing libraries can be used in Field. A program written in Field can also include code in other programming languages, including languages that execute inside other applications like Autodesk Maya and Adobe After Effects. Field supports only the Mac and Linux platforms.
VVVV is another newer visual programming toolkit, free for non-commercial use and compatible with the Windows platform only, using DirectX libraries and supporting programming in C#.
QuartzComposer is part of Apple's Xcode tools, offering visual programming using native libraries of Mac OS X.
Working with sensors
For working more specifically with sensors, signal processing and pattern recognition, the most popular applications offering both visual and traditional programming are LabVIEW, by National Instruments, and Simulink, developed by MathWorks.
BioMOBIUS is an open platform, developed by an open community of researchers and by the TRIL Centre, which allows researchers to rapidly develop sophisticated technology solutions for biomedical research. It was developed with the philosophy of providing a common technology platform comprising hardware, software, services and sensors. The BioMOBIUS development environment is based on EyesWeb, and provides support for designing applications based on the Shimmer sensor platform.
Exemplar is an open source kit for programming sensor-based prototypes, developed by Stanford University's Human Computer Interaction Group. Exemplar is a plug-in written for the Eclipse IDE, offering a GUI through which it is possible to visually monitor live sensor signals and manipulate them.
ROS (Robot Operating System) is an open source project providing libraries and tools such as device drivers, message passing middleware, computer vision libraries, and more, to support the creation of robot applications. Since robots are an ensemble of sensors and motors, ROS features can also support the creation of projects utilizing a network of autonomous sensor nodes. Among other sensors, ROS now includes drivers and libraries for the Kinect sensor, which is a perfect solution for computer vision in low cost robot projects and has already been used with very interesting results.
A result of the combination of ROS with the Kinect sensor is the Point Cloud Library (PCL), a sister project of ROS that includes state of the art algorithms for 3D point cloud processing, including filtering, feature estimation, surface reconstruction and registration, model fitting and segmentation.
Chapter 6: A generic architecture for multi-sensory interactive systems
6.1 Architecture Description
As described in the introduction of this document [see Ch. 1], the EPLT is meant to be an open platform used by developers, artists and researchers for the development, experimentation, testing and support of multi-sensor technologies applied to interactive applications. As such, the EPLT should feature a flexible, extendable and scalable architecture that can be adapted to the application built upon it and to the equipment used for the input and output of the interactions. A major characteristic of this architecture, included also in the problem statement of this thesis, is the existence of a common framework for the collection and processing of data from the various wearable sensors used.
Based on the above requirements, formed by the EPLT project's description, and on the author's research into previous related work and development platforms for interactive applications [see Ch. 5.2], this chapter proposes a generic architecture scheme for multi-sensor interactive spaces. The basic elements of this scheme, shown in the figure below [Figure 6], are composed of three basic levels. The world level corresponds to the actual sensors, such as a motion sensor, a heartbeat sensor and an EEG sensor, and to the output generators of the system, such as a projector, speakers and lights.
Figure 6: A generic architecture for interactive spaces
The device level corresponds to the low-level hardware and software responsible for the collection and transmission of data from the various sensors to the application, and to the sub-systems controlling the output mechanisms used by the application. The application level corresponds to the system accepting data from the sensors as input and processing them into the corresponding output.
The main characteristic of the proposed architecture is that the components composing the different parts of the device and application levels can correspond either to processes running on the same computer, or to processes running distributed over a network of computers, each one implementing a different part of the interactive system. To provide this scalability, a messaging service is established between the device and the application level, using the Open Sound Control (OSC) protocol.
OSC is a communication protocol originally developed at the UC Berkeley Center for New Music and Audio Technology (CNMAT) for communication among computers, sound synthesizers, and other multimedia devices, optimized for modern network technology. OSC's advantages include interoperability, accuracy, flexibility, and enhanced organization, featuring open-ended, dynamic, URL-style symbolic naming; symbolic and high-resolution numeric argument data; a pattern matching language to specify multiple recipients of a single message; high resolution time tags; and "bundles" of messages whose effects must occur simultaneously. OSC messages are usually transmitted over the UDP protocol. Due to its flexibility and simplicity, OSC has gained a lot of popularity and has been implemented in a growing list of programming languages and libraries, like the ones presented in the previous chapter, as well as real time sound and media processing environments, software and hardware synthesizers, sound and light consoles, and various tangible interfaces. Other messaging systems were considered during the research, such as the Virtual-Reality Peripheral Network (VRPN) [97]. VRPN is a device-independent and network transparent interface to virtual-reality peripherals, supporting a wide range of controllers. VRPN offers similar functionality to OSC, but OSC was preferred because of the wider range of applications supporting it, including, as mentioned above, applications used by non-programmers.
In the proposed architecture, the Input Control System is a set of processes reading data from the actual sensors, which can be connected in various ways, for example through USB, Bluetooth, RF or WiFi, and transmitting these data values in OSC messages to the other components of the device and application levels. Although the use of messaging in an application running on a single computer introduces some additional load and latency, it preserves a degree of independence between the device and the application level, increases the reusability of code to extend the system, establishes a basic framework to develop sound and media interactions without any additional code, and sets a communication standard which also allows testing the interactive system without using the actual sensor devices, or without going into the low-level details of how the computer reads data from a sensor device.
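As an indication of how little code the Input Control System needs in order to publish a sensor reading, the fragment below sends one value using the sender API of cinder's OSC block (as it existed in the 0.8.x-era releases; method names may differ in other versions). The host, port and value are illustrative; the address follows the scheme used by the prototype in chapter 7.

```cpp
#include "OscSender.h"   // cinder OSC block (0.8.x-era API)

// Publish one attention reading to the application level over UDP.
// Host, port and value are illustrative.
void sendAttention(float attention)
{
    static ci::osc::Sender sender;
    static bool ready = false;
    if (!ready) {
        sender.setup("127.0.0.1", 7000);  // application level endpoint
        ready = true;
    }
    ci::osc::Message msg;
    msg.setAddress("/sensors_transmitter/attention");
    msg.addFloatArg(attention);           // current eSense attention value
    sender.sendMessage(msg);
}
```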
6.2 Use Case
The following is a general use case of the EPLT, demonstrating the architecture on a larger scale, as shown in the previous figure [Figure 6]. The user enters the installation space, where his first action is to place a set of bio-feedback sensors on his body. Body sensors are sensitive devices and must be placed correctly in order to get good signal quality and correct values. A member of the staff helps the user place the sensors correctly and monitors the sensor signals on a laptop computer running the Input Control System. When all sensors are placed correctly, the staff member gives the user a brief description of what the sensors measure and how, by showing the user a visualization of the collected data. The user then has to become comfortable with the sensors and relax, in order to enter the system in a neutral state.

When the user is good to go and everything is set correctly, the staff member can initialize the OSC connection between the Input Control System and the rest of the applications running distributed over the network, and launch the game engine running on another computer backstage. At the application level, apart from the game engine, there can be an application storing all bio-signal data during the session for later review. Independently of the output produced by the game engine, certain sensor values can be mapped directly to channels of another system with a sound console and a DMX controller, controlling the sound and lights of the space. For example, a feedback sound effect of a heart beating, or the bpm of a soundtrack, can be directly synced with the heartbeat rate of the user, and a spotlight can follow the position of the user in the space, changing brightness and color according to the attention and meditation levels of the user as they are captured by an EEG sensor. The use of OSC makes the system easy to extend, and allows someone, or a group of people, to design interactions using a variety of software that as a whole creates a rich and immersive interaction experience.
Having the Input Control System running in a different process or system from the application level also makes it easier for the system to handle and recover from errors. During a game session, a sensor might suddenly lose connection with the system, or lose contact with the body of the player because of a sudden move or a jump, resulting in incorrect values. Errors like these are easier to spot by someone monitoring the signal quality and raw values of the sensors. In case of an error, the staff member can temporarily disable the transmission of sensor data to the rest of the system, leaving it to continue running based on a previous valid state, while trying to reset the connection with the particular sensor; or, if the error is critical for the progress of the game, he can decide to pause the game, help the player get the sensor placed correctly on his body, and resume the game.
An adaptation of the generic scheme presented here is described in more detail in the next chapter, which covers the development of a small-scale prototype demonstrating the use of multiple sensors in a game environment.
Chapter 7: Prototyping a virtual board game with physical interaction
7.1 Introduction
This chapter describes the prototyping phase of the thesis, which focused on the development of a prototype aiming at the following targets:
• Study the characteristics of selected commercial sensors from a programming perspective, and their performance in interactive applications
• Demonstrate a basic implementation of the architecture proposed in the previous chapter (Ch. 6)
• Develop a base that allows the study of data collected from body sensors, in order to observe the ranges of values on which more complex interactions can be designed, and their possible correlation with the concept of Embodied Learning
The main idea of the prototype, conceptualized by the author, was to use a motion capture sensor to create a board game blending traditional forms of children's games with modern video games. Characteristics such as dynamic computer graphics, sound effects, fantastic virtual worlds, and the ability to play with someone at a distance have made video games very exciting and engaging for children. On the other hand, traditional games that used to be more popular in the past, for example hopscotch [Figure 7], although they might seem rather simple by today's hi-tech standards, motivated the physical exercise of children while offering a playful experience. Modern devices such as the Microsoft Kinect sensor give us the ability to combine the best parts of both forms of game.
Figure 7: Hopscotch, a classic example of a children's game
7.2 Preliminary studies
Designing a board game based on the Kinect sensor first of all required determining the maximum square active skeleton tracking area for the board. The first version of the game prototype developed by the author, NumHop, is a minimal game built with C++, OpenGL, and the Microsoft Kinect SDK (beta 1) using the cinder framework (see 5.2), as a preliminary study of Kinect's active area and performance. In NumHop [Figure 9] the player was asked to add numbers up to a certain target by jumping onto the numbered tiles of a 4x4 board that updates itself every 3 seconds. The faster the player reaches the target sum, the more points he gains, while the player loses a game round if he exceeds the target sum.
Kinect's minimum distance for skeleton tracking is approximately 1.3 meters and its maximum approximately 3.8 meters. The sensor has a horizontal field of view of 57°. After testing, the optimal maximum square area was determined to be around 4 m², extending from 1.5 m to 3.5 m away from the sensor. The diagram below [Figure 8] shows the scene model in OpenGL world space and the camera model. According to this, the camera in OpenGL is placed on the positive z-axis, facing towards the negative z-axis, and the Kinect sensor is placed at the origin (0,0,0), facing the positive z-axis. The application uses all tracked joints to draw the skeleton, and the position of the tracked user's center of mass joint to determine the tile the player is standing on. The player has to stand on a tile for at least one second to select it. In its graphical user interface (GUI) the application provides a control through which the position of the camera can be adjusted inside the OpenGL world space. By doing so, the game can also be played in an alternative setup where the virtual camera is placed above the board and the game is projected onto the actual floor.
Figure 8: Kinect board in OpenGL scene
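To illustrate the mapping from the tracked skeleton to the board, the sketch below converts a center of mass position (in meters, in the sensor's coordinate space) to a tile index, assuming the 2 m x 2 m board starting 1.5 m in front of the sensor described above. This is an illustrative reconstruction, not the prototype's actual code.

```cpp
#include <cstddef>

// Illustrative constants: a 2 m x 2 m board, 4x4 tiles,
// starting 1.5 m in front of the Kinect (units in meters).
const float BOARD_NEAR = 1.5f;   // nearest board edge (z)
const float BOARD_SIZE = 2.0f;   // board side length
const int   TILES      = 4;      // tiles per side

// Map the center-of-mass joint (x lateral, z depth, sensor at origin)
// to a tile index in [0, 15], or -1 if the player is off the board.
int tileIndex(float comX, float comZ)
{
    float u = (comX + BOARD_SIZE / 2.0f) / BOARD_SIZE; // 0..1 across
    float v = (comZ - BOARD_NEAR) / BOARD_SIZE;        // 0..1 in depth
    if (u < 0.0f || u >= 1.0f || v < 0.0f || v >= 1.0f)
        return -1;                                     // outside the board
    int col = (int)(u * TILES);
    int row = (int)(v * TILES);
    return row * TILES + col;
}
```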
NumHop was presented among other projects of the Waag Society institute at an open day event of the Creative Learning Lab for educators, receiving many good comments on the potential of the Kinect technology, and of the prototype itself, for educational use.
Figure 9: View of the first NumHop prototype
The preliminary phase of prototyping continued with the development of interfaces and small applications on the cinder framework to test other sensor technologies, including the SHORE facial expression analysis library provided by the Fraunhofer research organization, the Mindwave EEG Bluetooth sensor by Neurosky (see 4.9), and the Zephyr HxM Bluetooth heart rate sensor. Both Bluetooth sensors use a serial-over-Bluetooth communication protocol to transmit values to applications. The wrappers developed on cinder for the two sensors were later used in the implementation of the "device" level of the next version of NumHop, described in the next section.
7.3 NumHopII - The Game
Motivated by the concept of NumHop and the good comments the first sample received, the prototyping phase continued with building an enriched version of the game that would feature more interaction with the Kinect and additional biofeedback mechanisms, and that would cover the targets mentioned in the introduction of this chapter. The final prototype presented here uses the Kinect sensor, the Mindwave EEG sensor, and the Zephyr HxM ECG sensor.
The sensors used in the prototype were chosen among the ones discussed in chapters 3 and 4, based on their suitability for the specified game mechanics, and out of the personal curiosity of the author to work with them. The Kinect sensor is the latest, state of the art motion capture device, with very good performance and ease of use. Brain computer interfaces are a very new technology, at least at the commercial level, with theoretically very promising features. This character of emerging technology adds novelty to the game, and the use of indications of attention and meditation levels is believed to intrigue the player and enhance the gaming experience. Heartbeat rate, on the other hand, has the value that (large enough) fluctuations are internally sensed by the player. The visualization of the heartbeat rate, and the interaction based on it, creates another link between the physically sensed body and the virtual environment that enhances the feeling of immersion. The combination of the two body sensors is believed to be a good base for further study of the potential relation between physical activity and mental state.
In the final version of NumHop [Figure 10], the player is placed in a large virtual hall. In front of the player, placed on the floor, there is again a board of 16 numbered tiles. The rest of the scene contains 6 teleport chambers placed along the walls of the hall. The player is called to answer questions on simple multiplication tables, for example the result of 6 x 7. The tiles of the board are numbered with values close to the correct result, with at least one containing the correct number. The player then has a few seconds to select his answer by stepping onto a tile. The faster the player responds correctly, the more points he gains. If the player does not respond, the game moves automatically to the next question. If, on the other hand, the player selects a wrong answer, the board moves to the next question, an enemy robot is teleported into the scene through one of the 6 chambers, and it starts approaching the player with bad intentions.

The player can defend himself against the robots by activating his superpower (raising both hands above shoulder level) and aiming it at the robots. The player starts the game with a certain level of superpower that is reduced by use. When, however, the Mindwave sensor that the player wears on his head detects a high level of attention, the superpower level starts to increase and the player can activate it again. If the player runs out of superpower, he has to suffer the robots' hits, which reduce the player's health level. If the player survives an attack, he can step back for a moment and try to relax: when the Mindwave sensor detects a high level of meditation, the health level of the player increases. The player is given 3 lives at the beginning of the game, and bonus lives can be gained after a number of consecutive correct answers.
The GUI element representing the health level also includes the heart rate value in beats per minute (bpm), obtained from the Zephyr HxM sensor that the player is wearing. The heart rate value is not directly connected to any element of the game. Although there was an idea of correlating the heart rate with the update interval of the board, it was finally abandoned, for two reasons. The first is that, because the heart sensor has to be placed under the player's clothes, it might prove impractical to use in a school test environment. The second is that designing a certain interaction based on heart rate requires knowing in advance the expected range of values during the game, knowledge and expertise that was not available at the time of development. The presence of the heart rate value was nevertheless thought to be useful, as also explained earlier: first for observing the values for further use, and second to see how players respond to this information, for example whether placing the player's heart rate value on the GUI, where it could be conceived as another form of score points, motivates the player to raise his heart rate by moving more intensively.
Figure 10: View of NumHopII prototype scene
7.4 Architecture Overview
In order to build this final version of NumHop, with better graphics and more complex game play, the game was redesigned on the Unity3D platform. Unity3D is a game development platform that combines a visual editor and a programming environment, simplifying the development of games for multiple platforms, and providing a physics engine and, among others, easy to use tools for programming animations, collision detection, and particle systems. Unity can be programmed using scripts written in C# or JavaScript. The prototype was developed by the author of this document, based on 3D models, textures, sounds and pieces of code found in Unity's sample projects, tutorials and the Unity online community.
The prototype consists of two applications. The first one implements the sensors' "device" level [see 6.1] and is developed in cinder; the second one is the game itself, developed in Unity3D. The first application, SensorsOSCTransmitter, communicates with the Mindwave and Zephyr HxM sensors and transmits the sensor values to the game in OSC messages. The application also features three oscilloscopes to monitor attention, meditation, and heart rate over time [Figure 12], as well as the ability to store values in a log file. The log file contains values of: seconds since the log started, attention, meditation, and heart rate (bpm), in columns, in simple CSV (Comma Separated Values) format, so that it can be easily imported into a spreadsheet application for further study and statistical analysis.
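A few lines of such a log could look as follows (the header and values are hypothetical):

```
seconds,attention,meditation,heartrate_bpm
0.5,42,55,88
1.0,47,53,90
1.5,51,50,92
```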
Following an adaptation of the generic architecture described in the previous chapter, the Kinect sensor is directly connected to the game engine and not to the general input device level. This option is better for this particular project because it makes the game avatar's movement more responsive to the player's physical movement; it also gives the ability to have a preview screen of the sensor's view inside the game, and improves the overall performance. The Kinect sensor communicates with the game through the OpenNI framework instead of the Microsoft Kinect SDK. Using OpenNI also allows the deployment of the game to both Windows and Mac OS X platforms. The diagram below [Figure 11] represents the main components of the two applications forming the prototype, and the connections between them:
Figure 11: Prototype's architecture overview
7.5 Main components overview
This section provides a brief description of the main components and classes used by the two applications that form the prototype, from bottom to top as they appear in the previous diagram [Figure 11].
SensorOSCTransmitter (C++ - cinder)
Mindset is a cinder wrapper around the ThinkGear SDK provided by Neurosky. The class contains functions to connect to and read values from the Mindwave or Mindset EEG sensor using a serial-over-Bluetooth connection.
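A minimal sketch of the polling loop such a wrapper performs, based on the ThinkGear Communications Driver C API, is shown below; the COM port name and the polling scheme are assumptions that depend on how the headset is paired and on the host application.

```cpp
#include "thinkgear.h"   // Neurosky ThinkGear Communications Driver
#include <cstdio>

int main()
{
    // Open the sensor's serial-over-Bluetooth port. The COM port
    // name is an assumption: it depends on how the headset is paired.
    int id = TG_GetNewConnectionId();
    if (TG_Connect(id, "\\\\.\\COM5", TG_BAUD_9600, TG_STREAM_PACKETS) < 0)
        return -1;

    for (int i = 0; i < 1000; ++i) {   // poll for a while, then exit
        TG_ReadPackets(id, -1);        // parse everything received so far
        if (TG_GetValueStatus(id, TG_DATA_ATTENTION)) {
            printf("attention %.0f  meditation %.0f\n",
                   TG_GetValue(id, TG_DATA_ATTENTION),
                   TG_GetValue(id, TG_DATA_MEDITATION));
        }
    }
    TG_Disconnect(id);
    TG_FreeConnection(id);
    return 0;
}
```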
ZephyrHxM is a class containing functions to connect to and read values from the Zephyr HxM ECG sensor using a serial-over-Bluetooth connection.
ciOscilloscope is a class implementing a simple oscilloscope that draws the contents of a C++ double-ended queue buffer. The height of the y axis is automatically adjusted between the minimum and maximum values observed.
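The auto-adjusting axis idea can be sketched as follows: each sample is normalized between the observed minimum and maximum of the buffer before being mapped to a pixel height. The names here are illustrative, not the prototype's actual class.

```cpp
#include <algorithm>
#include <deque>

// Map a sample to a pixel-space y coordinate so that the trace always
// fills the scope's height (screen y grows downwards). Sketch only.
float sampleToY(const std::deque<float>& buf, float sample, float scopeHeight)
{
    if (buf.empty()) return scopeHeight * 0.5f;
    auto lohi = std::minmax_element(buf.begin(), buf.end());
    float lo = *lohi.first, hi = *lohi.second;
    if (hi - lo < 1e-6f) return scopeHeight * 0.5f;   // flat signal: center it
    return scopeHeight * (1.0f - (sample - lo) / (hi - lo));
}
```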
The OSCSender class implements the OSC-over-UDP communication, containing functions to construct OSC messages and bundles and to send them to a specific IP address and port. The class is contained in the OSC "block" built into cinder.
Figure 12: View of the SensorOSCTransmitter application generating a test signal for the values of attention, meditation, and heart rate
NumHopII (C# - Unity3D)
The OscService object uses the C# OSC and UDPPacketIO libraries and contains the OSCReceiver script, which starts a thread listening for OSC messages on a specific port and initializes two OSCListeners, one for each sensor. OSC messages sent from SensorOSCTransmitter have the following format:
/sensors_transmitter/(sensor_name)/(valueName)/(float value)
The Mindwave OSCListener listens for messages with the address prefixes:
/sensors_transmitter/signal (signal quality of the Mindwave sensor),
/sensors_transmitter/attention,
/sensors_transmitter/meditation,
and the HeartbeatSensor OSCListener for messages with the address prefix
/sensors_transmitter/heartrate
When a message with one of these address fields is received, the OSCListener calls the corresponding set-value function of the PlayerStatus script.
KinectSensor is the object containing all the scripts necessary to communicate with the Kinect sensor via USB, using the OpenNI framework developed by PrimeSense (the wrapper scripts are based on an older version of the zigfu.com wrappers for Unity, no longer available online). Among other scripts, the object contains OpenNISingleSkeletonController, which is activated after a user has been calibrated by the Kinect by standing in the "Y" pose indicated initially by the player's avatar. OpenNISingleSkeletonController updates joint positions by calling the corresponding function in the OpenNISkeleton of the player object. The object also contains the OpenNIPostureDetector script, which compares the current hand, elbow and shoulder joint positions. If the script detects the preset posture (both hands raised above shoulder level), it activates the Superpower script of the Player. OpenNIDepthmapViewer renders a small preview display of the sensor's depth video feed.
Player, besides the 3D avatar model, contains the OpenNISkeleton script, in which joints from the Kinect sensor are mapped to joints of the 3D model, along with various variables controlling the behavior of this mapping, like offsets, scale of transformations, damping etc. The PlayerStatus script contains all information about the player, including game play values like score, lives remaining, health status, superpower level, and the values received from the body sensors. When a value is updated, the PlayerStatus script calls functions to update elements of the Game GUI. The Player object also contains a CharacterController script, which creates a capsule collider for the player's avatar, through which the player interacts with the game's board. The SuperPower script activates the rendering of the superpower lightning effect particle system, starting from the player's hands and extending to the position of the target object found in the player's model tree, followed by a ray cast collision test that calls the ApplyDamage function of the EnemyDamage script of the robot it collides with.
GameGUI contains an orthographic camera, additional to the game's perspective main camera, used to render the graphical user interface of the game. The GUI consists of the player's HUD (Head Up Display) and a panel for changing settings of the game during runtime. The game HUD consists of a text texture showing the current score, four animated circles indicating the current attention, meditation, superpower and health/heart rate/lives remaining values, and a text texture that displays the board's current question to the player.
The game settings panel, handled by the GameSettingsGUI script, is activated by pressing the "p" key. This panel includes various game play parameters, like the board's update interval in seconds and the player's maximum superpower and health levels; parameters to adjust Kinect's skeleton tracking behavior, like skeleton smoothing, skeleton offset and transformation scale, which sometimes have to be adjusted depending on the position of the sensor and the environmental noise; and a toggle for the Depthmap Viewer (having the Depthmap Viewer activated all the time reduces the game's performance).
Board contains 16 tile objects. Each tile contains a text texture containing a number, a box collider and a ButtonManager script. The BoardManager script handles the update of the board and checks the player's answers. BoardManager starts by setting a question to the player, choosing two random numbers between 1 and 9, and then updates a random number of tiles with values around the right answer, making sure at least one contains the right answer (see the sketch below). When the player steps on a tile, the tile starts being pressed, and when it reaches its fully pressed position (the duration of the press is adjustable) it calls the BoardManager to check the tile's value against the correct answer.
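A language-agnostic sketch of this board update logic follows (written in C++ for consistency with the other examples in this document; the actual BoardManager is a C# Unity script). For simplicity every tile is refreshed here, whereas the prototype updates a random subset of tiles.

```cpp
#include <cstdlib>
#include <vector>

// Pick a question (a x b with a, b in 1..9), then fill the tiles with
// values near the product, guaranteeing at least one correct tile.
// Illustrative reconstruction, not the prototype's actual code.
struct Question { int a, b, answer; };

Question nextQuestion(std::vector<int>& tiles /* 16 tile values */)
{
    Question q;
    q.a = 1 + std::rand() % 9;
    q.b = 1 + std::rand() % 9;
    q.answer = q.a * q.b;

    // Fill every tile with a value close to the answer (answer - 5 .. answer + 5).
    for (int& t : tiles)
        t = q.answer - 5 + std::rand() % 11;

    // Guarantee at least one correct tile at a random position.
    tiles[std::rand() % tiles.size()] = q.answer;
    return q;
}
```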
The function of BoardManager that checks the player's answer also handles the basic rules of the game play. In the simplest case, when the player's answer is wrong, the script calls the RobotLauncher object to launch another robot attack on the player. If the answer is correct, the player is awarded points depending on the response time (the faster the response, the more points). The script keeps track of consecutive correct answers, based on which bonus rules can be applied, for example filling the player's health level after five consecutive correct answers, or awarding the player a bonus life. Similarly, the function includes a sensor-failure fallback game mechanism: if the game detects that the EEG sensor, on which the player's superpower and health levels depend, has a bad quality signal, the game raises the superpower level on every correct answer, in order to keep the game running and not leave the player totally vulnerable.
RobotLauncher contains a path from every teleport chamber in the level to the level's board, and the RobotSpawn script that initializes the robot attacks. When the player chooses a wrong answer, RobotSpawn is called, instantiating a Robot object (a prefab in Unity's terminology) inside one of the chambers. After being instantiated, the robot follows the corresponding path to the board, and when it reaches its end, it follows the player to his current position and starts hitting him (see the RobotBehaviour script in the project's code).
Chapter 8: Evaluation results and conclusions
8.1 Prototype Game Evaluation
This chapter discusses results and conclusions from the testing and evaluation of the final prototype presented in the previous chapter, as well as ideas for further development and study. The prototype was tested in a series of private sessions and an open evaluation session that took place at the Theatrum Anatomicum at the Waag Society, with participants drawn from people working at the institute. Overall, 11 people tested the prototype, of whom 4 played the game using only the Kinect sensor, 5 using the Kinect and Mindwave sensors, and 3 using all three sensors. The evaluation data were collected either by a questionnaire (see Appendix I) or by short interviews and discussion with the participants after they played the game. In all sessions both applications of the prototype were running on a single computer, with a dual screen (monitor + projector) setup [Figure 13].

Overall the system proved to be robust, and no major flaws were found that would cause the system to crash, apart from some cases where there was an error in the Bluetooth connection between the operating system and the sensors. The advantage of having a separate application handling the communication with the two body sensors is that even in those cases of connection error the game was not interrupted, and only the sensors transmitter application had to be restarted in order to re-establish the Bluetooth connection. During the game play only one bug was found, appearing randomly and not yet fixed, believed to be caused by a race condition between the animation engine of the board tiles (the push down/release animation) and the update function of the board's numbers, triggering a board update sooner than the defined time interval, as soon as the player stepped on a tile. In some cases this bug was frustrating for the player, because he would step on a correct number and, before the tile was fully pushed, it would change to another number and question, probably making that tile a wrong answer to the new question.
Figure 13: Game test session at Theatrum Anatomicum - Waag Society
Microsoft Kinect
Below we discuss the test results for each sensor in the game, starting with the Microsoft Kinect motion capture sensor. The Kinect sensor worked very reliably during the tests, and there were no cases where the sensor stopped tracking the player, even when she would step out of the field of view of the sensor for a moment and then step back in. A negative point of the game's GUI is that when a user goes out of the field of view of the sensor, the game does not provide any indication to the player to move back into view, apart from the fact that the avatar stops responding to the user's movement. Another negative point is that the avatar movement is different from that found in commercial Xbox games. Although some smoothing techniques found in the software used (OpenNI) were applied, the lack of image noise reduction optimization, and the lack of physical human kinetic motor models applied to the avatar, make the avatar shake even if the player is completely steady, as a result of image noise, or make the avatar take a physically impossible pose, usually when some of the player's joints are out of the field of view of the sensor, when the player moves close to the edges of the board.

The second major problem regarding the avatar and the Kinect, found in earlier testing and partially solved in the final version of the game, has to do with the directivity of the superpower beam and the physical position (height and angle of view) of the sensor. To make this problem easier to understand, we first have to explain a little more about how the superpower works inside the game. The player's avatar model hierarchy includes an object named Target. Target is just a point in 3D space, placed approximately at the player's chest height and 50 units in front of the player. When the superpower is activated, the graphics engine renders the superpower particles starting from the hands of the player to the target position, and the physics engine casts a ray from the spine of the player to the target, looking for any robot enemy to hit in between. In order for the target to always lie along the current local forward vector of the player, the target follows the rotation of the spine joint as determined by the Kinect; in other words, the target is always 50 units in front of where the spine of the player points. This rotation is determined according to the position and view of the sensor. Especially for the joint rotation around the x axis (the pitch angle), this causes the problem that, for a given player pose, the angle of rotation varies a lot depending on the height at which the sensor is placed and its angle. As a result, if the Kinect is placed at a height lower than the player's spine, so that it looks at the player from below, the superpower beam ends up aiming higher than the desired level, towards the ceiling of the level. In the same way, for a given position of the sensor, this behavior varies depending on the height of the player. Although an attempt at an optimal solution of this geometric problem was made using Kinect's internal accelerometer and floor determination functions, the results were not reliable enough, leading to the decision to overcome the problem by simply ignoring the spine's pitch angle for the target. As a result, even if the player leans forwards or backwards, the target of the superpower beam stays at a given height, at a fixed angle.
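The final behavior can be summarized in a few lines of vector math: the target follows only the spine's yaw (rotation around the vertical axis), so leaning forwards or backwards no longer tilts the beam. A sketch with illustrative names, not the prototype's actual code:

```cpp
#include <cmath>

// Place the target REACH units in front of the player, following only
// the spine's yaw and keeping a fixed height (pitch ignored).
struct Vec3 { float x, y, z; };

Vec3 beamTarget(const Vec3& spinePos, float spineYawRadians)
{
    const float REACH = 50.0f;           // distance in world units
    Vec3 t;
    t.x = spinePos.x + REACH * std::sin(spineYawRadians);
    t.y = spinePos.y;                    // fixed height: pitch ignored
    t.z = spinePos.z + REACH * std::cos(spineYawRadians);
    return t;
}
```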
During the tests the players were not given a detailed explanation of how the superpower beam works; the only tip given was that it can be activated by raising both hands. Although most players easily became familiar with aiming the beam using their body, some of them were trying to aim using only their hands without turning their body, which is natural, since in the game the beam appears to start from the hands of the player, and not from his chest, as it actually does inside the physics engine. Another problem mentioned by participants is that when a robot approaches very close, it might end up behind the player or at an angle larger than 90 degrees on the y axis, so the player cannot defend himself, because either the Kinect will not detect that the player has raised both hands (having only a profile view of the player), or, if the player turns his back, he will not be able to see the screen. The usual solution to this problem is for the player to take a step back and aim again (supposing she is still within the board's limits), but again this is not the physical response one would have to an enemy at one's back in the physical world. Apart from these problems, the beam smoothly follows the player's body; aiming the beam takes some time to master, but it offers an opportunity for the player to master a skill inside the game, which is part of the gaming experience. The activation method of raising both hands appeared to work well for the game, being easy to understand both for the player and the sensor, not so strict as to limit the player's natural movement, and in most cases recognized even at the minimum distance from the sensor, provided that the player is not so tall (<2.0 m) that at the minimum distance his shoulders are already out of the field of view of the Kinect.

Finally, although the board area is quite limited, and every tile is one step away from the next, some participants mentioned that the lack of reference points for the tiles on the floor can be confusing at the beginning. To overcome this problem, a more advanced and immersive setup could be used, in which another application runs the board mechanism, using a second projector to project the board, mapped onto the actual floor. Alternatively, the current prototype could be modified to a first person camera perspective, to be played using a head mounted display. Although theoretically head mounted displays provide the total immersion experience of virtual reality, these displays have been proven to limit the player's freedom of movement, first because all currently available solutions require a wired connection to the computer, and secondly because they lack the performance to be responsive to the physical speed of movement, requiring the player to adapt his movement to the capabilities of the screen.
Neurosky Mindwave
The Mindwave sensor also proved to work well as a piece of hardware for the game. It was rated as quite comfortable to wear; it was easy enough to get the perfect signal required by the sensor for the attention and meditation level determination to work, and it maintained the signal without problems throughout the game. In some cases the sensor would lose the signal after a jump, but it regained it shortly, without the player having to stop moving. The metaphor of connecting the attention level to the superpower level, and meditation to the health level, was rated very highly as a concept; the effectiveness of the sensor inside the game, though, did not receive very high ratings. The attention and meditation values usually stayed at an average, neutral level throughout the game, and in many cases we had to cheat by adding superpower level from the keyboard in order not to spoil the fun. If a player dedicates more time to mastering the sensor, results will improve, but spontaneously, during the game, the calculations did not seem to raise the attention levels enough for the sensor to play its role in the game very effectively. It should be noted, though, that all participants in the evaluation were adults, and basic multiplication tables are not so challenging for them. This leaves open the possibility that for young children, for whom the prototype is designed in the first place, these calculations might be a heavier mental load, and thus more easily detected by the sensor, raising the effect of the sensor on the game play. In any case, the use of the sensor certainly adds some excitement and curiosity to the game, since most people are not yet familiar with brain computer interface devices, and they want to try them and learn how they work. All participants who played the game without the Mindwave sensor were very curious to try it with the additional sensor, and believed that this would certainly add more fun to the game.
Zephyr HxM
As mentioned in the previous chapter, the heart beat rate was not given an actual role in the game, because it was believed that the use of the heart sensor might be problematic during an evaluation session, basically because the player has to wear it under his shirt and it also has to be slightly moistened to increase its conductivity with the skin. Indeed, given the option, most participants chose not to use the heart sensor (5 out of 8). Nevertheless, those who used the sensor rated it as very comfortable to wear, and the sensor worked almost perfectly throughout the game. Since the session was made to test the game and the sensors, participants who used the heart sensor were monitoring their heart beat rate, and felt motivated to try to raise it, in order to test both the responsiveness of the sensor and their own condition. Again, the integration of the heart sensor in the game is a first step towards gathering and studying heart rate values and ranges, in order to build the knowledge on which other interactions and game dynamics can be designed. Additionally, new trends in the use of body sensors in daily life are pushing the technology around sensors forward, and we already have examples of sensors embedded in ordinary accessories, such as a wristwatch with a heartbeat sensor. Devices like these will make the integration of biofeedback sensors into games easier and more practical.
8.2 General conclusions and further development
Overall, the prototype presented was rated as a fun gaming experience. Especially participants who had never played a Kinect game before were very excited by their first experience with this technology. The prototype itself, and the general concept of motion-based board games, was believed to have high potential for educational games. As mentioned in the previous chapter, the game's graphics were taken from freely available resources of the Unity game development community. This leaves a lot of room for changes and improvements to make the game environment more suitable and pleasant for younger children.

The concept of virtual worlds in a board game can be expanded by introducing additional motion based interactions. For example, the board could additionally be designed as a floating platform, a kind of flying carpet, which the player navigates through space depending on his position on it, called to follow a track and encounter enemies on his way to finishing the level. In a more permanent setup, taking advantage of the general architecture proposed in chapter 6, the feeling of immersion inside the game's world could be enhanced by introducing additional interactions with vision and sound; for example, in the prototype presented, the hall could be lit with intense red light whenever the player gives a wrong answer and a robot attack is launched, or with intense blue light whenever the player's lightning superpower is activated. In a similar way, the game could change the color of the lights when necessary, to help the player relax for a while, in order to recharge his health level and reduce his heart rate.

An element that would certainly raise the social affordances of the game would be the ability to play a multiplayer game. Although it is technically possible to use a single Kinect sensor to track two players, in practice the active tracking area leaves little space for two players on one board without crashing into each other. Using multiple Kinect sensors, though, or alternative technologies that extend the tracking area, like the Panasonic D-Imager or others presented in chapter 3, would allow designing scenarios in which the players either compete with each other on the same task or, in a more sophisticated game play, two or more players are called to synchronize their moves in order to achieve a common goal, creating also a more interesting and fun scene for the people watching them play.

The prototype developed uses one motion capture sensor and two body sensors, one of which did not have a specific role in the game play. After a short presentation about the sensors and the game, and before playing it, some people did not understand exactly how the Mindwave sensor values are used inside the game. Apart from putting the writer's presentation skills into question, this fact could also mean that if the game used more sensors, with more complex game dynamics, and if it was presented to young children, the game interactions could be hard to understand. On the other hand, all computer games require the player to invest some time playing in order to discover all of the game's mechanisms.
As wearable sensor technology develops and becomes more practical to use, the greatest challenge for the designer of a multi-sensor game is to create a meaningful ambient interaction layer, through which the player discovers and experiences the game's mechanisms while playing, rather than requiring an explanation in advance. Models derived from previous research on emotion recognition systems, like those presented in chapter 4, combined with the study of sensor data collected during game play, can assist in reaching this level of meaningful interaction.
The aim of this interaction is to create a playful experience that helps the player develop his self-awareness, as well as his awareness of others, through the opportunities for social interaction that the game setting creates. This awareness is a step towards creating the stable foundations that will support children in communicating and maximizing their learning abilities. At the same time, such games could be a valuable tool for educators to assess children, offering better insight into each one of them, helping them to understand and highlight children's difficulties, and assisting them in giving more personalized guidance.
8.3 Summary of research results
This section revisits the research questions of the thesis, as defined in section 1.2, offering a summary of the research results that attempt to answer those questions and analyze the research problem.
The first research question was whether physical interaction can be combined with a virtual environment to enhance a playful gaming experience. The answer to this question is a definite yes. Chapter 2 analyzes, by reviewing the literature and the results of previous research, how physical interaction through the use of motion capture controllers and body sensors enhances qualities of the gaming experience such as immersion, transformation, and agency, as defined by Murray [1997]. Focusing primarily on educational games, the research additionally examines the potential effect of the use of sensors on the creation of a playful learning experience, both through the involvement of human body and motion in the process of learning and recall of knowledge (embodied learning), and by assisting the development of basic social emotional competencies through the significantly enhanced social affordances of embodied games. The prototype presented in chapter 7 acts as a positive confirmation of the research question, providing a basic example of a multi-sensor, physical-interaction game which, as the evaluation results showed, delivers a novel playful gaming experience.
The second research question was which sensor technologies are most applicable for enhancing a playful gaming experience inside the Embodied Playful Learning Theater installation. The thesis attempted to answer this question by presenting a range of available motion capture technologies [Chapter 3] and body/bio-feedback sensor systems [Chapter 4], studying their technical characteristics and focusing on the practical use of those systems in a gaming installation. This research question is hard to answer definitively because, theoretically, all the presented technologies are applicable and, depending on the designed interactions, are able to enhance the gaming experience. The thesis provides more in-depth details on the technologies that the limited research resources allowed to be tested: the Microsoft Kinect motion capture sensor, the SHORE facial expression analysis library by the Fraunhofer Institute, three different models of brain-computer interfaces (Emotiv, NeuroSky Mindset and Mindwave), the Zephyr HxM heart sensor, and a Tobii desktop eye tracking system.
Regarding motion capture systems, there was a preference for markerless systems, because the use of markers in combination with body sensors would create a rather complex setup that has to be worn by the player and calibrated before he can start playing. Regarding biofeedback mechanisms, after studying their characteristics, the research attempts a more systematic approach to designing sensor-based interactions for games, by classifying sensor data into two categories: i) signals that are more suitable to be continuously monitored during a game, directly related to game mechanics, and ii) signals that are more appropriate to be sampled on an event basis, connected to certain points of the game flow. This classification helps the interaction designer choose which technologies are most applicable for the game mechanics he wants to achieve.
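A minimal C++ sketch of this classification, with hypothetical names: each sensor channel is flagged as either continuously monitored (fed into game mechanics every frame) or event-sampled (read only at defined points of the game flow), so the routing logic mirrors the two categories above.

// Hypothetical sketch of routing sensor signals by the two categories.
// Requires C++17 for structured bindings.
#include <functional>
#include <map>
#include <string>

struct SensorChannel {
    std::function<float()> read;  // latest value provided by the device level
    bool continuous;              // true: monitor each frame; false: sample on events
};

class SensorRouter {
public:
    void add(const std::string& name, SensorChannel ch) { channels[name] = ch; }

    // Called every game tick: feed continuous channels into game mechanics.
    void tick(const std::function<void(const std::string&, float)>& apply) {
        for (auto& [name, ch] : channels)
            if (ch.continuous) apply(name, ch.read());
    }

    // Called at discrete points of the game flow (e.g. when a question is shown).
    float sample(const std::string& name) { return channels.at(name).read(); }

private:
    std::map<std::string, SensorChannel> channels;
};

Under such a scheme, the heart rate could register as a continuous channel driving, for instance, a health mechanic, while an EEG attention value could be sampled at the moment a question appears.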
Besides the two research questions, the thesis examined the design of a multi-sensor interactive system from a technical system architecture perspective, as defined in the research problem statement (see 1.2). The main characteristics of this system are extensibility, to support the use of a number of different sensors, and openness, to support the development and deployment of different applications on top of it. After considering a number of sensor hardware platforms and interactive software development frameworks [Chapter 5], as well as previous sensor-based interactive systems, the thesis proposes a basic, potentially reference, system architecture [Chapter 6]. The main feature of this architecture is the separation of the main interactive application from a common device level that is responsible for connecting to the various sensors and collecting their data. According to the design, the two levels communicate through Open Sound Control (OSC) messages, an open standard that was found to be supported by a very wide range of the reviewed interaction development frameworks and applications. This design was then implemented during the development of the final game prototype, which, apart from the game application, features a separate application that connects to the two body sensors used and transmits data to the game through OSC. This application was written in modular, cross-platform C++; it can serve other applications compatible with the OSC standard, and can easily be extended to support the use of more sensors.
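The thesis does not name the OSC library used, but as an illustration of the device-to-application link described above, the following minimal C++ sketch sends a sensor reading as an OSC message using the open-source oscpack library (one possible choice). The address pattern, port, and hard-coded value are assumptions for the example.

// Minimal sketch of the device level sending sensor data over OSC (oscpack).
#include "osc/OscOutboundPacketStream.h"
#include "ip/UdpSocket.h"

int main() {
    const char* host = "127.0.0.1";  // host of the game application (assumed)
    const int   port = 7000;         // listening port of the game (assumed)

    UdpTransmitSocket socket(IpEndpointName(host, port));
    char buffer[1024];

    int bpm = 72;  // in the real application this value comes from the heart sensor
    osc::OutboundPacketStream p(buffer, sizeof(buffer));
    p << osc::BeginMessage("/sensor/heart/bpm")  // illustrative address pattern
      << bpm
      << osc::EndMessage;
    socket.Send(p.Data(), p.Size());
    return 0;
}

The game application would subscribe to such messages on the agreed port, keeping all sensor connection logic on the device level, as the architecture prescribes.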
Appendix I
NumHop Game Evaluation Questionnaire
1. Have you ever played a game using the Kinect sensor before?
Yes / No
2. How would you rate the overall gaming experience?
Very boring - Very exciting
❏ ❏ ❏ ❏ ❏

3. Which game element did you like the most, and why?

4. Which game element did you not like or find frustrating, and why?

5. How would you rate the interaction with the Kinect sensor (player movement)?
Unresponsive / frustrating - Very responsive / physical movement
❏ ❏ ❏ ❏ ❏

6. How comfortable was the Zephyr heart sensor to wear?
Very annoying - Very comfortable
❏ ❏ ❏ ❏ ❏

7. How comfortable was the Mindwave EEG sensor to wear?
Very annoying - Very comfortable
❏ ❏ ❏ ❏ ❏

8. How would you rate the effectiveness of the Mindwave sensor in the gameplay?
Not effective at all - Very effective and fun
❏ ❏ ❏ ❏ ❏

9. Do you have any ideas on how the heart sensor could be used effectively in the game play?

10. How would you rate the speed of the game play (board update interval, robot attack speed, etc.)?
Very slow - Very fast
❏ ❏ ❏ ❏ ❏

11. Do you believe that a game like NumHop has potential in an education / school environment?
Not really - Certainly
❏ ❏ ❏ ❏ ❏

12. Other comments/suggestions:
Thank you for your participation!
Bibliography
Joshua Noble (2009). Programming Interactivity. Sebastopol (U.S.A.): O’Reilly Media
Dan O’Sullivan and Tom Igoe (2004). Physical Computing. Boston (U.S.A.): Thomson Course Technology
Steve Dixon (2007). Digital Performance. Cambridge (U.S.A.): The MIT Press
Janet H. Murray (1998). Hamlet on the Holodeck: The Future of Narrative in Cyberspace. Cambridge (U.S.A.): The MIT Press
References
[1]: Barron, B., Cayton-Hodges, G., Bofferding, L., Copple, C., Darling-Hammond, L., & Levine, M. (2011). Take a Giant Step: A Blueprint for Teaching Children in a Digital Age. New York: The Joan Ganz Cooney Center at Sesame Workshop.
[2]: Piaget, J. (1973). To understand is to invent: The future of education. Grossman Publishers, New York.
[3]: Malone, T.W. and Lepper, M.R. (1987). Making learning fun: A taxonomy of motivations for learning. In Snow, E. and Farr, M. (eds.), Aptitude, learning, and instruction: Cognitive and affective process analyses. Lawrence Erlbaum, Hillsdale, N.J.
[4]: Melissa Gresalfi, Sasha Barab, Sinem Siyahhan, and Tyler Christensen. Virtual worlds, conceptual understanding, and me: designing for consequential engagement. On The Horizon - The Strategic Planning Resource for Education Professionals, 17(1): 21-34.
[5]: Gee, J. P. (2003). What Video Games Have to Teach Us About Learning and Literacy. New York: Palgrave Macmillan.
[6]: Barab, S.A., Sadler, T., Heiselt, C., Hickey, D. and Zuiker, S. (2007). "Relating narrative, inquiry, and inscriptions: a framework for socio-scientific inquiry", Journal of Science Education and Technology, Vol. 16 No. 1, pp. 59-82.
[7]: Balasubramanian, N., & Wilson, B.G. (2006). Games and simulations. In C. Crawford et al. (Eds.), ForeSITE, Vol. 2005, Proceedings of Society for Information Technology and Teacher Education International Conference 2006. Chesapeake.
[8]: Hake, R., "Interactive-Engagement vs. Traditional Methods: A Six-Thousand-Student Survey of Mechanics Test Data for Introductory Physics Courses," American Journal of Physics, Vol. 66, No. 1, 1998, p. 64.
[9]: Kress, J. S., & Elias, M. J. (2006). School based social and emotional learning programs. In K. A. Renninger & I. E. Sigel (Eds.), Handbook of child psychology: Vol. 4. Child psychology in practice (6th ed., pp. 592-618). Hoboken, NJ: John Wiley and Sons.
[10]: Raver, C.C. Emotions matter: Making the case for the role of young children's emotional development for early school readiness. SRCD Social Policy Report 2002; XVI(3): 3-18.
[11]: Goleman, D. (1995). Emotional intelligence. New York: Bantam Books.
[12]: Barnett, L.A., Storm, B. Play, pleasure, and pain: The reduction of anxiety through play. Leisure Sciences 1981; 4(2): 161-175.
[13]: Calvert, S. L. (2005). Cognitive effects of video games. In: J. Raessens & J. Goldstein (eds.), Handbook of Computer Game Studies. Cambridge, MA: MIT Press, pp. 125-131.
[14]: Gunter, B. (2005). Psychological effects of video games. In: J. Raessens & J. Goldstein (eds.), Handbook of Computer Game Studies. Cambridge, MA: MIT Press, pp. 145-160.
[15]: De Kort, Y.A.W., and Ijsselsteijn, W. A. 2008. People, places, and play: a research framework for digital game experience in a socio-spatial context. ACM Comput. Entertain. 6, 2, Article 18 (July 2008), 11 pages.
[16]: Lazzaro, N. (2007). Why We Play: Affect and the Fun of Games: Designing Emotions for Games, Entertainment Interfaces and Interactive Tools. In: The Human-Computer Interaction Handbook: Fundamentals, Evolving Techniques, and Emerging Applications, edited by A. Sears and J.A. Jacko. Lawrence Erlbaum Associates, Mahwah, New Jersey, 2nd Edition, pages 679-700.
[17]: Jansz, J., & Martens, L. (2005). Gaming at a LAN event: the social context of playing video games. New Media & Society, 7(3), 333-355.
[18]: Bryce, J., & Rutter, J. (2003). The Gendering of Computer Gaming: Experience and Space. In S. Fleming & I. Jones, Leisure Cultures: Investigations in Sport, Media and Technology, Leisure Studies Association, pp. 3-22.
[19]: Carr, D., Schott, G., Burn, A., & Buckingham, D. (2004). Doing game studies: A multi-method approach to the study of textuality, interactivity, and narrative space. Media International Australia incorporating Culture and Policy, No. 110, 19-30.
[20]: Baumeister, R. F. & Leary, M. R. (1995). The Need to Belong: Desire for Interpersonal Attachments as a Fundamental Human Motivation. Psychological Bulletin, 117(3), 497-529.
[21]: Ramanathan, S., & McGill, A. (2008). Consuming with others: Social influences on moment-to-moment and retrospective evaluations of an experience. Journal of Consumer Research, 34.
[22]: Raghunathan, R., & Corfman, K. (2006). "Is Happiness Shared Doubled and Sadness Shared Halved? Social Influence on Enjoyment of Hedonic Experiences," Journal of Marketing Research, 43 (August), 386-394.
[23]: Jakobs, E., Fischer, A., & Manstead, A. (1997). Emotional experience as a function of social context: The role of the other. Journal of Nonverbal Behavior, 21(2), 103-130.
[24]: Siân E. Lindley, James Le Couteur, and Nadia L. Berthouze. 2008. Stirring up experience through movement in game play: effects on engagement and social behaviour. In Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems (CHI '08). ACM, New York, NY, USA, 511-514.
[25]: Bianchi-Berthouze, N., Whan, W.K., and Patel, D. (2007). Does body movement engage you more in digital game play? And why? In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 102-113.
[26]: Malone, T.W.: What makes computer games fun? Byte 6, pp. 258-277 (1981).
[27]: Lazzaro, N.: Why we play games: Four keys to more emotion without story. Technical report, XEO Design Inc (2004).
[28]: Shusterman, R. (1992). Pragmatist Aesthetics: Living Beauty, Rethinking Art. Blackwell.
[29]: Marianne Graves Petersen, Ole Sejer Iversen, Peter Gall Krogh, and Martin Ludvigsen. 2004. Aesthetic interaction: a pragmatist's aesthetics of interactive systems. In Proceedings of the 5th conference on Designing interactive systems: processes, practices, methods, and techniques (DIS '04). ACM, New York, NY, USA, 269-276.
[30]: Dourish, P. (2001). Where the action is: The foundations of embodied interaction. Cambridge, MA: MIT Press.
[31]: Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9, 625-636.
[32]: Barsalou, L. W. (2008). Grounded Cognition. Annual Review of Psychology, 59, 617-645.
[33]: Glenberg, A. M., & Kaschak, M. P. (2002). Grounding language in action. Psychonomic Bulletin and Review, 9, 558-565.
[34]: Rizzolatti, G., & Craighero, L. (2004). The mirror neuron system. Annual Review of Neuroscience, 27, 169-192.
[35]: Hostetter, A. B., & Alibali, M. W. (2008). Visible embodiment: Gestures as simulated action. Psychonomic Bulletin & Review, 15, 495-514.
[36]: Johnson-Glenberg, M. C., Birchfield, D., Savvides, P. & Megowan-Romanowicz, C. (2010). Semi-virtual Embodied Learning - Real World STEM Assessment. In L. Annetta & S. Bronack (eds.), Serious Educational Game Assessment: Practical Methods and Models for Educational Games, Simulations and Virtual Worlds, pp. 225-241. Sense Publications, Rotterdam.
[37]: Ekman, P. (1972). Emotion in the Human Face. Pergamon Press Inc., New York, USA.
[38]: Christine Lætitia Lisetti and Fatma Nasoz. 2004. Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP J. Appl. Signal Process. 2004 (January 2004), 1672-1687.
[39]: Picard, R. W. 1995. "Affective Computing". MIT Technical Report #321, Cambridge, MA, USA.
[40]: Boehner, K., DePaula, R., Dourish, P., Sengers, P. 2005. "Affect: From Information to Interaction". Proceedings of the 4th decennial conference on Critical computing: between sense and sensibility, Aarhus, Denmark, ACM Press.
[41]: Florian 'Floyd' Mueller, Darren Edge, Frank Vetere, Martin R. Gibbs, Stefan Agamanolis, Bert Bongers, and Jennifer G. Sheridan. 2011. Designing sports: a framework for exertion games. In Proceedings of the 2011 annual conference on Human factors in computing systems (CHI '11). ACM, New York, NY, USA, 2651-2660.
[42]: Crews, D. J. & Landers, D. M. (1993). Electroencephalographic measures of attentional patterns prior to the golf putt. Medicine & Science in Sports & Exercise, 25(1), 116-126.
[43]: Pope, A., Stephens, C. (2011). "Movemental": Integrating Movement and the Mental Game. Workshop paper from CHI 2011 Workshop "Brain and Body Interfaces: Designing for Meaningful Interaction". Available at: http://physiologicalcomputing.net/bbichi2011/Movemental Integrating Movement and the Mental Game.pdf
[44]: Ramesh Raskar, Hideaki Nii, Bert deDecker, Yuki Hashimoto, Jay Summet, Dylan Moore, Yong Zhao, Jonathan Westhues, Paul Dietz, John Barnwell, Shree Nayar, Masahiko Inami, Philippe Bekaert, Michael Noland, Vlad Branzoi, and Erich Bruns. 2007. Prakash: lighting aware motion capture using photosensing markers and multiplexed illuminators. In ACM SIGGRAPH 2007 papers (SIGGRAPH '07). ACM, New York, NY, USA, Article 36.
[45]: Takaaki Shiratori, Hyun Soo Park, Leonid Sigal, Yaser Sheikh, Jessica K. Hodgins. "Motion Capture from Body-Mounted Cameras". ACM Transactions on Graphics, Vol. 30, No. 4 (Proc. ACM SIGGRAPH 2011), July 2011.
[46]: A. Laurentini (February 1994). "The visual hull concept for silhouette-based image understanding". IEEE Trans. Pattern Analysis and Machine Intelligence, pp. 150-162.
[47]: Corazza, S., Mündermann, L., Andriacchi, T. A Framework for the Functional Identification of Joint Centers Using Markerless Motion Capture, Validation for the Hip Joint. Journal of Biomechanics, 2007.
[48]: L. Xinghan, B. Berendsen, R.T. Tan, R.C. Veltkamp, Dept. of Inf. & Comput. Sci., Utrecht Univ., Utrecht, Netherlands. Human Pose Estimation for Multiple Persons Based on Volume Reconstruction. In: Proc. 2010 20th ICPR. IEEE, 2010, pp. 3591-3594.
[49]: Rosenhahn, B., Brox, T., Kersting, U. G., Smith, A. W., Gurney, J. K., & Klette, R. (2006). A system for marker-less motion capture. Main, 1(1), 45-51. Citeseer.
[50]: http://www.eyewriter.org
[51]: J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake. Real-Time Human Pose Recognition in Parts from a Single Depth Image. Microsoft Research Cambridge, 2011.
[52]: R. W. Picard. Toward Machines with Emotional Intelligence. In: IEEE Transactions on Pattern Analysis and Machine Intelligence - Graph Algorithms and Computer Vision Journal, Vol. 23, 10, IEEE Computer Society, 2001, pp. 1175-1191.
[53]: O.A. Schipor, Ş.G. Pentiuc, M.D. Schipor. Towards a multimodal emotion recognition framework to be integrated in a computer based speech therapy system. In: The 6th International Conference on Speech Technology and Human-Computer Dialogue, 2011.
[54]: R.W. Picard. Future affective technology for autism and emotion communication. Phil. Trans. R. Soc. B, December 12, 2009.
[55]: IRIS project. Integrate Research on Interactive Storytelling. http://iris.scm.tees.ac.uk/
[56]: Lennart E. Nacke. Directions in Physiological Game Evaluation and Interaction. In: CHI 2011 BBI Workshop Proceedings, Vancouver, BC, Canada, 2011.
[57]: Ekman, P., & Friesen, W. V. (1978). The facial action coding system: A technique for the measurement of facial movement. Palo Alto: Consulting Psychologists Press.
[58]: G. Castellano, L. Kessous, G. Caridakis. Emotion Recognition through Multiple Modalities: Face, Body Gesture, Speech. In: Affect and Emotion in Human-Computer Interaction, Springer Berlin / Heidelberg, 2008, pp. 92-103.
[59]: A. Batliner, D. Seppi, S. Steidl, B. Schuller. Segmenting into Adequate Units for Automatic Recognition of Emotion-Related Episodes: A Speech-Based Approach. In: Advances in Human-Computer Interaction, Volume 2010 (2010).
[60]: Anton Batliner, Stefan Steidl, Dino Seppi, and Björn Schuller. 2010. Segmenting into adequate units for automatic recognition of emotion-related episodes: a speech-based approach. Adv. in Hum.-Comp. Int. 2010, Article 3 (January 2010), 15 pages.
[61]: T. Vogt, E. André and N. Bee, "EmoVoice - A framework for online recognition of emotions from voice," in Proceedings of Workshop on Perception and Interactive Technologies for Speech-Based Systems, 2008.
[62]: F. Eyben, M. Wöllmer, and B. Schuller. openEAR - Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit. In: Proc. 4th International HUMAINE Association Conference on Affective Computing and Intelligent Interaction 2009 (ACII 2009), Amsterdam, The Netherlands, volume I, pp. 576-581. IEEE, 2009.
[63]: A. Mehrabian. Communication without words. Psychology Today, 2(4): 53-56, 1968.
[64]: Christian Kublbeck and Andreas Ernst. 2006. Face detection and tracking in video sequences using the modified census transformation. Image Vision Comput. 24, 6 (June 2006), 564-572.
[65]: Salah, A.A., N. Sebe, Th. Gevers. Communication and automatic interpretation of affect from facial expressions. In: D. Gökçay & G. Yıldırım (eds.), Affective Computing and Interaction: Psychological, Cognitive and Neuroscientific Perspectives, to appear.
[66]: Rana el Kaliouby and Peter Robinson. Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures. In: the IEEE International Workshop on Real Time Computer Vision for Human Computer Interaction at CVPR, 2004.
[67]: Intelligent Behaviour Understanding Group (iBUG), Department of Computing, Imperial College London. http://ibug.doc.ic.ac.uk/resources/facial-tracker-2011/
[68]: Seeing Machines. FaceAPI. http://www.seeingmachines.com/product/faceapi/
[69]: OpenCV: Open Computer Vision Library. http://opencv.org
[70]: Coulson, M. (2004). 'Attributing Emotion to Static Body Postures: Recognition Accuracy, Confusions, and Viewpoint Dependence.' Journal of Nonverbal Behavior 28(2), 117-139.
[71]: Kleinsmith, A., and Bianchi-Berthouze, N. Recognizing affective dimensions from body posture. In: Proc. 2nd Intl Conf of ACII, LNCS 4738, Portugal, pp. 48-58, 2007.
[72]: A. Metallinou, A. Katsamanis, Wang Yun, S. Narayanan. Tracking changes in continuous emotion states using body language and prosodic cues. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Prague, 2011, pp. 2288-2291.
[73]: Riskind, J.H., and Gotay, C.C. Physical posture: Could it have regulatory or feedback effects on motivation and emotion? Motivation and Emotion 6(3) (1982), pp. 273-298.
[74]: N. Bianchi-Berthouze, P. Cairns, A. Cox, C. Jennett, W.W. Kim. On posture as a modality for expressing and recognizing emotions. Emotion and HCI workshop at BCS HCI, London, September 2006.
[75]: A. Camurri, B. Mazzarino, G. Volpe. Analysis of Expressive Gesture: The EyesWeb Expressive Gesture Processing Library. In: Gesture-Based Communication in Human-Computer Interaction, Lecture Notes in Computer Science, 2004, Volume 2915/2004, 469-470.
[76]: Timo Partala and Veikko Surakka. 2003. Pupil size variation as an indication of affective processing. Int. J. Hum.-Comput. Stud. 59, 1-2 (July 2003), 185-198.
[77]: Eija Haapalainen, SeungJun Kim, Jodi F. Forlizzi, and Anind K. Dey. 2010. Psycho-physiological measures for assessing cognitive load. In Proceedings of the 12th ACM international conference on Ubiquitous computing (Ubicomp '10). ACM, New York, NY, USA, 301-310.
[78]: OpenEEG. http://openeeg.sourceforge.net/doc/
[79]: Erin Treacy Solovey, Audrey Girouard, Krysta Chauncey, Leanne M. Hirshfield, Angelo Sassaroli, Feng Zheng, Sergio Fantini, and Robert J.K. Jacob. 2009. Using fNIRS brain sensing in realistic HCI settings: experiments and guidelines. In Proceedings of the 22nd annual ACM symposium on User interface software and technology (UIST '09). ACM, New York, NY, USA, 157-166.
[80]: O. A. Schipor, S. G. Pentiuc, M. D. Schipor. Towards a multimodal emotion recognition framework to be integrated in a Computer Based Speech Therapy System. In: 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD), IEEE, Brasov, Romania, 2011, pp. 1-6.
[81]: Bertoncini, M. and Cavazza, M., 2007. Emotional Multimodal Interfaces for Digital Media: The CALLAS Challenge. Proceedings of HCI International 2007.
[82]: Marc Schröder. The SEMAINE API: Towards a Standards-Based Framework for Building Emotion-Oriented Systems. In: Advances in Human-Computer Interaction, Volume 2010 (2010), Article ID 319406, 21 pages.
[83]: J. Wagner, F. Lingenfelser, and E. André. The Social Signal Interpretation Framework (SSI) for Real Time Signal Processing and Recognition. In: Proceedings of INTERSPEECH 2011, Florence, Italy, 2011.
[84]: F. Lavagetto and R. Pockaj, "The Facial Animation Engine: towards a high-level interface for the design of MPEG-4 compliant animated faces", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9, No. 2, March 1999, pp. 277-289.
[85]: MPEG-V (Information Exchange with Virtual Worlds). http://mpeg.chiariglione.org/working_documents.htm#MPEG-V , http://www.metaverse1.org/
[86]: EMMA: Extensible MultiModal Annotation markup language. W3C Recommendation, 10 February 2009. http://www.w3.org/TR/emma/
[87]: Emotion Markup Language (EmotionML) 1.0. W3C Working Draft, 7 April 2011. http://www.w3.org/TR/emotionml/
[88]: Marco Grassi. 2009. Developing HEO human emotions ontology. In Proceedings of the 2009 joint COST 2101 and 2102 international conference on Biometric ID management and multimodal communication (BioID_MultiComm '09), Julian Fierrez, Javier Ortega-Garcia, Anna Esposito, Andrzej Drygajlo, and Marcos Faundez-Zanuy (Eds.). Springer-Verlag, Berlin, Heidelberg, 244-251.
[89]: S. Kopp, B. Krenn, S. Marsella, et al., "Towards a common framework for multimodal generation: the behavior markup language," in Proceedings of the 6th International Conference on Intelligent Virtual Agents (IVA '06), vol. 4133 of Lecture Notes in Computer Science, pp. 205-217, 2006.
[90]: HUMAINE. http://emotion-research.net/
[91]: Katie Crowley, Aidan Sliney, Ian Pitt, and Dave Murphy. 2010. Evaluating a Brain-Computer Interface to Categorise Human Emotional Response. In Proceedings of the 2010 10th IEEE International Conference on Advanced Learning Technologies (ICALT '10). IEEE Computer Society, Washington, DC, USA, 276-278.
[92]: Genaro Rebolledo-Mendez, Ian Dunwell, Erika A. Martinez-Miron, Maria Dolores Vargas-Cerdan, Sara de Freitas, Fotis Liarokapis, and Alma R. Garcia-Gaona. 2009. Assessing NeuroSky's Usability to Detect Attention Levels in an Assessment Exercise. In Proceedings of the 13th International Conference on Human-Computer Interaction. Part I: New Trends, Julie A. Jacko (Ed.). Springer-Verlag, Berlin, Heidelberg, 149-158.
[93]: Self City. Waag Society, Stichting Experimentele Werkplaatsen (SEW), RENN4: Regionaal Expertise Centrum Noord Nederland (cluster 4), Prof. dr. H.J.M. Hermans (em.), Radboud University, Nijmegen. http://waag.org/projects/selfcity
[94]: C. S. Pinhanez (1999). Representation and Recognition of Action in Interactive Spaces. Ph.D. Dissertation (advisor: Aaron F. Bobick). Massachusetts Institute of Technology, Cambridge, MA, USA.
[95]: Sung, J., Ponce, C., Selman, B., and Saxena, A. (2011). Human Activity Detection from RGBD Images. arXiv preprint abs/1107.0169, 47-55. Available at: http://arxiv.org/abs/1107.0169
[96]: M. Mateas. (2002). Interactive Drama, Art and Artificial Intelligence. Ph.D. Dissertation. Carnegie Mellon Univ., Pittsburgh, PA, USA.
[97]: R. M. Taylor, II, T. C. Hudson, A. Seeger, H. Weber, J. Juliano, and A. T. Helser. (2001). VRPN: a device-independent, network-transparent VR peripheral system. In Proceedings of the ACM symposium on Virtual reality software and technology (VRST '01). ACM, New York, NY, USA, 55-61.