


Correspondence

Transsituational Individual-Specific Biopsychological Classification of Emotions

Steffen Walter, Jonghwa Kim, Senior Member, IEEE, David Hrabal, Stephen Clive Crawcour, Henrik Kessler, and Harald C. Traue

Abstract—The goal of automatic biopsychological emotion recognition of companion technologies is to ensure reliable and valid classification rates. In this paper, emotional states were induced via a Wizard-of-Oz mental trainer scenario, which is based on the valence–arousal–dominance model. In most experiments, classification algorithms are tested via leave-one-out cross-validation of one situation. These studies often show very high classification rates, which are comparable with those in our experiment (92.6%). However, in order to guarantee robust emotion recognition based on biopsychological data, measurements have to be taken across several situations with the goal of selecting stable features for individual emotional states. For this purpose, our mental trainer experiment was conducted twice for each subject with a 10-min break between the two rounds. It is shown that there are robust psychobiological features that can be used for classification (70.1%) in both rounds. However, these are not the same as those that were found via feature selection performed only on the first round (classification: 53.0%).

Index Terms—Biopsychological analysis, biosignals, emotion recognition, feature selection, transsituational emotion.

I. INTRODUCTION

In the future, technical cognitive systems (TCS) will appear in everyday life in many different ways. They will provide people with helpful information, support them during decision-making processes, and communicate their intentions to the social and technical environment. For these techniques, attributes of individuality, adaptability, availability, cooperativeness, and trustworthiness must be developed. The authors of this paper refer to those technologies as companion technologies.1 Companion attributes in TCS are supposed to ensure that they are perceived, accepted, and utilized by their users as personal and empathic assistants. This is only possible if the functionality of such companion technologies is consistently and fully automatically geared toward the individual user, by orienting toward his abilities, preferences, requirements, and current needs and adapting to his situation and emotional state, i.e., if it possesses the ability to empathize.

Manuscript received October 4, 2011; revised May 25, 2012; accepted July 18, 2012. Date of publication February 20, 2013; date of current version June 12, 2013. This work was supported in part by grants from the Transregional Collaborative Research Centre 62 (SFB/TRR 62) “Companion-Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG). This paper was recommended by Associate Editor M. Kamel.

S. Walter, D. Hrabal, and H. C. Traue are with the Department of Psychosomatic Medicine and Psychotherapy, Medical Psychology, Ulm University, 89075 Ulm, Germany (e-mail: [email protected]).

J. Kim is with the Department of Applied Computer Science, University of Augsburg, 86159 Augsburg, Germany (e-mail: [email protected]).

S. C. Crawcour is with the Department of Clinical Psychology, Dresden University of Technology, 1062 Dresden, Germany (e-mail: [email protected]).

H. Kessler is with the Department of Psychiatry, Medical Psychology, Bonn University, 53115 Bonn, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCA.2012.2216869

1 www.mindmedia.nl/english/nexus32.php

With the growing application of increasingly complex “intelligent” technical systems in all areas of life, the demands on the individual user during their operation also increase. Simultaneously, the technological development opens up new, unforeseen opportunities for technical support and digital assistance. Companion technology can make an important contribution in this area, particularly with regard to the future of our aging society. Its application potential ranges from novel individual operation assistants for technical equipment to a new generation of versatile organization assistants and digital services and, finally, to innovative support systems, e.g., for patients in rehabilitation or people with limited cognitive abilities. An essential precondition for this is the robust and reliable classification of emotional states (e.g., anger). Because emotional behavior is a complex multidimensional process taking place in time, several measuring planes contain partially redundant, convergent, or conflicting emotion information (prosody, facial expressions, gestures, and psychobiological signals). Compared with other sources of variance of human emotion, biopsychological data can be continuously measured and can contain emotion information even when expressive behavior in the form of language, facial expressions, and gestures is not exhibited.

A. Emotion Recognition and Theory

Every automatic classification of emotions implies that the objects or phenomena can be distinguished on the basis of observable or measurable characteristics. The affiliation with a particular class results from an “object” exhibiting characteristics previously established to be necessary for inclusion within the class. Characteristics of an emotion are psychobiological reactions, emotional experiences, expressions, behavioral tendencies, and cognitive evaluations [1]. None of the existing classification approaches uses all available data sources simultaneously. Traditionally, one distinguishes between theoretical versus empirical research approaches or between discrete versus dimensional approaches [2].

The classification of facial movements and linguistic parameters is predominantly based on discrete emotion models, such as the concept of basic emotions (anger, disgust, sadness, fear, joy, and surprise) by Ekman and Friesen [3]. Psychobiological classification algorithms [4]–[7] are often based on the dimensional theory of valence, arousal, and dominance [8]. This reductionist model, which was historically developed from the works of Wundt [9], has been reformulated in various ways [10]–[12]. While the dimensions of valence and arousal are undisputed, the operationalizations of the third dimension, usually dominance, deviate from each other. Some authors question the independence of this third dimension from valence and arousal, as it frequently correlates with these dimensions. Nevertheless, concrete and discrete emotion stimuli cannot evenly fill the 3-D space. Numerous studies have reliably and robustly demonstrated psychobiological correlates of valence and arousal [13]–[15]. Correlations between the valence dimension and corrugator or zygomaticus muscle activity [electromyography (EMG)] and heart rate (HR), as well as between arousal and skin conductance (SC), HR, and slow-wave electroencephalography (EEG), are particularly reliable [13]. This has also been described for the interaction between subjects and virtual agents [16]. The stimulus-dependent psychobiological reaction can objectify the valence of stimuli and operationalize the arousal dimension [17], [18].




Cacioppo [19] illustrated that the asymmetry in frontal cortex activation is a robust measure of intentionality in terms of approach, withdrawal, and valence [20]. Differential correlations for the valence and intensity of International Affective Picture System (IAPS) stimuli and for the left versus right hemisphere were calculated by means of EEG [21]. However, the results of these studies were based on mean values with varying standard deviations. Standard deviations can be understood as an indicator of individual specificity. In the field of automatic emotion recognition, such specificity is highly relevant for accurate and robust results.

B. Automatic Biopsychological Emotion Recognition

Automatic emotion-recognition algorithms have been applied to a variety of biopsychological measures. Emotion induction methods have also been utilized in various ways, e.g., through psychotropic drugs; re-enactment of facial expressions; revitalization of experienced emotional situations; and presentation of films, music, fairy tales, or jokes [22]–[29].

Because visual stimuli can be easily classified by content and are well controlled according to size and duration, the standardized stimulus material of the IAPS [30] is often used. The classification with biopsychological parameters enables correct assignment (> 80%) with regard to the valence/arousal quadrants, namely, positive valence/low arousal, negative valence/low arousal, positive valence/high arousal, and negative valence/high arousal [5]–[7].

Particularly for the application area of human–computer interaction (HCI), such approaches suffer from the following issues.

• Few studies have naturalistic content [4], [5].

• Most studies on the automatic recognition of emotion in everyday contexts have been performed with a small number of observations and participants [31].

• The vision of many automatic emotion-recognition researchers is to find robust inter-individual (subject-independent) classification rates, in spite of indications from pure research that the biopsychological correlate of emotion is highly specific to the individual [32]–[35]: “Individual differences prevail in physiological recordings just as they do in the same situation” [36]. Furthermore, in the area of automatic emotion recognition, there exist many hints that individual classification rates are more accurate than inter-individual rates [4], [37], [38]. However, the statistical dimension of the individual advantage remains important, particularly in a naturalistic context.

• To the best of our knowledge, few studies exist (e.g., [39] and [40]) on automatic emotion recognition from biopsychological data with a focus on different situations (i.e., a transsituational focus), namely, training a classifier on situation t1 and testing the emotion recognition on situations t2, t3, t4, t5, etc. According to Stemmler and Wacker [36], it is very likely that absolute and differential stability is high for the emotional state but low for the person’s physiological processes. A possible data fusion from different features could, thus, also be affected by both soft (i.e., individual specificity) and hard (i.e., transsituational) problems.

C. Goal of the Study

In order to compensate for such shortcomings, we developed an experimental design that addresses the following research questions.

• Soft problem: To what extent are individual (i.e., subject-dependent) classifiers of emotion recognition in HCI significantly superior to inter-individual (i.e., subject-independent) classifiers, and to what extent can high effect sizes [41] be demonstrated?

Fig. 1. Valence–arousal–dominance (VAD) space with eight octants.

• Hard problem: How robust is the individual-specific classification transsituationally? How can individual-specific feature selection improve classification rates transsituationally?

In order to induce target emotions during the experiment, we considered the following affective factors, which were implemented as natural-language dialog:

• different user matrices;
• delay of the command;
• non-execution of the command;
• incorrect speech recognition;
• offer of assistance;
• lack of technical assistance;
• request for termination;
• positive feedback.

These technical conditions allowed the induction of different emotions in the user in a top-down manner. The procedure of emotion induction included different experimental sequences (ES), during which the user passed through specific valence (positive [P], negative [N], neutral [Neu]), arousal (low [L], high [H], neutral [Neu]), and dominance (low control [L], high control [H], neutral [Neu]) octants of the valence–arousal–dominance (VAD) model (see Fig. 1) in a controlled fashion. It was a key priority to induce the emotion of “negative valence/high arousal/low dominance” (NHL) in the subject starting from the state of “positive valence/low arousal/high dominance” (PLH). In a pilot study, we tested ten participants in order to verify the elicitation of the octants.

In this paper, we performed a combined analysis by using four-channel peripheral psychobiological measurements, including blood volume pulse (BVP), skin conductance level (SCL), and two-channel EMG, for classifying two emotional states, i.e., PLH and NHL. Both emotional states correspond to reciprocal octants with “anchor points” (i.e., extreme points) for valence, arousal, and dominance (see Fig. 1). Intermediate induction sequences are designed to ensure a successive, emotionally valid transition between both emotional states. In the next section, we give a detailed description of the experimental design, which is followed by the feature extraction and the classification method. We then compare the performance of individual and inter-individual classifications by considering five different observation cases and conclude with a discussion and perspectives on future work.



Fig. 2. User in the WOZ experiment. The design is a mental trainer.

II. METHODOLOGY

A. Participants

A total of 20 subjects participated in the experiment. Of these, 10 were women (5 younger [mean age = 27.6] and 5 older [mean age = 46.4]; split at 40 years) and 10 were men (5 younger [mean age = 25.2] and 5 older [mean age = 56.2]; split at 40 years). The subjects received an expense allowance. The study was conducted according to the ethical guidelines of Helsinki (ethics committee approval: C4 245/08-UBB/se).

B. Experimental Design

The simulation of the natural verbal HCI was implemented as a Wizard-of-Oz (WOZ) experiment (named EmoRec-Woz I) [42], [43] (see Fig. 2). The WOZ setup allows the simulation of computer or system properties in a manner such that subjects have the impression that they are having a completely natural verbal interaction with a computer-based mental trainer. The design of the mental trainer followed the principle of the popular game “Concentration.” The variation of the system behavior in response to the subjects was implemented via natural spoken language, with parts of the subject’s reactions automatically taken into account. However, the system does not work with algorithmic speech recognition and automatic response control but is controlled by an experimenter (wizard) in an adjoining room. This method allows simulating HCIs that are technically not yet possible (or only rudimentarily so), such as reliable recognition of natural spoken language. In the following, we describe the details of our experiment.

Instruction From the Experimenter at the Beginning: “You will communicate with a computer and complete a mental test. The experiment will be carried out in this room. You will receive instructions from the system. The system can react to you on the basis of language, gestures, and facial expressions. Therefore, there are a microphone and a camera. The system has many different functions. At the beginning of the test, you will be asked to answer some questions. When you exceed certain thresholds with regard to the biopsychological parameters, the system will ask you whether to terminate the task. Poor language recognition may lead to technical delays. Try to treat the system like you would a human being.”

Structure of the ES: The experiment involved a two-part mental training (see Fig. 3). Both parts consisted of identical ES. Part 2 also included a debriefing sequence at the end of the scenario. Between rounds one and two, 24 images with a mix of valence (positive versus negative), arousal (low versus high), and dominance (low versus high) (IAPS [30]) were shown to the subjects so that they could distance themselves cognitively and emotionally from the first round. In a pilot study, we also tested different emotional and cognitive distancing procedures. Following the completion of the different distancing procedures, we proceeded with a rating using the Self-Assessment Manikin (SAM [44]) scale and conducted an interview. We found that the emotion elicited tended to be more neutral in valence, arousal, and dominance, and that the cognitive procedure gave the participants a strong focus on the IAPS presentation rather than on the WOZ experiment. For this reason, we chose the IAPS option mixed in valence, arousal, and dominance. The presentation time for each image was 6 s, with a break of 4–10 s.

ES-1:

1) System instructions: “On the screen in front of you is a deck of covered cards with different images. Every image exists twice. Your task is to successively uncover the image pairs. The idea is that you uncover them as quickly as possible as there will be a time bonus. At the same time, you should make as few mistakes as possible because every wrong pair will be deducted from your score. On the upper screen, you see a bar that reflects your performance relative to your age group. Communication with me is exclusively verbal. I understand many different commands. So, just speak the way you usually would. We will now begin the mental test. The test is divided into several rounds. I wish you success in solving the tasks! You begin the first test with the command: start test.”

2) Procedure: For ES-1, a deck of a 4 × 4 matrix (see Fig. 4) with highly discriminative images was presented. After three correctly uncovered pairs, the subject received the feedback “Your performance is improving” with visual bar feedback of “good.” After half of the ES (four remaining pairs), a countdown (see Fig. 3) of 5 min was set, with the message “Please finish the game quickly! You have five minutes!” At the end of the ES, the subject received the following comment: “You have successfully solved the first task. Please describe how you feel at the moment.”

3) Emotional target state: In the first half (four pairs), ES-1 was supposed to induce the octant PLH and, after the countdown, the octant PHH.

ES-2:

1) System instructions: “Now comes the second round. Try to be as good as before. Begin again with the command: start test.”

2) Procedure: A deck of a 4 × 4 matrix with highly discriminative images was presented. After three correctly uncovered pairs, the subject received the feedback “Your performance is improving” with visual bar feedback with the message “very good.” No countdown was presented. At certain intervals, the wizard sent positive feedback, such as the following:

• “Very good!”
• “Keep it up!”
• “You are doing great!”
• “Great!”
• “Your memory works perfectly!”

At the end of the ES, the subject received the following comment: “You have also very successfully solved this round! Please describe your emotional state.” If necessary: “Could you describe this in more detail?”

3) Emotional target state: ES-2 was supposed to induce the octant PLH.



Fig. 3. Experimental design, including the expected position in the VAD space. After a 5-min initialization with IAPS images, the experiment started with a short introduction, which is followed by the first round of the mental training (ES-1–ES-5). The second round of the mental training contained six games (ES-1–ES-6) and starts after a 10-min presentation of IAPS images. The experiment ended with the completion of standardized questionnaires. Our classification categories were ES-2 (orange) versus ES-5 (turquoise). In addition, the training sets (sets 1–4) for the classification were visualized.

Fig. 4. User interface of the WOZ experiment.

ES-3:

1) System instructions: “The third deck is larger. The majority of subjects in your age group solve the task without problems. For this deck you can, however, ask for help. After you have uncovered a card, the second card will be uncovered. Begin again with the command: start test.”

2) Procedure: A deck of a 4 × 5 matrix with highly discriminative images was presented. After three incorrectly uncovered pairs, the subject received the feedback “Your performance is declining” and visual feedback with the message of “mediocre.” After half of the ES (five remaining pairs), a countdown of 2 min was begun. “Please finish the game quickly now! You have two minutes!” At the end of the ES, the subject received the following comment: “Task solved. This round was more difficult because of the number of cards. Nevertheless, you have again successfully solved the task. How do you feel now?”

3) Emotional target state: In the first half (five pairs correctly uncovered), ES-3 was supposed to induce the octant PLH and, after the countdown, the octant NeuNeuL.

ES-4:

1) System instructions: “In the fourth round the difficulty will be increased again. You can again ask for help with this deck. Begin again with the command: start test.”

2) Procedure: A deck of a 4 × 5 matrix with two discriminative images (ships and airplanes) was presented. After three incorrectly uncovered pairs, the subject received the feedback “Your performance is declining” and visual bar feedback “below average.” There were ca. six delays (6 s) in uncovering the cards, or cards were incorrectly uncovered. After half of the ES (five pairs), a countdown of 1 min was set. “Please finish the game quickly now! You have one minute!” At the end of the ES, the subject received the following comment: “The round was terminated. This deck was very stressful for you. How do you feel now?”

3) Emotional target state: In the first half, ES-4 was supposed to induce the octant NLL and, after the countdown, the octant NHL.

ES-5:

1) System instructions: “Try to not ask for any help in round 5. Begin again with the command: start test.”

2) Procedure: A deck of a 5 × 6 matrix with very similar images (snowdrops) was presented. After three incorrectly uncovered pairs, the subject received the feedback “Your performance is declining” and visual bar feedback with the message “very poor.” There were either ca. ten delays in uncovering the cards, or cards were incorrectly uncovered. After the first third of the ES, the subject was given the comment: “Your articulation is unclear!” After processing half of the matrix, the subject received the following comment four times in a row: “Would you like to terminate the task?” At the end of the ES, the subject received the comment: “The task was very difficult for you, therefore I have terminated the task. Don’t be disappointed. How do you feel now?”

3) Emotional target state: In the first half, ES-5 was supposed to induce the octant NHL; and after the question with regard to termination, the dimension NHL was to be strengthened further, i.e., valence becomes more negative, arousal increases, and dominance decreases.



Fig. 5. Fusion and classification of the parameters, namely, EMG, SCL, and BVP.

ES-6:

1) System instructions: “Try to focus one last time, even if it is very difficult. Begin again with the command: start test.”

2) Procedure: The deck consisted of a 5 × 6 matrix with highly discriminative pairs. The subject successively received the feedback “Your performance is improving” until the visual bar feedback “very good” appeared. As in ES-2, the subjects received positive performance feedback. At the end of the ES, the subjects received the question: “And finally, how do you feel?”

3) Emotional target state: ES-6 was supposed to induce the octant PLH.

4) Completion of the system questions scenario: “How do you rate your performance? How did you like the interaction with the system? What did you not like?”

C. Biosensors

A Nexus-32 amplifier2 was used for recording biopsychological data during the experiment. Biosignals and event data were recorded via the Bioserve3 software. The following parameters were included in the classification (see Fig. 5).

SCL: To measure the SCL, two electrodes of the sensor were positioned on the index and ring fingers. Since the sweat glands are exclusively innervated sympathetically, i.e., without influence of the parasympathetic nervous system, the electrodermal activity is considered a good indicator of the “inner” tension of a person. This aspect can be reproduced particularly impressively by the observation of a rapid increase in SCL within 1–3 s due to a simple stress stimulus (e.g., deep breathing, emotional excitement, or mental activity).

BVP: The BVP is controlled by the amount of blood ejected per beat and the peripheral resistance. It is, therefore, also a measure of perfusion. Photoplethysmography was used for measuring the BVP. In this method, a light source projects infrared light onto a tissue (e.g., a finger or a toe), and photosensors register the changing amount of reflected light. Vasomotor activity is controlled by the autonomic nervous system. The state of arousal of a person can also be determined from the frequency of the BVP signal curve. The sensor was positioned on the middle finger.

EMG: Electrical muscle activity is also an indicator of general psychophysiological arousal, as increased muscle tone is associated with increasing activity of the sympathetic nervous system, whereas a decrease in somatomotor activity is associated with predominantly parasympathetic arousal. We used two-channel EMG for the corrugator and zygomaticus muscles. EMG responses over facial muscle regions such as corrugator supercilii, which draws the brow downward and medialward to form a frown, and zygomaticus major, which elevates the corner of the mouth superiorly and posteriorly to produce a smile, can effectively discriminate the valence and intensity of emotional states.

2 www.mindmedia.nl/english/nexus32.php
3 www.bioserve.com

D. Emotion Rating

Following the mental test, every participant completed a rating on the valence, arousal, and dominance dimensions. For this rating, subjects were required to rank the different ES conditions. In the pilot phase, it became clear that the subjects were only partially able to distinguish their subjective emotional states in rounds one and two, in retrospect. Hence, the subjects were questioned with regard to each ES independent of rounds one and two. The SAM [44] was used for the rating. The rating scale read as follows: for valence, “1” as absolutely negative, “5” as neutral, and “9” as absolutely positive; for arousal, “1” as absolutely relaxed, “5” as medium arousal, and “9” as highly aroused; and for dominance, “1” as absolute loss of emotional control, “5” as medium emotional control, and “9” as absolute control.

E. Feature Extraction

As described in Section II, we used four-channel biosignals (2× EMG, BVP, and SCL) for emotion recognition. The signal duration is about 3 min for ES-2 and 5 min for ES-5, representing the antithetic emotions PLH and NHL, respectively. Using a window length of 10 s, with an overlap of 5 s, we segmented the signals into about 20 samples for ES-2 and 20–30 samples for ES-5, depending on the experiment duration of each subject. For the classification of the two emotional states, we extracted a total of 13 statistical features based on mean, minimum, and maximum values from each segment. We note that, since the goal of our study was to analyze the “general” tendency of physiological specificity between different situations and individuals, we used such a compact feature set, including only basic statistics, rather than an extended one.
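For illustration, a minimal Python sketch of this windowing step is given below; the sampling rate fs is an assumption, since it is not reported above, and the exact segment bookkeeping of the original implementation may have differed.

```python
import numpy as np

def segment(signal, fs, win_s=10.0, overlap_s=5.0):
    """Split a 1-D biosignal into windows of win_s seconds with overlap_s seconds of overlap."""
    win = int(win_s * fs)
    step = int((win_s - overlap_s) * fs)
    return [signal[start:start + win]
            for start in range(0, len(signal) - win + 1, step)]

# Example with a synthetic 3-min recording at an assumed sampling rate of 32 Hz.
fs = 32
dummy = np.random.randn(180 * fs)
windows = segment(dummy, fs)
print(len(windows), "segments of", len(windows[0]), "samples each")
```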

The EMG signals obtained from corrugator supercilii (xc[n]) and zygomaticus major (xz[n]) are noisy by nature; therefore, we employed a 20-point symmetric moving-average filter for smoothing them. From the filtered signal x̂c[n], we calculated three statistics [mean (f1), max (f2), and min (f3)] with a window size of 10 points. From the signal xz[n], f4, f5, and f6 were calculated through the same process.
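A possible realization of the smoothing and the EMG statistics (features f1–f6) is sketched below; the edge handling of the moving-average filter is an assumption.

```python
import numpy as np

def smooth(x, n=20):
    """20-point symmetric moving-average filter; 'same'-mode convolution at the edges is an assumption."""
    return np.convolve(x, np.ones(n) / n, mode="same")

def emg_features(x_c, x_z):
    """Features f1-f6: mean, max, and min of the smoothed corrugator (x_c) and zygomaticus (x_z) segments."""
    feats = []
    for x in (x_c, x_z):
        xs = smooth(np.asarray(x, dtype=float))
        feats += [xs.mean(), xs.max(), xs.min()]
    return feats  # [f1, f2, f3, f4, f5, f6]
```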

To obtain the HR from the continuous BVP signal, we developed a simple peak-searching method (based on the QRS detection method [45]) for locating the R-peak points that indicate heart beats. Next, we generated a time series of R–R distances, similar to heart rate variability (HRV) analysis. From this time series, we calculated the mean (f7), spectral ratio (f8), max (f9), and min (f10). Feature f8 is the ratio of the spectral power between the low-frequency (0.003–0.15 Hz) and high-frequency (0.15–0.4 Hz) bands. Such a ratio is generally thought to distinguish sympathetic from parasympathetic effects, since parasympathetic activity dominates at high frequencies.
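The sketch below illustrates one way to obtain features f7–f10; the peak-detection parameters and the 4-Hz resampling of the R–R series before the Welch spectrum are assumptions, and the LF band is only coarsely resolved on short segments.

```python
import numpy as np
from scipy.signal import find_peaks, welch

def hr_features(bvp, fs):
    """Features f7-f10: mean, LF/HF spectral ratio, max, and min of the R-R interval series of one BVP segment."""
    peaks, _ = find_peaks(bvp, distance=int(0.4 * fs))  # assume at most ~150 beats/min
    rr = np.diff(peaks) / fs                            # R-R intervals in seconds
    t = np.cumsum(rr)
    t_grid = np.arange(t[0], t[-1], 0.25)               # resample to a uniform 4-Hz grid (assumption)
    rr_grid = np.interp(t_grid, t, rr)
    f, pxx = welch(rr_grid, fs=4.0, nperseg=min(len(rr_grid), 64))
    lf = pxx[(f >= 0.003) & (f < 0.15)].sum()           # low-frequency band power
    hf = pxx[(f >= 0.15) & (f <= 0.4)].sum()            # high-frequency band power
    return [rr.mean(), lf / max(hf, 1e-12), rr.max(), rr.min()]  # [f7, f8, f9, f10]
```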

Features f11, f12, and f13 are the mean, max, and min values of the first differences of the SCL signal, respectively.
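The SCL features and the assembly of the 13-dimensional vector could then look as follows; emg_features and hr_features refer to the sketches above.

```python
import numpy as np

def scl_features(scl):
    """Features f11-f13: mean, max, and min of the first differences of an SCL segment."""
    d = np.diff(np.asarray(scl, dtype=float))
    return [d.mean(), d.max(), d.min()]

# Per segment, the full feature vector is the concatenation of the three groups, e.g.:
# x = np.array(emg_features(x_c, x_z) + hr_features(bvp, fs) + scl_features(scl))  # shape (13,)
```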

F. Classification

For classification, we employed the feedforward artificial neural network (ANN) in the 13-40-20-1 architecture with two hidden layers [46].
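A comparable 13-40-20-1 network can be sketched, for example, with scikit-learn; the activation function, solver, and number of iterations are assumptions, since only the topology is reported above.

```python
from sklearn.neural_network import MLPClassifier

# 13 inputs, two hidden layers with 40 and 20 units, one output unit for the binary decision.
clf = MLPClassifier(hidden_layer_sizes=(40, 20), activation="logistic",
                    solver="adam", max_iter=2000, random_state=0)
# clf.fit(X_train, y_train)      # X_train: (n_segments, 13); y_train: 0 for ES-2, 1 for ES-5
# y_pred = clf.predict(X_test)   # the 0.5 decision threshold is applied internally
```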



Observation of Different Classification Cases: Based on the various training sets, as determined in Fig. 3, we considered five different cases for individual and inter-individual classifications.

• Case 1: Leave-one-out and leave-one-subject-out classifications of ES-2 (training set 1) versus ES-5 (training set 2) of round one, without feature selection.

• Case 2: Leave-one-out and leave-one-subject-out classifications of ES-2 (training set 3) versus ES-5 (training set 4) of round two, without feature selection.

• Case 3: Training of the classifier on ES-2 (training set 1) and ES-5 (training set 2) of round one and testing on ES-2 (training set 3) and ES-5 (training set 4) of round two, without feature selection.

• Case 4: Training of the classifier on ES-2 (training set 1) and ES-5 (training set 2) of round one and testing on ES-2 (training set 3) and ES-5 (training set 4) of round two, with automatic feature selection on round one only, by using exhaustive optimization (brute-force search) with linear regression (see the sketch after this list).

• Case 5: Training of the classifier on ES-2 (training set 1) and ES-5 (training set 2) of round one and testing on ES-2 (training set 3) and ES-5 (training set 4) of round two, with manual feature selection (such as sequential backward search) on rounds one and two.
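As referenced in Case 4, a brute-force search over feature subsets with linear regression could be sketched as follows; scoring each subset by thresholding the regression output at 0.5 on the round-one training data is an assumption, since only the search strategy is named above.

```python
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression

def brute_force_select(X, y):
    """Exhaustively evaluate all non-empty subsets of the 13 features on the training data."""
    best_subset, best_acc = None, -1.0
    for k in range(1, X.shape[1] + 1):
        for subset in itertools.combinations(range(X.shape[1]), k):
            cols = list(subset)
            pred = LinearRegression().fit(X[:, cols], y).predict(X[:, cols])
            acc = np.mean((pred >= 0.5).astype(int) == y)
            if acc > best_acc:
                best_subset, best_acc = cols, acc
    return best_subset, best_acc
```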

Leave-One-Out Classification: To test its performance for individual classification, the ANN classifier was trained with the preprocessed data of ES-2 (target = 0) and ES-5 (target = 1). Ten vectors were not used for training: because of the overlap, we removed the nine vectors following the one vector that was to be tested. Then, the classifier was tested with the first of the left-out vectors. Each of the vectors was left out once and used for testing, and the output was recorded. We rated an output x ≥ 0.5 as 1 and x < 0.5 as 0.
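A minimal sketch of this overlap-aware leave-one-out scheme is shown below; following the description above, only the nine vectors after the test vector are additionally excluded from training.

```python
import numpy as np

def overlap_aware_loo(X, y, make_classifier, n_exclude=9):
    """Leave-one-out evaluation in which the n_exclude vectors following the test vector
    are also withheld from training to avoid leakage through overlapping windows."""
    correct = 0
    for i in range(len(X)):
        held_out = set(range(i, min(i + n_exclude + 1, len(X))))
        train_idx = [j for j in range(len(X)) if j not in held_out]
        clf = make_classifier()                      # e.g., a fresh MLPClassifier per fold
        clf.fit(X[train_idx], y[train_idx])
        correct += int(clf.predict(X[i:i + 1])[0] == y[i])
    return correct / len(X)
```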

Leave-One-Subject-Out Classification: To test its performance for inter-individual classification, the ANN classifier was trained with the preprocessed data of ES-2 (target = 0) and ES-5 (target = 1) of all but one subject. Then, the classifier was tested with the data of the subject that had been left out of training. The output was recorded. Here, we also rated an output x ≥ 0.5 as 1 and x < 0.5 as 0.
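The corresponding leave-one-subject-out evaluation could be sketched as follows, assuming a per-segment array of subject identifiers.

```python
import numpy as np

def leave_one_subject_out(X, y, subject_ids, make_classifier):
    """Train on all subjects but one and test on the held-out subject; returns per-subject accuracies."""
    accuracies = []
    for s in np.unique(subject_ids):
        test_mask = subject_ids == s
        clf = make_classifier()
        clf.fit(X[~test_mask], y[~test_mask])
        accuracies.append(np.mean(clf.predict(X[test_mask]) == y[test_mask]))
    return np.array(accuracies)
```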

Efficacy Criterion: For the efficacy comparison of individual versus inter-individual classifiers, paired t-tests and Cohen's effect sizes d were calculated [41]:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1^2 + s_2^2)/2}}$$

where $\bar{x}$ and $s$ denote the mean value and the standard deviation, respectively.
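The effect size and the paired t-test can be computed directly from the per-subject accuracies, as in the short sketch below.

```python
import numpy as np
from scipy.stats import ttest_rel

def cohens_d(x1, x2):
    """Cohen's d with the pooled standard deviation sqrt((s1^2 + s2^2) / 2), matching the formula above."""
    return (np.mean(x1) - np.mean(x2)) / np.sqrt((np.var(x1, ddof=1) + np.var(x2, ddof=1)) / 2)

# acc_individual and acc_inter would be the 20 per-subject accuracies of the two classifier types:
# t, p = ttest_rel(acc_individual, acc_inter)
# d = cohens_d(acc_individual, acc_inter)
```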

III. RESULTS

Fig. 6 shows the recognition results of the five cases.

• Case 1: The comparison between individual (N = 20, M = 92.6%, SD = 10.7%) and inter-individual classifiers (N = 20, M = 53.1%, SD = 13.4%) for round one was significant (p = 0.0001), with a high effect size (d = 2.2).

• Case 2: The comparison between individual (N = 20, M = 92.3%, SD = 4.9%) and inter-individual classifiers (N = 20, M = 46.1%, SD = 7.4%) for round two was significant (p = 0.0001), with a high effect size (d = 5.4).

• Case 3: The individual classifiers for all 20 subjects were trained on round one and tested on round two, with a classification rate of 49.6% (SD = 19.6%).

Fig. 6. Recognition results of Cases 1–5 in percentage of correct classification.

TABLE I
LIST OF SELECTED FEATURES FOR CASES 4 AND 5

• Case 4: We performed individual-specific feature selection on round one. We trained on round one and tested on round two, with a classification rate of 53.0% (SD = 17.4%).

• Case 5: We performed manual individual-specific feature selection on rounds one and two. We trained on round one and tested on round two, with a classification rate of 70.1% (SD = 17.8%).

As shown in Fig. 6, when training on round one and testing on round two (Cases 3–5), the individual classification performance decreased in comparison with the leave-one-out cross-validation within rounds one and two (Cases 1 and 2). It is unlikely that habituation during round two caused the recognition rates to drop, as the separate individual-specific classification of round two showed recognition rates of comparable accuracy to round one. In Case 4, the individual-specific feature selection led only to a marginal improvement of 3.4% in the recognition rates in comparison with Case 3, whereas the combined (transsituational) feature selection on rounds one and two (Case 5) improved the recognition rate by 20.5%. Table I shows the automatically selected features for the classification of round one (Case 4) and the manually selected transsituational features for rounds one and two (Case 5). In Table II, the selection frequency of each feature after the transsituational feature selection (Case 5) is listed. It turned out that both EMG signals, from the corrugator and zygomaticus muscles, were highly relevant and effective as complementary variables for the differentiation of the octants PLH and NHL in Cases 4 and 5.



TABLE II
SELECTION FREQUENCY OF EACH FEATURE THROUGH THE TRANSSITUATIONAL FEATURE SELECTION (CASE 5)

TABLE III
SAM RATING OF ES-2 VERSUS ES-5

On the other hand, the mean value of the HRV time series (f7) was effective in Case 4 but was selected for only one subject in Case 5, whereas the spectral ratio (f8) showed consistent efficacy for both cases.

To verify which descriptive emotional states were actually perceived by the subjects during the experiment, Table III shows the rating results from the subject questionnaires (see Section II-D). All participants stated that they experienced the same emotion in both the first and second rounds.

IV. CONCLUSION

In this paper, we have dealt with the problem of automatic emotion recognition in a specific context with situational (temporal) transition. We conducted a laborious experiment by repeating a mental training scenario with 20 subjects and multichannel biosensors. To clarify the psychological rationale for inducing the target emotions, we described the experimental procedure, which employs a carefully designed dialogic system, in detail. Two reciprocal emotional states, i.e., the PLH and NHL octants of the VAD model, were successfully induced through intermediate induction phases (ES-1, ES-3, ES-4, and ES-6) that were designed to ensure a smooth emotional transition between both emotional states. We analyzed the differences in classification rates across five classification cases that were determined based on training type (individual or inter-individual training), feature selection, and transsituational testing. The efficacy of each feature was also verified for the classification cases, e.g., the dominance of the EMG features. Overall, this paper demonstrated that individually adapted recognition is statistically significantly superior to the inter-individual approach, with large effect sizes, which is what had been missing in previous studies; this aspect is even more decisive when recognizing emotions during situational transition. One explanation for the transsituational problem could be social judgment theory [47].

All in all, it remains unclear whether it will be possible to transsituationally extract stable individual-specific features within a situation, in a similar context, or even in different contexts. This may only be possible, in tendency, if classification algorithms are provided with sufficient individual-specific information with regard to the user's emotional configuration through multimodal channels, e.g., speech prosody, facial expression, and biopsychological data. Taking this into account, we intend to continue future studies on the natural simulation of temporally and spatially different emotional situations, including subtle transitions, in order to find a conclusive guideline for transsituationally adaptive emotion recognition.

REFERENCES

[1] H. Traue and H. Kessler, “Psychologische Emotionskonzepte [Psychological emotion concepts],” in Natur und Theorie der Emotion, A. Stephan and H. Walter, Eds. Paderborn, Germany: Mentis, 2003, pp. 20–33.

[2] L. Schmidt-Atzert, “Klassifikationen von Emotionen [Classification of emotions],” in Experimentelle Emotionspsychologie, W. Janke, M. Schmidt-Daffy, and G. Debus, Eds. Lengerich, Germany: Pabst, 2008, pp. 179–192.

[3] P. Ekman and W. Friesen, Facial Action Coding System. Palo Alto, CA: Consulting Psychologists Press, 1978.

[4] J. Kim and E. André, “Emotion recognition based on physiological changes in music listening,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 12, pp. 2067–2083, Dec. 2008.

[5] J. Kim, “Bimodal emotion recognition using speech and physiological changes,” in Robust Speech Recognition and Understanding, M. Grimm and K. Kroschel, Eds. Vienna, Austria: I-Tech Education Publ., 2007, pp. 265–280.

[6] C. Frantzidis, C. Bratsas, C. Papadelis, E. Konstandinidis, C. Pappas, and P. Bamidis, “Towards emotion aware computing: An integrated approach using multi-channel neuro-physiological recordings and affective visual stimuli,” IEEE Trans. Inf. Technol. Biomed., vol. 14, no. 3, pp. 589–597, May 2010.

[7] T. Schuster, S. Gruss, H. Kessler, A. Scheck, H. Hoffmann, and H. Traue, “EEG pattern classification while viewing emotional pictures,” in Proc. 3rd Int. Conf. Pervasive Technol. Related Assistive Environ., 2010, pp. 12–17.

[8] P. Lang, “The emotion probe: Studies of motivation and attention,” Am. Psychol., vol. 50, no. 5, pp. 372–385, May 1995.

[9] W. Wundt, Lectures on Human and Animal Psychology, J. E. Creighton and E. B. Titchener, Eds. New York: Macmillan, 1896.

[10] H. Schlosberg, “Three dimensions of emotions,” Psychol. Rev., vol. 61, no. 2, pp. 81–88, Mar. 1954.

[11] A. Mehrabian, “Valence–arousal–dominance: A general framework for describing and measuring individual differences in temperament,” Curr. Psychol.: Develop., Learn., Personality, vol. 14, no. 4, pp. 261–292, Dec. 1996.

[12] J. Russell, “A circumplex model of affect,” J. Personality Social Psychol., vol. 39, no. 6, pp. 1161–1178, Dec. 1980.

[13] B. Cuthbert, H. Schupp, M. Bradley, N. Birbaumer, and P. Lang, “Brain potentials in affective picture processing: Covariation with autonomic arousal and affective report,” Biol. Psychol., vol. 52, no. 2, pp. 95–111, Mar. 2000.

[14] H. Schupp, B. Cuthbert, M. Bradley, C. Hillman, A. Hamm, and P. Lang, “Brain processes in emotional perception: Motivated attention,” Cognit. Emotion, vol. 18, no. 5, pp. 563–611, Aug. 2004.

[15] H. T. Schupp, J. Stockburger, F. Bublatzky, M. Junghöfer, A. I. Weike, and A. O. Hamm, “The selective processing of emotional visual stimuli while detecting auditory targets: An ERP analysis,” Brain Res., vol. 1230, pp. 168–176, Sep. 2008.

[16] J. Helmert, F. Schrammel, S. Pannasch, and B. Velichkovsky, “Human interaction with emotional virtual agents: Differential effects of agents’ attributes on eye movement and EMG parameters,” in Proc. Perception 36 ECVP Abstract Suppl., 2007, p. 28.

[17] P. J. Lang, M. Greenwald, M. M. Bradley, and A. O. Hamm, “Looking at pictures: Affective, facial, visceral and behavioral reactions,” Psychophysiology, vol. 30, no. 3, pp. 261–273, May 1993.

[18] D. Sabatinelli, M. Bradley, J. Fitzsimmons, and P. Lang, “Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance,” NeuroImage, vol. 24, no. 4, pp. 1265–1270, Feb. 2005.

[19] J. T. Cacioppo, “Feeling and emotions: Roles for electrophysiological markers,” Biol. Psychol., vol. 67, no. 1/2, pp. 235–243, Oct. 2004.



[20] J. Coan and J. Allen, “Frontal EEG asymmetry as a moderator and mediator of emotion,” Biol. Psychol., vol. 67, no. 1/2, pp. 7–49, Oct. 2004.

[21] E. Marosi, O. Bazán, G. Yañez, J. Bernal, T. Fernández, M. Rodríguez, J. Silva, and A. Reyes, “Narrow-band spectral measurements of EEG during emotional tasks,” Int. J. Neurosci., vol. 112, no. 7, pp. 871–891, Jul. 2002.

[22] J. A. Healey, “Wearable and automotive systems for affect recognition from physiology,” Ph.D. dissertation, MIT, Cambridge, MA, 2000.

[23] J. A. Healey and R. W. Picard, “Detecting stress during real-world driving tasks using physiological sensors,” IEEE Trans. Intell. Transp. Syst., vol. 6, no. 2, pp. 156–166, Jun. 2005.

[24] J. Wang and Y. Gong, “Recognition of multiple drivers’ emotional state,” in Proc. Int. Conf. Pattern Recognit., 2008, pp. 1–4.

[25] C. L. Lisetti and F. Nasoz, “Using noninvasive wearable computers to recognize human emotions from physiological signals,” EURASIP J. Appl. Signal Process., vol. 2004, pp. 1672–1687, Jan. 2004.

[26] G. Chanel, J. J. Kierkels, M. Soleymani, and T. Pun, “Short-term emotion assessment in a recall paradigm,” Int. J. Hum.-Comput. Stud., vol. 67, no. 8, pp. 607–627, Aug. 2009.

[27] J. J. Kierkels, M. Soleymani, and T. Pun, “Queries and tags in affect-based multimedia retrieval,” in Proc. Int. Conf. Multimedia Expo—Special Session Implicit Tagging, 2009, pp. 1436–1439.

[28] M. Soleymani, G. Chanel, J. J. M. Kierkels, and T. Pun, “Affective characterization of movie scenes based on content analysis and physiological changes,” Int. J. Semantic Comput., vol. 3, no. 2, pp. 235–254, Jun. 2009.

[29] L. Li and J. H. Chen, “Emotion recognition using physiological signals,” in Lecture Notes in Computer Science. New York: Springer-Verlag, 2006.

[30] P. Lang, M. Bradley, and B. Cuthbert, “International Affective Picture System (IAPS): Affective ratings of pictures and instruction manual,” Univ. Florida, Gainesville, FL, Tech. Rep., 2008.

[31] R. Picard, E. Vyzas, and J. Healey, “Towards machine emotional intelligence,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10, pp. 1175–1191, Oct. 2001.

[32] J. I. Lacey, D. Bateman, and R. Vanlehn, “Autonomic response specificity: An experimental study,” Psychosomatic Med., vol. 15, no. 1, pp. 8–21, Jan. 1953.

[33] J. Fahrenberg and F. Foerster, “Covariation and consistency of activation parameters,” Biol. Psychol., vol. 15, no. 3/4, pp. 151–169, Nov./Dec. 1982.

[34] P. Kuppens, I. V. Meuchelen, J. B. Nezlek, D. Dossche, and T. Timmermanns, “Individual differences in core affect variability and their relationship to personality and psychological adjustment,” Emotion, vol. 7, no. 2, pp. 262–274, May 2007.

[35] J. D. Haynes and G. Rees, “Decoding mental states from brain activity in humans,” Nat. Rev. Neurosci., vol. 7, no. 7, pp. 523–534, Jul. 2006.

[36] G. Stemmler and J. Wacker, “Personality, emotion, and individual differences in physiological responses,” Biol. Psychol., vol. 84, no. 3, pp. 541–551, Jul. 2010.

[37] P. Rani, N. Sarkar, and J. Adams, “Anxiety-based affective communication for implicit human–machine interaction,” Adv. Eng. Inform., vol. 21, no. 3, pp. 323–334, Jul. 2007.

[38] C. Liu, K. Conn, N. Sarkar, and W. Stone, “Physiology-based affect recognition for computer-assisted intervention of children with autism spectrum disorder,” Int. J. Hum.-Comput. Stud., vol. 66, no. 9, pp. 662–677, Sep. 2008.

[39] P. Rani, C. Liu, and N. Sarkar, “Interaction between human and robot—An affect-inspired approach,” Interact. Stud., vol. 9, no. 2, pp. 230–257, 2008.

[40] C. Liu, K. Conn, N. Sarkar, and W. Stone, “Online affect detection and robot behavior adaptation for intervention of children with autism,” IEEE Trans. Robot., vol. 24, no. 4, pp. 883–896, Aug. 2008.

[41] J. Cohen, “A power primer,” Psychol. Bull., vol. 112, no. 1, pp. 155–159, Jul. 1992.

[42] N. O. Bernsen, H. Dybkjaer, and L. Dybkjaer, “Wizard of Oz prototyping: How and when?” presented at the Proc. CCI Working Papers Cognit. Sci./HCI, Roskilde, Denmark, 1994, Paper WPCS-94-1.

[43] J. F. Kelley, “An iterative design methodology for user-friendly natural language office information applications,” ACM Trans. Inf. Syst., vol. 2, no. 1, pp. 26–41, Jan. 1984.

[44] M. M. Bradley and P. J. Lang, “Measuring emotion: The Self-Assessment Manikin and the semantic differential,” J. Behav. Ther. Exp. Psychiat., vol. 25, no. 1, pp. 49–59, Mar. 1994.

[45] R. Rangayyan, Biomedical Signal Analysis: A Case-Study Approach, ser. Biomedical Engineering. Piscataway, NJ: IEEE Press, 2001.

[46] J. C. Principe, “Artificial neural networks,” in The Electrical Engineering Handbook, 2nd ed., R. C. Dorf, Ed. Boca Raton, FL: CRC Press, 1997.

[47] M. Sherif and C. I. Hovland, Social Judgment: Assimilation and Contrast Effects in Communication and Attitude Change. Oxford, U.K.: Yale Univ. Press, 1961.