Amsterdam 2011

Sensing (E)motions
A review of state of the art motion and emotion sensing technologies in human computer interaction

Nikolaos Poulios
MSc. Computer Science / Multimedia
Vrije Universiteit Amsterdam
Creative Learning Lab – Waag Society
[email protected]


 

Table of Contents

1. Introduction
2. Basic motion sensors
   2.1 Sensing forces
   2.2 Detecting motion
   2.3 Measuring distance
3. Motion Capture and tracking systems
   3.1 Optical Systems
   3.2 Non-optical systems
   3.3 Motion capture libraries
4. Motion sense in interaction
   Hand Tracking
   Head/Face Tracking
   Eye Tracking
   Nintendo Wii Remote
   Floor boards
   Sony PlayStation Move
   Microsoft Kinect
5. Sensing emotions
   5.1 Speech analysis
   5.2 Facial expressions
   5.3 Body movement/postures
   5.4 Pupil size
   5.5 Bio-sensors
   5.6 Brain Computer Interfaces (BCI)
   5.7 Developing Tools for Emotional Intelligence
6. Sensor Hardware Platforms
   6.1 Arduino
   6.2 .Net Gadgeteer
   6.3 Phidgets
   6.4 Shimmer
   6.5 I-CubeX
7. Interactive Software Development Frameworks
   Visual Programming Languages
   Working with sensors
Bibliography
References

   


1. Introduction

This document is a study of current trends in human computer physical interaction, focusing on input interfaces utilizing motion and emotion sensing technologies. It is meant to be a review of state of the art interaction systems, presenting their main characteristics, as well as hardware and software frameworks that facilitate the development of projects utilizing motion sensing technologies and multi-modal emotion recognition techniques based on image, vocal, and biophysical signal analysis.

Physical interaction has been a challenge for HCI researchers and designers for many years, but it was recent technological progress that allowed the production of innovative and practical motion sensing interfaces. These interfaces introduce new possibilities for interaction, but also challenge designers to adapt new technologies to existing mental models. A common pitfall in motion based interaction design is to end up with a system that requires the user to move in a very strict way, using very specific gestures, and is thus not truly physical. Facing this challenge, most systems presented in this document are currently used commercially mainly in video games, a non-critical application field that offers more freedom for experimentation with new technologies.

Most commercial games with motion interaction are based on sports themes, encouraging the physical activity of the player and enhancing the entertainment value. Beyond exertion games, motion interaction technology provides the base for a new range of educational games based on virtual worlds, offering user immersion and evoking lifelike experiences, focusing on embodied and playful learning. Gymnastics is included in all school programs and has been shown to help not only the physical but also the mental state of students. Motion interaction gives the opportunity to embed movement in the learning process with a playful and more active approach. Learning via movement may add an additional modality and prime for later recall of knowledge. Creating more opportunities for physical, embodied learning may enable students to utilize more neural connections (via movement) to aid in recall of new knowledge [1].

Embedding emotion sensing technologies adds another dimension to game dynamics and interactive storytelling, enhancing affective communication and the idea of extended cognition, where mind, body, and environment together form a complete cognitive system of their own. Multi-modal signal analysis can provide us with better insight into players' state and behaviour and allow us to develop more personalized educational, training, and assessment tools. Virtual worlds and the dynamics of game and play provide excellent possibilities to train and assess social emotional competencies, and allow learners to interact with simulated conflict situations within a safe and confined space.

 


2. Basic motion sensors

This part of the document is a short presentation of the fundamental motion sensors used in applications that are studied further in the document. These sensors are basic electronic components with a very particular function: translating changes in one form of energy into changes in electrical energy. All the presented sensors have been around us for quite a while now, in everyday systems like automatic sliding doors and lights, alarm systems, cars, and various industrial control systems. During recent years the progress of technology has reduced their size and cost, allowing their application in a variety of devices like mobile phones and game controllers, while certain projects have developed frameworks to facilitate and simplify their use in multi-purpose applications made by a wider range of people involved in the design and programming of interactive systems.

2.1 Sensing forces

Piezoelectric sensors are a category of sensors that use the piezoelectric effect to measure pressure, acceleration, strain or force by converting them to an electrical charge. Piezoelectricity is the ability of some materials, notably crystals and certain ceramics, to generate an electric potential in response to physical stress.

Force-Sensing Resistors are materials whose resistance changes when a force is applied to them. Flexible force sensors are ultra-thin, flexible printed circuits, consisting of two laminated layers of conductive material and pressure-sensitive ink. The resistance of a flexible sensor in a circuit decreases under pressure. Flexible sensors are used to measure forces in a higher range than that of a piezoelectric sensor.

Capacitance sensors are very sensitive sensors, detecting anything that is conductive or has a dielectric different from that of air. Nowadays they are usually found in touch screens, though there are capacitance sensors that can detect the body's charge from distances of up to a meter (such sensors are used by the Theremin musical instrument).

An accelerometer is a sensor that measures the change in speed of movement, or acceleration. Conceptually, an accelerometer behaves as a damped mass on a spring. When the accelerometer experiences acceleration, the mass is displaced to the point that the spring is able to accelerate the mass at the same rate as the casing. The displacement is then measured to give the acceleration. An accelerometer thus measures weight per unit of (test) mass, a quantity also known as specific force, or g-force. Another way of stating this is that by measuring weight, an accelerometer measures the acceleration of the free-fall reference frame relative to itself. Accelerometers typically have two or sometimes three axes of measurement.
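As a simple illustration of how such readings are used in practice, the following minimal sketch estimates pitch and roll from the raw axes. It assumes a 3-axis accelerometer reporting values in g and a quasi-static device, so that gravity dominates the measured specific force; the axis and sign conventions are illustrative and vary between devices.

    import math

    def tilt_from_accelerometer(ax, ay, az):
        """Estimate pitch and roll (in degrees) from a 3-axis accelerometer.

        Assumes readings are in units of g and the device is quasi-static,
        so the only measured specific force is gravity.
        """
        pitch = math.degrees(math.atan2(-ax, math.sqrt(ay**2 + az**2)))
        roll = math.degrees(math.atan2(ay, az))
        return pitch, roll

    # Example: a device lying flat measures roughly (0, 0, 1) g.
    print(tilt_from_accelerometer(0.0, 0.0, 1.0))    # ~ (0.0, 0.0)
    print(tilt_from_accelerometer(0.0, 0.71, 0.71))  # ~ (0.0, 45.0)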

Gyroscopes are sensors that measure angular velocity. They are similar to accelerometers, except that they measure how fast the angle of rotation is changing, rather than acceleration in a straight line. Gyroscopes work based on the principle of conservation of angular momentum. Mechanical gyroscopes consist of a rapidly spinning disk whose axle is free to take any orientation, mounted on a set of two gimbals with orthogonal pivot axes, allowing the gyroscope to minimize any external torque and preserve its orientation regardless of any motion of the platform on which it is mounted.

2.2 Detecting motion

Photoelectric switches use a light beam hitting a photosensitive target sensor. When a body passes between the light source and the sensor, breaking the beam, the switch is activated.

Passive infrared sensors measure infrared light radiating from objects in their field of view. Apparent motion is detected when an infrared source with one temperature, such as a human, passes in front of an infrared source with another temperature, such as a wall.

Magnetic switches consist of a very thin pair of contacts in a protective housing. When exposed to a magnet they are drawn together, closing the switch.

Hall effect sensors are transducers that change their output voltage from low to high when the magnetic field around them changes.

2.3 Measuring distance

Most distance sensors use an energy source transmitting a reference signal, and a sensor measuring the signal reflected by the target back to the source, to calculate the distance of the target. Most applications use infrared light sensors, sending an infrared beam and reading the reflection of the beam off a target. For longer ranges, ultrasonic sensors are used, sending a ping of ultrasonic sound and then timing how long it takes to bounce back. Alternative implementations of distance sensors are based on a combination of magnetic or Hall effect sensors (for very short distances), measuring variations in a reference magnetic field.
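For the ultrasonic case, the conversion from echo time to distance is a one-line calculation. The sketch below assumes the sensor reports the round-trip time of the ping and that the speed of sound is roughly 343 m/s (dry air at about 20 C).

    def distance_from_echo(echo_time_s, speed_of_sound_m_s=343.0):
        """Convert an ultrasonic ping's round-trip echo time to distance.

        The pulse travels to the target and back, so the one-way distance
        is half of speed * time.
        """
        return speed_of_sound_m_s * echo_time_s / 2.0

    # Example: an echo arriving after 5.8 ms corresponds to roughly 1 m.
    print(round(distance_from_echo(0.0058), 2))  # ~0.99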

 

 

   

 


3. Motion Capture and tracking systems

Motion capture (mocap)/tracking is the process of recording/tracking body movement and mapping it onto the movement of a digital model. The mechanics of human body movement has been a topic of scientific interest since ancient times, and today many different disciplines use motion analysis systems to capture movement and posture of the human body. In clinical research, motion capture has been used to analyze walking patterns of impaired patients in order to select the right orthopedic treatment, to monitor the progress of a treatment, and to help the design of prosthetics. Motion analysis is also widely used in sports to analyze and optimize athletes' movement in order to achieve better performance.

In recent years motion capture systems have been used extensively in the areas of cinematography and video games in order to animate computer generated characters with natural human movement, following recorded moves of an actor inside special studios, replacing the traditional animation method of rotoscoping, in which animators trace over live-action film movement frame by frame. Despite the high cost of the special equipment, space and setup required for a motion capture system, they are preferred by some productions over traditional animation techniques for their ability to give more realistic results in shorter, or even real, time.

Motion capture is a very active field of research. Today there are many alternative types of systems using different technologies, with differences in accuracy, functional requirements and cost, and their suitability depends on the nature of the project. The range of applications utilizing motion capture is becoming wider, following the progress made on processors, memory chips and sensors regarding their speed, accuracy, size and cost, as well as the progress on algorithms developed for data processing. The two major categories of motion capture systems are optical and non-optical.

 

3.1 Optical Systems

Optical systems work based on data captured from a single or multiple image sensors calibrated to provide overlapping projections, and algorithms to triangulate the 3D position of a subject in space. Most optical systems utilize markers, distinguishable by the cameras from the rest of the captured image, in order to determine their position more easily and accurately. The process of motion capture begins with the calibration of the system, in which markers are placed at known positions and every camera position and lens distortion is calculated accordingly. If two calibrated cameras see a marker, its 3D position can be determined. After calibration of the system, a performer wears markers near each joint of her body to identify the motion by the positions or angles between the markers. The number of cameras required for an optical system depends on the size of the space we need to cover, the desired accuracy and the number of subjects we need to track at the same time. Typically a system like that consists of 6 to 24 high-speed cameras, while there are systems using hundreds of cameras to achieve better accuracy. Optical systems are characterized by the captured image resolution in pixels, the sampling frequency in hertz and the frame rate, which is balanced between the image resolution and sampling frequency. Different types of markers are used across optical systems.
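The triangulation step at the heart of these systems can be sketched with OpenCV once the cameras have been calibrated. The projection matrices below are illustrative values standing in for the output of the calibration step described above, not the parameters of any real rig.

    import numpy as np
    import cv2

    # Projection matrices P = K [R | t] for two calibrated cameras.
    # Illustrative values only; a real system obtains them from calibration.
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera at the origin
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])   # 0.5 m baseline

    def triangulate_marker(pt1, pt2):
        """Triangulate one marker seen at pixel pt1 in camera 1 and pt2 in camera 2."""
        a = np.array(pt1, dtype=float).reshape(2, 1)
        b = np.array(pt2, dtype=float).reshape(2, 1)
        homog = cv2.triangulatePoints(P1, P2, a, b)   # 4x1 homogeneous point
        return (homog[:3] / homog[3]).ravel()         # back to 3D Euclidean coordinates

    # Example: the same marker detected in both camera images.
    print(triangulate_marker((400.0, 260.0), (300.0, 260.0)))  # ~ [0.4, 0.1, 4.0]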

Passive markers are the simplest type of markers, featuring retro-reflective material to reflect light generated near the camera's lens. The camera's threshold is adjusted to sample only the bright reflective markers, ignoring the rest of the captured image. A major advantage of passive markers is that the subject does not need to wear any electronics that might limit her freedom to move. Passive markers are attached directly to the skin or to a specially designed spandex/lycra full-body suit. The major disadvantage of passive markers is what is called marker swapping: all markers are identical, so the system might mismatch a marker with the corresponding joint, requiring a larger number of cameras to avoid the problem.

Figure 1: Active marker motion capture system

Active markers are another type of markers. Instead of reflecting light, active markers use LEDs to emit light, increasing the maximum distances and volume for capture. Optical systems using active markers triangulate positions by illuminating one LED at a time very quickly, or multiple LEDs with software to identify them by their relative position. Refined versions of active markers exist, using time modulation over the amplitude or pulse of the LEDs to provide a marker ID in order to eliminate marker swapping. Computer processing of modulated IDs offers cleaner data and less filtered results. This higher accuracy and resolution requires more processing than passive technologies, but the additional processing is done at the camera, improving resolution via subpixel or centroid processing and providing both high resolution and high speed.

 


Both technologies mentioned above are mainly used indoors in special motion capture studios. Passive systems are usually less expensive than active ones and easier to set up, while active systems are more accurate and, after the initial setup, require less time to get results from. Commercial active and passive systems are available from companies like Vicon, NaturalPoint, Qualisys and PhaseSpace, and usually cost between tens and hundreds of thousands of euros.

Semi-passive - Photosensitive markers. Prakash [2] is a motion capture system developed at MIT's Media Lab as an inexpensive alternative system (the overall cost is less than 1,000 euros), suitable also for outdoor use and real time motion capture. Instead of using expensive high-speed cameras, Prakash uses multi-LED high-speed projectors with passive binary films (masks) set in front. The light intensity sequencing provides a temporal modulation and the masks provide a spatial modulation. Each beamer projects invisible (near infrared) binary patterns thousands of times per second. Tags with photo sensors attached to the scene determine their location by decoding the transmitted space-dependent labels. Apart from their position, tags can compute their own orientation, incident illumination, and reflectance. These tracking tags work in natural lighting conditions and can be imperceptibly embedded in attire or other objects. The system supports an unlimited number of tags in a scene, with each tag uniquely identified to eliminate marker-swapping issues. Since the system eliminates a high-speed camera and the corresponding high-speed image stream, it requires significantly lower data bandwidth. The tags also provide incident illumination data, which can be used to match scene lighting when inserting synthetic elements.

Markerless Motion Capture. Motion capture and computer vision have been very active fields of research during the last 15 years, and there have been a lot of studies aiming to develop markerless motion capture systems, based on the use of a single or multiple cameras and optimized image analysis algorithms, with performance comparable to that of the more expensive commercial systems mentioned previously.

Recently a team from Carnegie Mellon University, working with Disney Research, presented a system featuring small body-mounted cameras to reconstruct the motion of a subject [3]. Outward-looking cameras are attached to the limbs of the subject, and the joint angles and root pose are estimated through non-linear optimization. The optimization objective function incorporates terms for image matching error and temporal continuity of motion. Structure-from-motion is used to estimate the skeleton structure and to provide initialization for the non-linear optimization procedure. Global motion is estimated and drift is controlled by matching the captured set of videos to a 3D reconstruction of the scene built from reference imagery. By estimating the camera poses, the global and relative motion of an actor can be captured outdoors under a wide variety of lighting conditions, or in extended indoor regions, without any additional equipment.


Several other techniques and algorithms have been proposed for markerless motion capture of single or multiple subjects. Most of them use footage from multiple cameras to make a volumetric reconstruction of the body using background removal, skin color detection, "shape from silhouette" (SFS) and structure-from-motion methods. The formalism of SFS was introduced by A. Laurentini [4]. By definition, an object lies inside the volume generated by back-projecting its silhouette through the camera center (called the silhouette's cone). With multiple views of the same object at the same time, the intersection of all the silhouette cones builds a volume called the "visual hull", which is guaranteed to contain the real object. After the visual hull has been constructed, body pose is estimated by fitting shape models of specific body parts to the volume, or by applying heuristic assumptions about features related to position and establishing the correspondence of joints between successive frames. Markerless motion capture systems based on these methods have been developed by various academic research laboratories, like the BioMotion Lab of Stanford University [5], the University of Amsterdam [6] and the Max Planck Institute [7], and commercial systems like Organic Motion's solutions.
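The core of the shape-from-silhouette idea can be illustrated with a minimal voxel-carving sketch: a voxel is kept only if its projection falls inside the silhouette seen by every camera. The function below assumes binary silhouette images and 3x4 projection matrices are already available; it is a simplification that ignores the visibility handling and refinement steps used by real systems.

    import numpy as np

    def visual_hull(silhouettes, projections, grid):
        """Carve a voxel grid using binary silhouettes from multiple cameras.

        silhouettes : list of HxW boolean arrays (True = inside the silhouette)
        projections : list of 3x4 projection matrices, one per camera
        grid        : (N, 3) array of candidate voxel centres in world coordinates
        Returns the subset of voxel centres kept in the visual hull.
        """
        keep = np.ones(len(grid), dtype=bool)
        homog = np.hstack([grid, np.ones((len(grid), 1))])  # Nx4 homogeneous points
        for sil, P in zip(silhouettes, projections):
            proj = homog @ P.T                              # Nx3 homogeneous pixels
            u = (proj[:, 0] / proj[:, 2]).round().astype(int)
            v = (proj[:, 1] / proj[:, 2]).round().astype(int)
            h, w = sil.shape
            inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
            hit = np.zeros(len(grid), dtype=bool)
            hit[inside] = sil[v[inside], u[inside]]
            keep &= hit   # a voxel survives only if every camera sees it inside
        return grid[keep]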

3.2 Non-optical systems

This category includes all motion capture systems that, instead of image sensors, use alternative types of sensors to capture motion. These systems collect data from wearable sensors attached to the subject's body and translate them into motion in space. Their main advantage is that, because they are not based on cameras, they don't require a studio setup, they are more portable, and they can be used outdoors, capturing motion in large areas independently of light conditions. Their main disadvantages are that they are usually less accurate than optical systems and that they might limit the subject's freedom to move and perform.

Inertial systems use miniature inertial sensors attached to the joints of the body, biomechanical models and sensor fusion algorithms to translate data into motion. Starting from a known position, inertial systems use wireless accelerometers and gyroscopes, sending data to a computer to continuously calculate the position, orientation and velocity of the subject with full six degrees of freedom of body motion. Their accuracy depends on the number of sensors used. Commercial inertial motion capture systems are available from companies like Xsens and Animazoo.
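The sensor fusion step can be illustrated, in a heavily simplified single-axis form, by a complementary filter that blends the integrated gyroscope rate (accurate over short intervals but drifting) with the tilt angle implied by the accelerometer (noisy but drift-free). This is only a sketch of the idea; commercial systems use full biomechanical models and far more sophisticated filters.

    import math

    def complementary_filter(gyro_rate_dps, accel, prev_angle_deg, dt, alpha=0.98):
        """One-axis orientation update fusing a gyroscope and an accelerometer.

        gyro_rate_dps : angular velocity around the axis, in degrees/second
        accel         : (ay, az) accelerometer readings in g, used for a tilt estimate
        prev_angle_deg: previous orientation estimate in degrees
        dt            : time step in seconds
        alpha         : weight given to the integrated gyroscope (short-term trust)
        """
        gyro_angle = prev_angle_deg + gyro_rate_dps * dt             # integrate the gyro
        accel_angle = math.degrees(math.atan2(accel[0], accel[1]))   # drift-free reference
        return alpha * gyro_angle + (1.0 - alpha) * accel_angle

    # Example: a short stream of (gyro, (ay, az)) samples at 100 Hz.
    angle = 0.0
    for gyro, acc in [(10.0, (0.02, 1.0)), (12.0, (0.05, 1.0)), (9.0, (0.07, 1.0))]:
        angle = complementary_filter(gyro, acc, angle, dt=0.01)
    print(round(angle, 3))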

Mechanical or exo-skeleton systems use a skeletal-like structure worn by the subject, consisting either of straight metal or plastic rods linked together with potentiometers articulating the joints, or of flexible sensors measuring joint angles during motion. Mechanical systems are real time and low cost, but they capture only the relative movement of the subject, requiring an external absolute positioning system, and they might not be comfortable for a performer to wear. Commercial systems like the Gypsy 7 by Animazoo combine gyroscopes and an exo-skeleton to capture absolute and relative motion.


Magnetic systems utilize sensors placed on the body to measure the low-frequency magnetic field generated by a transmitter source. Position and orientation are calculated from the relative magnetic flux of three orthogonal coils on both the transmitter and each receiver. The relative intensity of the voltage or current of the three coils allows these systems to calculate both range and orientation by meticulously mapping the tracking volume. Each sensor captures 6 degrees of freedom, which provides useful results with two-thirds the number of markers required in optical systems: one on the upper arm and one on the lower arm suffice for elbow position and angle. Magnetic systems are low cost but nowadays rarely used because of their major disadvantages. Since each sensor requires its own (fairly thick) shielded cable, the tether used by magnetic systems can be quite cumbersome. Magnetic systems also have issues with azimuth: if an actor is doing a push-up type posture, the system will get confused. Multiple-actor magnetic setups also have problems with two or more actors in close proximity, as sensors from the different actors interfere with each other, producing distorted results. Finally, magnetic systems react very badly to metal or magnetic fields in the environment, caused by metallic construction materials in buildings or other electrical appliances in use.

 

3.3 Motion capture libraries

As mentioned before, motion capture is an easier technique for giving realistic motion to virtual characters, and although most motion capture systems require expensive equipment and special studios, independent developers can take advantage of free or commercial libraries available online, which include motion capture data from various human activities in file formats that can be imported into 3D animation software and mapped to any character model. A quick search for motion capture libraries will return a long list of resources, among them Carnegie Mellon University, which has published a very large motion capture database freely available at http://mocap.cs.cmu.edu/, http://www.mocapclub.com/, which includes a library from the Motion Capture Society association, and http://mocapdata.com, which is also a large resource of both free and commercial animation files.

 

 

 

 

 

 


4. Motion sense in interaction

During the last years, sensors and principles used in motion capture systems have been applied, on a smaller scale, to low cost consumer computer input devices, to provide physical interaction input interfaces. During the last five years, all major companies in the video game industry have developed different technologies for games and controllers with motion based interaction. Although sports have always been a popular theme in video games, and game companies started to explore sensor based physical interfaces from the middle of the 1980s, it was not until recently that technology allowed them to produce wireless and lightweight devices, practical to use as game controllers. That fact, along with the popularity of large TV screens in today's average living room, has created the basis for games offering more immersion and encouraging gamers' physical activity. Today "exertion games" or "exergames" are a growing market, also attracting people who were not traditionally attracted to video games and considered them a rather passive activity.

This part is a presentation of current techniques and examples of devices for physical input interfaces and game controllers based on motion sensors.

Hand Tracking

Designing wearable input interfaces, usually called "data gloves", to allow a user to use her hands and fingers to navigate in a virtual world, use hand gestures, and interact with objects in a more natural way, was one of the first examples of natural user interfaces. The first data glove was created in 1977, and since then a few companies and laboratories have come up with their own implementations. Data gloves use various sensors, such as accelerometers or gyroscopes, to capture hand movement, and flexible sensors for the bending of fingers. Some data gloves use optical fibers attached to the fingers and a photocell as a way to measure bending, since some light escapes the fiber when it is bent. Some data gloves also provide haptic feedback, applying small forces and vibrations to give users a sense of touch.

Data gloves are also used in body motion capture systems, because marker-based solutions are not able to capture such detail in finger movement. This technique is called hand-over.

Head/Face Tracking

Facial expressions and small facial muscle movements are also difficult to capture during body motion capture. For that reason facial motion capture is done in a separate recording, by attaching many small markers to the actor's face.


In the field of interaction and the gaming industry, head tracking devices exist that allow the computer to set a camera's viewpoint according to the position of the player in space. Commercial systems, like NaturalPoint's TrackIR, use an infrared sensor and active markers attached to the player's head. Other systems, like many head mounted displays for virtual reality systems, use tilt sensors to track head movement. There are also applications that use a plain camera and automatic face detection algorithms to track the user's position, but because they use a plain camera they lack, or are less accurate at, tracking movement along the depth axis.

Eye Tracking

Eye tracking is the process of measuring either the point of gaze of a viewer or the motion of an eye relative to the head. Eye trackers are mostly used in research on the visual system, in psychology, in cognitive linguistics, and also in marketing research, product design and usability testing, to spot elements that attract viewers' gaze and others that do not.

Eye trackers measure rotations of the eye and principally fall into three categories. The first category uses an attachment to the eye, like a contact lens with an embedded mirror or magnetic field sensor. Measurements with tight fitting contact lenses have provided extremely sensitive recordings of eye movement, and magnetic search coils are the method of choice for researchers studying the dynamics and underlying physiology of eye movement. The second category uses electric potentials measured with electrodes placed around the eyes. The eyes are the origin of a steady electric potential field, which can also be detected in total darkness and if the eyes are closed. It can be modeled as being generated by a dipole with its positive pole at the cornea and its negative pole at the retina. The electric signal that can be derived using two pairs of contact electrodes placed on the skin around one eye is called the electrooculogram (EOG). If the eyes move from the centre position towards the periphery, the retina approaches one electrode while the cornea approaches the opposing one. This change in the orientation of the dipole, and consequently in the electric potential field, results in a change in the measured EOG signal. Inversely, by analysing these changes, eye movement can be tracked.

The last and most commonly used category is non-intrusive, optical systems using the Pupil Centre Corneal Reflection (PCCR) technique. This technique uses a light source to illuminate the eye, causing highly visible reflections, and a camera to capture an image of the eye showing these reflections. Image processing algorithms are then used to identify the reflection of the light source on the cornea and the pupil. Calculating the angle between the two reflections, combined with other geometrical characteristics of the reflections, allows the gaze direction to be determined.

There are two different illumination setups that can be used with the PCCR technique: bright pupil tracking, where an illuminator is placed close to the optical axis of the imaging device, which causes the pupil to appear lit up; and dark pupil tracking, where the illuminator is placed away from the optical axis, causing the pupil to appear darker than the iris. Different factors affect pupil detection when using each of the two techniques, like the age of the subject, light conditions and ethnicity. Some commercial systems like Tobii eye trackers can use both techniques, determining the best one during the calibration procedure, where the viewer is asked to gaze at certain points on screen.
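In practice the mapping from the measured pupil-glint vector to a point of gaze is usually established by exactly such a calibration: the viewer fixates a few known on-screen targets and a regression is fitted. The sketch below shows a simple second-order polynomial mapping fitted by least squares; the function names and the polynomial form are illustrative, not those of any particular commercial tracker.

    import numpy as np

    def fit_gaze_mapping(pupil_glint_vectors, screen_points):
        """Fit a 2D polynomial mapping from pupil-minus-glint vectors to
        screen coordinates, using calibration samples where the viewer
        fixated known on-screen targets (least-squares fit)."""
        v = np.asarray(pupil_glint_vectors, dtype=float)
        x, y = v[:, 0], v[:, 1]
        # Design matrix for a second-order polynomial in (x, y).
        A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
        coeffs, *_ = np.linalg.lstsq(A, np.asarray(screen_points, dtype=float), rcond=None)
        return coeffs

    def estimate_gaze(coeffs, pupil_glint_vector):
        """Map one pupil-minus-glint vector to an estimated (screen_x, screen_y)."""
        x, y = pupil_glint_vector
        a = np.array([1.0, x, y, x * y, x**2, y**2])
        return a @ coeffs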

Eye trackers can also be used as an interaction input interface, replacing a mouse for example, allowing the user to control the cursor with her eyes. EyeWriter [8] is a collaborative research project for building an eye tracker from inexpensive materials, along with open source software, developed to empower people who are suffering from ALS and other physical disabilities with creative technologies.

Nintendo Wii Remote

In 2006, Nintendo released its now popular Wii video game console. The major innovation of the Wii was its remote game controller, the Wii Remote (Wiimote). The Wii Remote features an infrared sensor and an accelerometer, which allow it to calculate its position in space and track hand movement. Using the Wii Remote, the player is able to aim at items on screen and interact using gestures and natural movement.

Upon its release, the Wii Remote gained much attention thanks to its advanced features and quickly became very popular among programming enthusiasts, who wrote software that allowed the use of the device beyond the game console. Since then the Wii Remote has been used in numerous projects as a controller, or as an infrared sensor to track infrared LEDs attached to other items, for example in a head tracking system like the one previously mentioned.

Floor boards

Floorboards equipped with pressure sensors were the first attempt to make an input interface with which a player would use her whole body in game interaction. The first controller of this kind, called the Joyboard, was created by Atari in 1982. In 2007, Nintendo released a modern, wireless version called the Balance Board, along with a series of fitness games utilizing it, called Wii Fit, for the Wii game console.

Sony PlayStation Move

Sony's motion sensing platform for the PlayStation console includes the PlayStation Eye camera, which is capable of capturing standard video at 60 Hz at 640x480 pixel resolution, or at 120 Hz at 320x240 pixels, along with computer vision and gesture recognition software, and a microphone array for voice location tracking and voice command recognition.


The PlayStation Move motion controller features an orb at the head, which can glow in any of a full range of RGB colors using LEDs. Based on the colors in the user environment captured by the PlayStation Eye camera, the system dynamically selects an orb color that can be distinguished from the rest of the scene. The colored light serves as an active marker, the position of which can be tracked by the camera. The uniform spherical shape and known size of the light also allow the system to accurately determine the controller's distance from the camera through the light's image size. The controller also features an accelerometer and a gyroscope, used to track rotation as well as overall motion. An internal magnetometer is also used for calibrating the controller's orientation against the Earth's magnetic field, to help correct cumulative error (drift) of the inertial sensors. The inertial sensors can be used to calculate position in cases where the camera tracking is insufficient, such as when the controller is obscured behind the player's back.
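The distance-from-image-size relation mentioned above follows directly from the pinhole camera model; the sketch below uses illustrative values for the orb diameter and the camera's focal length in pixels, not official specifications.

    def sphere_distance(image_diameter_px, real_diameter_m=0.046, focal_length_px=540.0):
        """Estimate the distance of a sphere of known size from its image size.

        Pinhole camera model: image_size = focal_length * real_size / distance,
        so distance = focal_length * real_size / image_size. The orb diameter
        and focal length here are illustrative values only.
        """
        return focal_length_px * real_diameter_m / image_diameter_px

    # Example: an orb appearing 25 px wide would be roughly 1 m away.
    print(round(sphere_distance(25.0), 2))  # ~0.99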

Microsoft Kinect

Kinect was Microsoft's answer to the motion sensing competition between video game consoles. Initially released as an accessory for the Xbox 360 game console, Kinect was the first consumer device that allowed real-time, markerless, full body 3D motion capture in a room environment. Kinect features a normal RGB camera and a depth sensor, consisting of an infrared laser projector and an infrared camera, capable of capturing 3D video data at 30 Hz at 640x480 pixels. The sensor also includes a 3-axis accelerometer to determine its orientation and a four-microphone array, allowing it to receive voice commands, perform ambient noise reduction, and determine the source location of a sound. The most innovative part of the Kinect, though, is a microprocessor running an algorithm trained, using machine learning and a large training set of images, to track the motion of multiple bodies, based on 20 joints for each body.

Kinect uses a single depth image [9], which is segmented into a dense probabilistic body part labeling, with the parts defined to be spatially localized near skeletal joints of interest. Reprojecting the inferred parts into world space, spatial modes of each part distribution are localized, generating confidence-weighted proposals for the 3D locations of each skeletal joint. The segmentation into body parts is treated as a per-pixel classification task. A very large collection of realistic depth images of humans of many shapes and sizes in highly varied poses, sampled from a large motion capture database, was used to train a deep randomized decision forest classifier which avoids over-fitting. Simple, discriminative depth comparison image features yield 3D translation invariance while maintaining high computational efficiency. Finally, spatial modes of the inferred per-pixel distributions are computed using mean shift, resulting in the 3D joint proposals.
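The depth comparison features referred to here compare the depth at two offsets around a pixel, with the offsets scaled by the inverse of the depth at that pixel, so that the response does not change as the body moves closer to or further from the sensor [9]. A minimal sketch of one such feature is shown below (variable names are illustrative); in the actual pipeline many such features per pixel feed the randomized decision forest.

    import numpy as np

    BACKGROUND = 1e6  # large depth value used for probes outside the image

    def depth_feature(depth, x, y, u, v):
        """Depth comparison feature in the spirit of the per-pixel classifier
        described above. The offsets u and v are scaled by 1/depth at (x, y),
        which makes the response roughly invariant to the body's distance.
        Assumes depth is in metres and nonzero at (x, y)."""
        d = depth[y, x]
        def probe(offset):
            ox, oy = offset
            px = int(round(x + ox / d))
            py = int(round(y + oy / d))
            if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
                return depth[py, px]
            return BACKGROUND
        return probe(u) - probe(v)

    # Each pixel would be described by many such features (many offset pairs)
    # and fed to a randomized decision forest predicting its body-part label.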

   


Figure 2: Kinect tracking joints

Kinect truly revolutionized the field of natural user interfaces for gaming and became, upon its release, the fastest selling consumer electronics device ever. As with the release of the Wii Remote, it quickly attracted the attention of a large community of programming enthusiasts, who wrote open source software allowing the use of Kinect in independent computer applications, followed by a large number of projects found on the internet, including interactive applications, games, installations and robotics, utilizing the sensor. After the release on the internet of a large number of impressive examples of uses of the Kinect, companies involved in its development, like PrimeSense and Microsoft, decided to support these efforts by releasing software to facilitate independent project development.

5. Sensing emotions

The vision of machines with emotional intelligence [10] has coexisted with that of artificial intelligence since the invention of the term. It is a popular theme in science fiction literature, featuring androids that understand emotions and have human-like behavior, and aptly raising ethical questions about the use of such technologies. Although we are still quite far from this vision (or nightmare for some), research laboratories around the world work on developing emotion-sensing technology to support the study of human behavior, affective human computer interaction, and communication between people. Automatic recognition of human affective states is an important research topic for a broad range of applications, including psychology research, computer assisted therapeutic systems, safety monitoring applications, assessment and training systems, user experience studies, marketing research, and automatic affect-based indexing of digital material [11].

Emotion recognition can make social interaction more affective in cases where there are difficulties in communicating expressively: for example for people on the autistic spectrum, where an autistic person might outwardly appear calm and relaxed while experiencing a state of emotional or cognitive overload [12], and in everyday social networking applications, where there is a tendency towards text based communication or communication through avatars in virtual worlds.

As with physical interaction interfaces, a lot of studies experiment with the application of physiological sensors in video games and interactive storytelling [13]. Video games are an excellent application area to explore the benefits and drawbacks of physiological sensor interaction, because there are less severe consequences of failure than in critical control systems, making games a field bridging laboratory research and commercial systems. It has also been shown that video games can stimulate strong emotional reactions in players, making them an appropriate field for behavior studies, and as gaming has turned into a huge entertainment industry, companies are interested in using physiological feedback for game design evaluation. Explorations to develop "biofeedback" games, games that make users more aware of their physiological state and train them to control it using game dynamics, started in the early 1980s. In 1984, Thought Technology developed a racing game called CalmPrix [14], utilizing a modified galvanic skin response sensor, followed by other innovative game companies, like Atari and Nintendo, presenting their own biofeedback games using a variety of bio-sensors. Some of these games never made it to the market, while others did but without the expected market success.

As we all know from personal experience, emotions are hard to define and recognize. Despite all our senses and the verbal and non-verbal communication skills we have as humans, it is often hard to immediately recognize someone's emotions: whether they are real or pretended, whether someone is talking seriously or joking, laughing or crying, etc. The expression of emotions becomes even more complex when analyzed on a global, cross-cultural scale. It is easy to imagine, thus, that emotion recognition is a very difficult task for a computer, especially in real time applications where the system has to analyze the user's state and give a response within a very narrow time frame. Classic psychological research claims the existence of six basic expressions of emotion that are universally displayed and recognized: happiness, anger, sadness, surprise, disgust, and fear [15]; other studies on emotion recognition also include emotions like despair, interest, irritation and pride [16]. A lot of studies do not accept this categorization of emotions, suggesting that it is not emotions but some components of emotions that are universally linked with certain communicative displays. Most theorists agree that the two dominant dimensions of emotion can be described as valence (pleasant vs. unpleasant) and arousal (activated vs. deactivated, or excited vs. calm) [17]. Mapping even basic emotions onto these two dimensions is challenging, and emotion recognition systems analyzing single human modalities like voice or facial expressions usually suffer either from poor accuracy or from an oversimplified classification of emotions.
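To make the two-dimensional representation concrete, the small sketch below places a few basic emotion labels at hand-picked, purely illustrative (valence, arousal) coordinates, roughly in the spirit of Figure 3, and maps an estimated (valence, arousal) pair to the nearest label. Real systems estimate these dimensions from the signals discussed in the following sections.

    # Illustrative coordinates only: (valence, arousal) in [-1, 1].
    EMOTION_COORDS = {
        "happiness": ( 0.8,  0.5),
        "surprise":  ( 0.2,  0.8),
        "anger":     (-0.6,  0.7),
        "fear":      (-0.7,  0.6),
        "disgust":   (-0.7,  0.2),
        "sadness":   (-0.7, -0.4),
    }

    def nearest_emotion(valence, arousal):
        """Map a (valence, arousal) estimate to the closest labeled emotion."""
        return min(EMOTION_COORDS,
                   key=lambda e: (EMOTION_COORDS[e][0] - valence) ** 2 +
                                 (EMOTION_COORDS[e][1] - arousal) ** 2)

    print(nearest_emotion(0.6, 0.4))    # -> "happiness"
    print(nearest_emotion(-0.5, -0.3))  # -> "sadness"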


Figure 3: Emotions mapped on basic dimensions

The next part is a presentation of the various sensors used to capture physiological signals that can be associated with the emotional state of a person, along with software for emotion recognition developed in previous research.

5.1 Speech analysis

Speech is the primary method of human communication. Analysis of certain features extracted from speech characteristics, like intensity, pitch, phonetic features, voice segments, pause length and spectral modeling, along with linguistic analysis based on the keywords used, can be used to draw conclusions about the emotional state of a person [18].
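As a minimal illustration of such feature extraction, the sketch below computes two of the simplest intensity-related measures, short-time energy and zero-crossing rate, over a raw waveform using only numpy. Real emotion recognizers, such as the frameworks described next, extract far richer feature sets (pitch contours, spectral shape, pauses) and train a classifier on them.

    import numpy as np

    def frame_features(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
        """Compute short-time energy and zero-crossing rate per frame.

        These are only two of the simplest intensity-related speech features;
        real systems extract far richer sets and feed them to a classifier.
        """
        frame = int(sample_rate * frame_ms / 1000)
        hop = int(sample_rate * hop_ms / 1000)
        feats = []
        for start in range(0, len(signal) - frame + 1, hop):
            w = signal[start:start + frame]
            energy = float(np.mean(w ** 2))
            zcr = float(np.mean(np.abs(np.diff(np.sign(w))) > 0))
            feats.append((energy, zcr))
        return np.array(feats)

    # Example on a synthetic 440 Hz tone sampled at 16 kHz.
    t = np.linspace(0, 1, 16000, endpoint=False)
    print(frame_features(np.sin(2 * np.pi * 440 * t), 16000).shape)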

EmoVoice [19], developed by the University of Augsburg's Human Centered Multimedia laboratory, is a framework for emotional speech corpus and classifier creation and for offline as well as real-time online speech emotion recognition. The framework is meant to be used by non-experts and therefore comes with an interface for creating one's own personal or application-specific emotion recognizer. EmoVoice is now integrated into the SSI framework (see emotion frameworks).

openEAR [20], developed by the Technische Universität München's Institute for Human-Machine Communication, is an open source C++ library for speech processing and emotion recognition, combining features for audio recording, feature extraction and classification of results, along with pre-trained models.

   


 

5.2 Facial expressions

Facial expression analysis was the first method used for emotion recognition, has been used extensively in multiple studies since then, and is the preferred method for single-modality emotion recognition systems. Facial expressions are the main non-verbal communication tools, providing the most powerful, versatile and natural means of communicating motivational and affective state. Apart from expressing emotion, facial expressions provide important communicative cues during social interaction, such as our level of interest, our desire to take a speaking turn, and continuous feedback signaling understanding of the information conveyed. Facial expression constitutes 55 percent of the effect of a communicated message [21] and is hence a major modality in human communication. Several studies have also shown that ordinary people can detect six emotional facial expressions with an accuracy ranging from 70% to 98%.

In facial expression analysis systems, the face is segmented, focusing on the facial areas of the eyes, eyebrows, mouth and nose. Each of these feature-candidate areas contains features whose boundaries are extracted and stored over time, and the displacement of each feature is then compared to "neutral face" model images to infer the emotion expressed by the subject. Systems usually differ in the number of features tracked and the kind of classifier used.

There are already quite a few systems for facial expression analysis developed by research institutes, and some are available for research or commercially. Examples of such systems are: the SHORE system [22], developed by Fraunhofer; eMotion [23], a project started at the University of Amsterdam, which also includes software to map captured facial expressions onto Second Life avatars; MindReader [24], developed initially at Cambridge University (based on the commercial system of Nevenvision, now acquired by Google); projects of ibug (the intelligent behaviour understanding group) of Imperial College London [25]; and FaceAPI [26] from Seeing Machines. There are also some open source examples of facial feature tracking using the openCV [27] (Open Computer Vision) library and the included Haar classifier. openCV is a library for real time image analysis, and it has become one of the standard libraries for computer vision, with C, C++, Python and Java interfaces, used in robotics and multimedia applications and included in a lot of frameworks for the development of such applications.
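A minimal example of the openCV Haar classifier mentioned above, detecting faces in a webcam frame, is sketched below; face detection like this is typically the first step before the facial features themselves are located and tracked. It assumes a recent opencv-python package, where the bundled cascade files are exposed through cv2.data.haarcascades.

    import cv2

    # Load the frontal-face Haar cascade shipped with OpenCV.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    face_cascade = cv2.CascadeClassifier(cascade_path)

    def detect_faces(frame):
        """Return bounding boxes (x, y, w, h) of faces found in a BGR frame."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Example: read one frame from the default webcam and mark detected faces.
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if ok:
        for (x, y, w, h) in detect_faces(frame):
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imwrite("faces.png", frame)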

 

   

 


5.3  Body  movement/postures    

Although a lot has been written about the so-called "body language", body movement and posture have not been researched as extensively for emotion recognition as facial expressions and voice analysis. There are, though, some studies questioning the validity of facial expressions as a modality for recognizing affective states, because the face is involved in various functions and many of the famously recognized facial expressions represent only a small subset of the possible expressions; these studies suggest body posture as a very good indicator for certain categories of basic emotions. Most studies, however, have not been able to demonstrate recognition accuracy similar to that of facial expression classifiers, especially those that study emotion recognition from static body postures only. Coulson [28] considered how 6 joint rotations (head bend, chest bend, abdomen twist, shoulder forward/backward, shoulder swing, and elbow bend) could help in recognizing 6 emotions (anger, fear, happiness, sadness, surprise and disgust). Concordance rates for attributions of the 6 emotions ranged from zero for many disgust postures to over 90 percent for some anger and sadness postures. Kleinsmith and Bianchi-Berthouze [29] used four affective dimensions (valence, arousal, potency, and avoidance) instead of discrete emotion categories. In their study there was a 12% error rate for valence, 10% for both arousal and potency, and 11% for avoidance. In their conclusions they report that other types of body motion features may be necessary for achieving better recognition of some affective states, such as fear, and better performance of their model. Other studies that include body motion as a modality [30], tracking features like the quantity of motion and contraction index of the body, the velocity, acceleration and fluidity of the hand's barycenter, and the orientation and approach/avoidance behaviors of two participants towards their interlocutor in an interaction, suggest that body language reflects the participants' level of activation and dominance but is less informative about their valence (positive vs. negative).

Another role of body posture should also be noted. Studies suggest that body posture can actually induce changes in affective states or have a feedback role affecting motivation and emotion. A study by Riskind and Gotay [31], for example, revealed how "subjects who had been temporarily placed in a slumped, depressed physical posture later appeared to develop helplessness more readily, as assessed by their lack of persistence in a standard learned helplessness task, than did subjects who had been placed in an expansive, upright posture." Furthermore, it was shown that posture also had an effect on verbally reported self-perceptions. Another study [32], examining postures as a modality for recognizing emotions, suggests that involving the body in the control of technology facilitates users' expression of their feelings, which in turn gives them an improved experience, i.e., being engaged.

An open source library for analyzing body motion extracted from video is the EyesWeb [33] Expressive Gesture Analysis Library. EyesWeb refers both to research projects of the InfoMus Lab of the University of Genova on multimodal interactive systems and expressive gesture, and to an open software platform to support the development of real-time multimodal distributed interactive applications.
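To make one of the body motion features mentioned above concrete, the rough sketch below (an illustration only, not the EyesWeb implementation, which derives such features from segmented body silhouettes and combines many more of them) approximates the quantity of motion of a video stream as the fraction of pixels that changed between consecutive frames.

import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)                    # difference with previous frame
    _, motion = cv2.threshold(diff, 25, 1, cv2.THRESH_BINARY)
    qom = float(motion.sum()) / motion.size           # fraction of moving pixels
    print("quantity of motion: %.3f" % qom)
    prev = gray
cap.release()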

5.4  Pupil  size    

Studies have shown that the eye's pupil is significantly larger during both emotionally negative and positive stimuli than during neutral stimuli [34]. Although pupil size cannot distinguish valence, it can be used as an additional indicator of arousal. Many eye tracker devices have the ability to measure the pupil's size.

5.5  Bio-­‐sensors    

Of the range of modalities mentioned in the previous sections, facial expression analysis has been researched the most and has proven to be the most accurate. The use of this technique, though, introduces a number of practical difficulties in some applications. During face tracking the camera must have a clear image of the face, which limits freedom of movement, requires good lighting conditions, and calls for a rather static background. Additionally, it is easy for someone not to reveal his emotions to the camera, or, as mentioned earlier, autistic persons for example might have difficulty doing so even when they want to express their emotions. For these reasons scientists have also turned to the use of embodied biophysical sensors, monitoring signals that can reveal valuable information not only about someone's physical state, but about their emotional and mental state as well.

The  physiological  signals  usually  monitored  in  behavior  studies  are:  

Heartbeat rate (ECG): Electrocardiography sensors determine heartbeat rate by detecting and amplifying the tiny electrical changes on the skin that are caused when the heart muscle depolarizes, measuring the difference in voltage between two electrodes placed on either side of the heart. There are also optical heartbeat sensors using an infrared LED and a phototransistor, placed close to each other with, usually, a fingertip or the ear lobe in between. These sensors work based on the fact that when the heart beats there is a quick rush of blood into the tiny blood vessels close to the skin, which makes the tissue less transparent, so less light reaches the phototransistor. Changes in heartbeat can give us a clear index of arousal, but the sensors are prone to movement artifacts, and it is difficult to determine valence.
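As a minimal sketch of how such an optical (photoplethysmographic) signal can be turned into a heartbeat rate (an illustration assuming signal is a 1-D numpy array of sensor samples and fs the sampling rate in Hz; a real system would add filtering and movement-artifact rejection):

import numpy as np
from scipy.signal import find_peaks

def heart_rate_bpm(signal, fs):
    signal = signal - signal.mean()                   # remove the DC offset
    # Require peaks to be at least 0.4 s apart (i.e. below 150 beats per minute).
    peaks, _ = find_peaks(signal, distance=int(0.4 * fs), height=signal.std())
    if len(peaks) < 2:
        return None                                   # not enough beats detected
    rr = np.diff(peaks) / float(fs)                   # seconds between beats
    return 60.0 / rr.mean()                           # beats per minute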

Galvanic Skin Response (GSR)/Electro-Dermal Activity (EDA) both refer to the electrical changes measured at the surface of the skin. EDA sensors usually work by passing a minuscule amount of direct current between two electrodes in contact with the skin. When a person experiences emotional arousal, increased cognitive workload or physical exertion, the brain sends signals to the skin to increase the level of sweating. Sweat is a weak electrolyte and a good conductor, so the filling of the sweat ducts increases the conductance of the applied current. Changes in skin conductance at the surface thus provide a sensitive and convenient measure for assessing the changes in sympathetic arousal associated with emotion, cognition and attention.
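As a small illustration of the measurement itself (a sketch with assumed circuit values, not tied to any particular sensor), skin resistance is often read through a simple voltage divider: the skin sits in series with a known resistor between the supply voltage and ground, an ADC samples the voltage across the fixed resistor, and the skin conductance in microsiemens follows from Ohm's law.

def skin_conductance_uS(adc_value, adc_max=1023, vcc=5.0, r_fixed=100000.0):
    # Assumed circuit: Vcc -> skin -> ADC node -> fixed resistor -> ground.
    v_out = vcc * adc_value / float(adc_max)     # voltage across the fixed resistor
    if v_out <= 0 or v_out >= vcc:
        return 0.0                               # open or shorted electrode reading
    r_skin = r_fixed * (vcc - v_out) / v_out     # skin resistance in ohms
    return 1e6 / r_skin                          # conductance in microsiemens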

Skin  temperature/Heat  flux  is  the  amount  of  heat  that  the  body  emits.  Studies  have  shown  that  Heat  Flux  is  effective  in  detecting  context  switches.  This   is   because   context   switches   often   involve   physical  movement,   which  causes  the  body  to  warm  up  and  therefore  emit  heat.  

There   are   a   lot   of   companies   today   producing   commercial   wireless,  wearable  biophysical  sensors  transmitting  signals  to  software  running  on  a  smart-­‐phone   or   computer,   for   sports   enthusiasts  who   like   to  monitor   and  keep  track  of  their  exercising  habits.  Most  of  them  do  not  offer  an  open  API  for   application   development   but   in   some   cases   it   is   possible   to   read   the  packets  sent  by  the  sensor  with  custom  libraries.  

 

5.6  Brain  Computer  Interfaces  (BCI)    

Brain computer interfaces are sensors monitoring brain activity to translate the user's thoughts or mental state into actions on the computer. The brain's electrical charge is maintained by billions of neurons. Neurons are electrically charged by membrane transport proteins that pump ions across their membranes. Neurons are constantly exchanging ions with the extracellular milieu, for example to propagate action potentials.

Electroencephalography (EEG) is the recording of electrical activity using electrodes attached along the scalp, measuring voltage fluctuations that result from ionic current flows within neurons and are generated by the synchronous activity of thousands or millions of neurons with similar spatial orientation in the brain.

Since its discovery in 1924 by Hans Berger, EEG has been widely used in clinical research and neurology, to diagnose epilepsy, coma, brain death and various encephalopathies. Scalp EEG activity shows oscillations at a variety of frequencies, and researchers have associated certain oscillation frequency ranges and spatial distributions with different states of brain functioning. Although EEG is not the most accurate method to monitor brain activity, its ease of use, portability and low set-up cost have made it the most studied one, and have led to its application in other research fields and in all kinds of experiments where it is interesting to monitor the mental state of the subject. Usually three frequency ranges are used for this purpose (a minimal band-power computation is sketched after the list below):

 

 

 


 

• Theta  (4  -­‐  7  Hz):  related  to  drowsiness    

• Alpha  (8  -­‐  13  Hz):  related  to  relaxation    

• Beta (13 - 30 Hz): related to alertness
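The relative power in these bands is typically estimated from the power spectral density of the EEG signal. The sketch below (an illustration, assuming eeg is a 1-D numpy array of samples from one electrode and fs the sampling rate in Hz) uses Welch's method from scipy to compute the power in each band:

import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 7), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(eeg, fs):
    freqs, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), int(2 * fs)))
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs <= hi)
        powers[name] = np.trapz(psd[mask], freqs[mask])   # integrate the PSD over the band
    return powers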

 

 

In recent years, EEG has made its way into human-computer interaction research and research towards machines with emotional intelligence, and a small number of companies are developing low-cost, non-invasive brain computer interface products, such as the Emotiv headset, NeuroSky's MindWave and Starlab's Enobio (which combines EEG, ECG and EOG sensors), while OpenEEG [35] is a community project created to support the creation of open hardware and software solutions. On a consumer level, these interfaces are currently used mainly in gaming and other entertainment applications, since they still prove too inaccurate and impractical for more critical applications.

 

Functional   near-­infrared   spectroscopy   (fNIRS)   is   an   emerging  technique  for  sensing  brain  activity,  similar  to  the  technique  used  by  optical  heartbeat  sensors  mentioned  earlier  in  the  document.  The  fNIRS    system  is  made  up  of  probes  that  send   light  at   two  wavelengths   in   the  near-­‐infrared  range.   Biological   tissues   are   relatively   transparent   to   light   at   these  wavelengths.   The  main   absorbers   of   the   light   are   oxygenated   hemoglobin  and   deoxygenated   hemoglobin.   These   act   as   relevant   markers   of  hemodynamic  and  metabolic  changes  associated  with  neural  activity  in  the  brain.  The  reflected   light   is   then  picked  up  by   the  detectors  on   the  device.  Depending  on  the  amount  of  light  that  is  reflected,  we  can  get  a  measure  of  brain  activity  in  the  area  beneath  the  sensors.  

Studies in fNIRS [36] report that the hemodynamic response being measured in the brain is a slow response which occurs over 5-8 seconds. This currently makes the technique impractical for interaction input interfaces. For the moment there is still no commercial brain computer interface utilizing the fNIRS technique.
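The conversion from detected light to hemoglobin concentration changes is commonly done with the modified Beer-Lambert law. The sketch below is an illustration only: the function solves the two-wavelength system for the changes in oxygenated and deoxygenated hemoglobin, and the commented example values are placeholders, not calibrated coefficients.

import numpy as np

def hemoglobin_changes(i0, i, ext_coeffs, distance_cm, dpf):
    # i0, i       : baseline and current detected intensity at the two wavelengths
    # ext_coeffs  : 2x2 matrix of extinction coefficients, rows = wavelengths,
    #               columns = (HbO, HbR)
    # distance_cm : source-detector separation
    # dpf         : differential path-length factor for each wavelength
    i0, i, dpf = np.asarray(i0, float), np.asarray(i, float), np.asarray(dpf, float)
    delta_od = -np.log10(i / i0)                  # change in optical density
    # delta_od = E . [dHbO, dHbR] * distance * DPF  ->  solve the 2x2 linear system
    lhs = delta_od / (distance_cm * dpf)
    d_hbo, d_hbr = np.linalg.solve(np.asarray(ext_coeffs, float), lhs)
    return d_hbo, d_hbr

# Placeholder example (illustration only):
# ext = [[0.6, 1.5],    # wavelength 1: (HbO, HbR)
#        [1.1, 0.7]]    # wavelength 2: (HbO, HbR)
# print(hemoglobin_changes([1.0, 1.0], [0.98, 0.97], ext, 3.0, [6.0, 6.0]))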

 

 

 


 

5.7  Developing  Tools  for  Emotional  Intelligence    

As mentioned in the introduction of this chapter, emotion recognition is a difficult task for a computer, and the performance of such systems can vary depending on the state of the interacting person as well as on environmental conditions. In order to increase the reliability of emotion sensing systems, and after gaining experience from developing single-modal analysis systems, modern research examines the application of multi-modal systems [37][38], combining various sensors and analyses that share a final decision level to determine the emotional or affective state of the subject. In this direction there have been a number of projects, with contributions from universities all over Europe, for the development of frameworks and middleware that make it easier for researchers to develop and use multi-modal emotion recognition systems.

CALLAS [39] (Conveying Affectiveness in Leading-edge Living Adaptive System) is a project funded by the European Commission under the 6th Framework Programme, with the participation of many universities around Europe. CALLAS is a framework based on a plug-in multimodal architecture, containing a collection of components for feature extraction from text, audio, video and motion sensors and for processing emotional aspects in real time, for easy development of applications for art and entertainment. The CALLAS framework also includes its own visual programming authoring tool, CAT.

SEMAINE [40] is also a project funded by the European Commission, under the 7th Framework Programme, aiming to build a Sensitive Artificial Listener, a multimodal dialogue system which can sustain an interaction with a user for some time and react appropriately to the user's non-verbal behavior. The system can take input from video and audio to analyze the user's emotional state. The SEMAINE API is available as open source, supporting C++ and Java; it features the Apache ActiveMQ message broker as an integration layer and can run as a distributed system.

 

The SSI [41] (Social Signal Interpretation) framework is developed by the Human Centered Multimedia research laboratory of the University of Augsburg. It is available as open source, written in C++, and contains tools to record, analyze and recognize human behavior in real time, such as gestures, mimics, head nods, and emotional speech. It also follows a plug-in based design, with a growing collection including, among others, input from the Wii-mote and the Kinect sensor (under development), while it also supports the use of external libraries such as OpenCV, ARToolKit, SHORE, Torch, Speex and Watson. SSI supports the machine-learning pipeline in its full length and offers a graphical interface that assists a user in collecting their own training corpora and obtaining personalized models. It also features an XML-editor programming environment to draft and run pipelines without special programming skills.

Apart from developing special software, a lot of projects have focused on creating standard formats to represent human emotions and share them among emotion-aware applications. These formats can be used, for example, to annotate digital media in order to train models for affective indexing, to collect data to train virtual agents, or to share data between an emotion recognition system and an application, developed by another party, that will animate a virtual avatar of the user accordingly.

MPEG-4 (Part 2 "Visual") contains MPEG-4 FAP [42] (Facial Animation Parameters), a set of 68 parameters that allow the animation of synthetic face models and can be used in facial expression analysis applications. MPEG-V [43] is a standard under development for a common middle-layer format for interaction and visualization among virtual world applications.

EMMA [44] (Extensible Multimodal Annotation Language) is an XML markup language, recommended by the W3C, for containing and annotating the interpretation of user input. It is a wrapper language that can include various kinds of payloads representing interpretations of user input. An interpretation element contains information about the modality upon which the interpretation is based, can indicate start and end timestamps of the interpretation, and has many more attributes. EmotionML [45] is a "plug-in" language, also recommended by the W3C, which can be combined with EMMA to represent human emotions in user input. EmotionML recognizes the fact that there is no single agreed representation of affective states, or of vocabularies to use. Therefore, an emotional state <emotion> can be characterized using four types of descriptions: <category>, <dimensions>, <appraisals>, and <action-tendencies>. An example of an EMMA document carrying EmotionML as its interpretation payload is given below:

<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
  <emma:interpretation emma:start="123456789">
    <emotion xmlns="http://www.w3.org/2005/Incubator/emotion">
      <dimensions set="valenceArousalPotency">
        <arousal value="-0.29"/>
        <valence value="-0.22"/>
      </dimensions>
    </emotion>
  </emma:interpretation>
</emma:emma>

 


HEO [46] (Human Emotion Ontology) is an effort to build an RDF/OWL ontology to represent human emotions, with subclasses and attributes to describe input modalities, dimensions (arousal, valence, dominance), action tendencies and many more.

SAIBA [47] (Situation, Agent, Intention, Behavior, Animation) is a running project focusing on the creation of a framework of languages for Embodied Conversational Agents, with three stages representing intent planning, behavior planning and behavior realization. A Function Markup Language (FML), describing intent without referring to physical behavior, mediates between the first two stages, and a Behavior Markup Language (BML), describing the desired physical realization, mediates between the last two. BML has behavior elements for head, torso, face, gaze, body, legs, gesture, speech and lips, and defines attributes for animation, lip and gaze synchronization, gestures, etc.

More   information,   articles   and   tools   can   be   found   on   the   HUMAINE  Association   website   [48],   an   international   community   around   research   on  emotions  and  human-­‐machine  interaction.  

 

6.  Sensor  Hardware  Platforms    

There is a very large number of companies producing sensors and offering specialized solutions for any kind of project. As final products designed for a specific use, though, these solutions often introduce restrictions when applied to custom projects and when interfacing with custom-written software. The architectural design of a project featuring multiple sensors requires not only a sensor network that makes sure all sensors work together without problems, but also one that can be customized to fit the project's data flow design. Sensor platforms meet these two requirements, offering a common standard base between sensors and the freedom to customize their function and connectivity. The following part presents some examples of sensor platforms used today, with different design approaches.

6.1  Arduino    

Arduino    is  an  open-­‐source  electronics  platform.  It  is  designed  as  a  low  cost,   expandable,   multi-­‐purpose   prototyping   platform   based   on   flexible,  easy   to   use   hardware   and   software.   Since   its   introduction,   Arduino   has  created   a   very   large   community,   sharing   support   and   code;   it   is   used   for  education   in   a   lot   of   laboratories   around   the   world,   and   has   become   a  standard  for  interactive  designers,  media  artists,  and  hobbyists.  


The basic Arduino platform consists of three parts. The first is the Arduino microcontroller board, which can be built by hand using the provided schematics or purchased preassembled, in different versions and sizes, including versions designed to implement wireless nodes, with an XBee* radio connector and circuitry for battery power and charging, or versions like the LilyPad, designed so that it can be sewn onto fabric for wearable applications. The Arduino boards are based on the Atmel 8-bit AVR family of microcontrollers with RISC architecture.

The second part of the platform is the language and compiler. Arduino's language is based on C and is designed to simplify the creation of physical interaction applications, in combination with the third part, the IDE, which is built on Java. Together, the three parts form a platform with a simplified programming language for creating instructions for the controller, basic enough to be easily used for common programming tasks, yet powerful enough to support complex projects.

Arduino can be expanded with a great variety of add-ons, the so-called Arduino shields, and with a wide range of motion and environmental sensors, network devices and servomotors, and can be used to implement wireless sensors, tangible interfaces and robots; a minimal host-side example of reading sensor values sent by an Arduino is given after the footnote below.

*XBee   is   a   ZigBee-­‐enabled   device   for   Arduino.   ZigBee   is   a   wireless  communication   standard,   designed   to   be   inexpensive,   with   low-­‐power  consumption.   Most   importantly   ZigBee   is   particularly   well   designed   for  mesh  networks,  which  connect  from  node  to  node,  instead  of  a  single  router  network.    
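As a minimal host-side example of the kind of custom software integration discussed above (a sketch assuming an Arduino sketch that prints one analogRead() value per line over the serial port at 9600 baud; the port name is an assumption and differs per operating system), the pyserial library can be used to read sensor values into a Python application:

import serial  # pyserial

port = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)   # assumed port name
while True:
    line = port.readline().decode("ascii", errors="ignore").strip()
    if not line:
        continue                       # timeout or empty line
    try:
        value = int(line)              # raw 10-bit ADC reading, 0-1023
    except ValueError:
        continue                       # skip partial or garbled lines
    print("sensor value:", value)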

6.2  .Net  Gadgeteer    

Following Arduino's success, Microsoft Research recently launched the .NET Gadgeteer open source platform, a microcontroller platform based on the ARM7 processor, designed to be programmed through Microsoft's .NET (Micro) Framework and C# and expanded through solder-less connection modules. The idea of solder-less connection modules should encourage more people without any experience in building circuits to try to build their own gadget prototypes. Since Gadgeteer is a very new platform, and since it uses its own connection standard, the list of available sensor modules is still limited.

6.3  Phidgets    

Phidgets is also a platform built on the same concept as Arduino, designed to be even simpler. Phidgets is a line of plug-and-play building blocks for physical computing that can be connected over USB to a computer and communicate with any application. The Phidgets API handles all the USB communication with the devices, simplifying the communication between devices and applications. Arduino supports the creation of more complex projects, but Phidgets allows you to build simpler prototypes faster, and supports programming in a large variety of programming languages, including high level languages like C# and ActionScript 3, as well as visual programming frameworks like Max/PureData and LabView (see Ch. 7).

6.4  Shimmer    

Shimmer is an open source platform for small, wearable, wireless sensors. Shimmer started as a project of Intel Research and is now a division of Realtime Technologies. Unlike the previously presented hardware platforms, which focus on multi-purpose prototype building, Shimmer produces already assembled, highly sophisticated sensors, focusing more on research around Body (or Personal) Area Networks (BAN/PAN). BAN research aims at the development of wireless distributed systems for autonomous and remote monitoring of patients in health care.

The Shimmer platform consists of the main unit, a lightweight pack with an MSP430 processor, a battery, Bluetooth and 802.15.4 connectivity, a micro SD memory slot for offline data storage, a tilt sensor and an accelerometer. A variety of motion, biophysical and ambient sensors can be connected to the unit. The firmware of the unit embeds TinyOS [53], a very light and highly customizable Unix-like operating system specially designed for low-power embedded systems and sensor networks. Shimmer supports the development of applications in C# and also provides a LabView library; every unit is an autonomous node, providing data in raw or semi-processed format, accessible to applications via custom libraries.

6.5  I-­‐CubeX    

I-CubeX is a commercial platform producing a variety of sensors and providing multiple sensor kits for research and interactive projects. I-CubeX provides an API with support for various languages like C++, ActionScript and Max/Jitter, while the sensors can communicate directly with musical keyboard instruments using the MIDI interface. On the platform's website there are a lot of application code examples, and sensor kits are suggested for a wide range of interactive application categories.

 

 

 

   


 

7.  Interactive  Software  Development  Frameworks    

The last part of this document is a short presentation of various useful frameworks and toolkits for interactive application programming. Although a lot of the frameworks mentioned below share many common elements, this list serves two purposes. The first is to cover frameworks written in different languages, so that the reader can find one written in a familiar language, or one that better serves a project's requirements. The second purpose is to encourage the reader to visit and explore the websites of the tools mentioned, where previous work of very talented programmers and artists is showcased, often with available source code, and which are thus a great source of inspiration for anyone interested in multimedia programming and visual arts.

Processing (Java based) is an open source programming language and environment focusing on graphics and interaction programming. Based on a very minimal environment, Processing was developed as a "software sketchbook" and a tool to teach fundamental computer programming for the visual arts. Processing was the first of a series of frameworks that appeared during recent years, wrapping a growing collection of standard libraries for graphics, image, video and audio manipulation, network libraries, physics engines and many more, and offering simplified interfaces to all these libraries to make it easy to combine them inside a program.

After the success of Processing, openFrameworks (C++) was released, following the same concept, using C++ to deliver applications with better performance than Processing and access to native C++ libraries, and also offering the ability to develop native applications for the iOS and Android mobile platforms. openFrameworks has built a very large support community and has been used successfully for everything from mobile apps to large and complex interactive installations. Beyond the basic standard libraries wrapped by openFrameworks, users are constantly expanding the list of add-on libraries and components, including libraries for creating tangible interfaces and physical interaction, like the TUIO and TouchLib libraries, and the OpenNI framework, which has already produced a few very interesting projects using the Kinect sensor. Cinder (C++) and Polycode (C++/Lua) are two other open source toolkits similar to openFrameworks.

Visual  Programming  Languages    

Visual programming languages combine traditional coding with tools that allow the user to handle components as blocks on a canvas. Each block has some kind of input signal, and the code inside the block determines its output. In that way the user controls the flow of data inside a program by virtually wiring signals to the blocks' inputs and outputs. Apart from offering a clearer structure to people with no programming background through this visual schematic, visual programming languages also focus more on live, or run-time, coding, allowing the behavior of a block to be changed without recompiling the whole program.

The most popular visual programming languages are Max, developed by Cycling74, and PureData, its free open source equivalent, actually developed by one of the initial developers of Max, Miller Puckette. Max and PureData have been particularly popular with musicians, since electronic music was one of the first fields utilizing digital technology and programming, and this logic of dataflow programming, wiring together different signals, effects and sensors, was something musicians were already familiar with from recording studios. Today both tools have a very large collection of patches and programming APIs to integrate different effects and sensors.

Isadora, developed by TroikaTronix, the software branch of Troika Ranch, a media-intensive dance company, is a visual programming language focusing mainly on the manipulation of video and audio for live performances, supporting up to 6 independent outputs, and also including a C++ SDK for developing custom filters and effects.

Field is a Python based open source toolkit, developed by OpenEndedGroup, a team of artists with experience in interactive installations and in work on theatre and dance performances. Field includes a Processing plug-in which replaces the Processing IDE and through which all Processing libraries can be used in Field. A program written in Field can also include code in other programming languages, including languages that execute inside other applications like Autodesk Maya and Adobe After Effects. Field supports only the Mac and Linux platforms.

VVVV is another new visual programming toolkit, free for non-commercial use, compatible with the Windows platform only, using DirectX libraries and supporting programming in C#.

QuartzComposer   is   part   of   Apple’s   XCode   framework,   for   visual  programming  using  native  libraries  of  the  MacOS.  

Working  with  sensors    

For working more specifically with sensors, signal processing and pattern recognition, the most popular applications offering both visual and traditional programming are LabView, by National Instruments, and Simulink, developed by MathWorks.

BioMOBIUS is an open platform, developed by an open community of researchers and by the TRIL Centre, which allows researchers to rapidly develop sophisticated technology solutions for biomedical research. It was developed with the philosophy of providing a common technology platform comprising hardware, software, services and sensors. The BioMOBIUS development environment is based on EyesWeb and provides support for designing applications based on the Shimmer sensor platform.

Exemplar is an open source kit for programming prototypes that use sensors, developed by Stanford University's Human Computer Interaction Group. Exemplar is a plug-in written for the Eclipse IDE, offering a GUI through which it is possible to visually monitor live sensor signals and manipulate them.

ROS (Robot Operating System) is an open source project providing libraries and tools like device drivers, message passing middleware, computer vision libraries, and more features to support the creation of robot applications. Since robots are an ensemble of sensors and motors, ROS features could also support the creation of a project utilizing a network of autonomous sensor nodes. Among other sensors, ROS now includes drivers and libraries for the Kinect sensor, which is a perfect solution for computer vision in low-cost robot projects and has already been used with very interesting results.

A result of the combination of ROS with the Kinect sensor is the Point Cloud Library (PCL), a sister project of ROS, which includes state-of-the-art algorithms for 3D point cloud processing: filtering, feature estimation, surface reconstruction and registration, model fitting and segmentation.

 

 

 

 

 

 

 

 

 

 

 

 

   


Bibliography    

Joshua   Noble   (2009).   Programming  Interactivity.   Sebastopol   (U.S.A.):   O’Reilly  Media  

Dan  O’Sullivan  and  Tom  Igoe  (2004).  Physical  Computing.   Boston   (U.S.A.):   Thomson   Course  Technology  

References    

[1]:   M.   C.   Johnson-­‐Glenberg,   D.   Birchfield,   P.  Savvides,   C.   Megowan-­‐Romanowicz.   In:   L.  Annetta   &   S.   Bronack   (eds.)   Serious  Educational   Game   Assessment:   Practical  Methods   and   Models   for   Educational   Games,  Simulations   and   Virtual  Worlds.   pp.   225-­‐241.  Sense  Publications,  Rotterdam.  2010  

[2]:Ramesh   Raskar,   Hideaki   Nii,   Bert  deDecker,  Yuki  Hashimoto,  Jay  Summet,  Dylan  Moore,   Yong   Zhao,   Jonathan   Westhues,   Paul  Dietz,   John   Barnwell,   Shree   Nayar,   Masahiko  Inami,  Philippe  Bekaert,  Michael  Noland,  Vlad  Branzoi,   and   Erich   Bruns.   2007.   Prakash:  lighting   aware   motion   capture   using  photosensing   markers   and   multiplexed  illuminators.   In  ACM   SIGGRAPH   2007  papers  (SIGGRAPH   '07).   ACM,   New   York,   NY,  USA,  ,  Article  36  .  

[3]:  Takaaki  Shiratori,  Hyun  Soo  Park,  Leonid  Sigal,  Yaser  Sheikh,  Jessica  K.  Hodgins  "Motion  Capture  from  Body-­‐Mounted  Cameras"  ACM  Transactions  on  Graphics,  Vol.  30,  No.  4  (Proc.  ACM  SIGGRAPH  2011),  July  2011

[4]: A. Laurentini (February 1994). "The visual hull concept for silhouette-based image understanding". IEEE Trans. Pattern Analysis and Machine Intelligence.. pp. 150–162  

[5]:  Corazza  S.,  Mündermann  L.,  Andriacchi  T.,  A  Framework  For  The  Functional  Identification  Of  Joint  Centers  Using  Markerless  Motion  Capture,  Validation  For  The  Hip  Joint,  Journal  of  Biomechanics,  2007.  

[6]:   L.   Xinghan,   B.   Berendsen,  R.T.   Tan,  R.C.  Veltkamp,  Dept.  of  Inf.  &  Comput.  Sci.,  Utrecht  Univ.,   Utrecht,   Netherlands.   Human   Pose  Estimation   for   Multiple   Persons   Based   on  Volume   Reconstruction.   In:   Proc.   2010   20th  ICRP.  IEEE,  2010,  pp  3591-­‐3594.    

 

[7]:  Rosenhahn,  B.,  Brox,  T.,  Kersting,  U.  G.,  Smith,  A.  W.,  Gurney,  J.  K.,  &  Klette,  R.  (2006).  A  system  for  marker-­‐less  motion  capture.  Main,  1(1),  45-­‐51.  Citeseer.  

[8]:  http://www.eyewriter.org  

[9]:  J.  Shotton,  A.  Fitzgibbon,  M.  Cook,T.  Sharp,  M.   Finocchio,   R.   Moore,A.   Kipman,   A.   Blake.  Real-­‐Time   Human   Pose   Recognition   in   Parts  from   a   Single   Depth   Image.   Microsoft  Research  Cambridge,  2011.  

[10]: R. W. Picard. Toward Machines with Emotional Intelligence. In: IEEE Transactions on Pattern Analysis and Machine Intelligence - Graph Algorithms and Computer Vision, Vol. 23, No. 10, IEEE Computer Society, 2001, pp. 1175-1191.

[11]: O. A. Schipor, Ş. G. Pentiuc, M. D. Schipor. Towards a multimodal emotion recognition framework to be integrated in a computer based speech therapy system. In: The 6th International Conference on Speech Technology and Human-Computer Dialogue, 2011.

[12]: Rosalind W. Picard. Future affective technology for autism and emotion communication. Phil. Trans. R. Soc. B, December 12, 2009.

[13]: IRIS project. Integrate Research on Interactive Storytelling. http://iris.scm.tees.ac.uk/

[14]:   Lennart E. Nacke. Directions in Physiological Game Evaluation and Interaction. In CHI 2011 BBI Workshop Proceedings, Vancouver, BC, Canada. 2011

[15]:  Ekman,  P,  &  Friesen,  W.  V.  (1978).  The  facial  action  coding  system:  A  technique  for  the  measurement  of  facial  movement.  Palo  Alto:  Consulting  Psychologists  Press.

[16]: G. Castellano, L. Kessous, G. Caridakis. Emotion Recognition through Multiple Modalities: Face, Body Gesture, Speech. In: Affect and Emotion in Human-Computer Interaction, Springer Berlin / Heidelberg, 2008, pp. 92-103.

[17]: Rosalind W. Picard. Future affective technology for autism and emotion communication. Phil. Trans. R. Soc. B, December 12, 2009.

[18]:  A.  Batliner,  D.  Seppi,  S.  Steidl,  B.  Schuller.  Segmenting   into   Adequate   Units   for  Automatic   Recognition   of   Emotion-­‐Related  Episodes:   A   Speech-­‐Based   Approach.   In  :Advances   in   Human-­‐Computer   Interaction  Volume  2010  (2010)

[19]: T. Vogt, E. André and N. Bee. "EmoVoice - A framework for online recognition of emotions from voice." In: Proceedings of Workshop on Perception and Interactive Technologies for Speech-Based Systems, 2008.

[20]   F.  Eyben,   M.  Wöllmer,   and   B.  Schuller.  openEAR   -­‐   Introducing   the   Munich   Open-­‐Source   Emotion   and   Affect   Recognition  Toolkit.   In:Proc.   4th   International   HUMAINE  Association  Conference  on  Affective  Computing  and   Intelligent   Interaction   2009   (ACII   2009),  Amsterdam,   The   Netherlands,   volume  I,   pp.  576–581.  IEEE,  2009.  10.-­‐12.09.2009.  

[21]: A. Mehrabian. Communication without words. Psychology Today, 2(4):53–56, 1968.  

[22]: Fraunhofer Institute. Germany http://www.iis.fraunhofer.de/en/bf/bsy/produkte/shore/

[23]:Salah, A.A., N. Sebe, Th. Gevers, Communication and automatic interpretation of affect from facial expressions, in D. Gökçay & G. Yıldırım (eds.), Affective Computing and Interaction: Psychological, Cognitive and Neuroscientific Perspectives, to appear.

[24]: Rana el Kaliouby and Peter Robinson. Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures. In: IEEE International Workshop on Real Time Computer Vision for Human Computer Interaction at CVPR, 2004.

[25]:Intelligent Behaviour Understanding Group (iBUG), Department of Computing, Imperial College London http://ibug.doc.ic.ac.uk/resources/facial-tracker-2011/

[26]: Seeing Machines. FaceAPI. http://www.seeingmachines.com/product/faceapi/

[27] open Computer Vision Library. http://opencv.org

[28] Coulson, M. (2004) 'Attributing Emotion To Static Body Postures: Recognition Accuracy, Confusions, And Viewpoint Dependence.' Journal of Nonverbal Behavior 28 (2) 117-139

[29] Kleinsmith   A.,   and   Bianchi-­‐Berthouze   N.,  Recognizing   affective   dimensions   from   body  posture,   In:  Proc.  2nd   Intl  Conf  of  ACII,   LNCS  4738,  Portugal,  pp.  48-­‐58,  2007

[30] A. Metallinou , A. Katsamanis, Wang Yun, S.Narayanan. Tracking changes in continuous emotion states using body language and prosodic cues. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE. Prague. 2011. pp 2288-2291

[31] Riskind, J.H., and Gotay, C.C.: Physical posture: Could it have regulatory or feedback effects on motivation and emotion? Motivation and Emotion 6(3) (1982).pp 273–298

[32] N.   Bianchi-­‐Berthouze,   P.,   Cairns,   A.,   Cox,  C.,   Jennett,  W..,  Kim,.On  posture  as  a  modality  for   expressing   and   recognizing   emotions.  Emotion   and   HCI   workshop   at   BCS   HCI  London,  September,  2006    

[33] A.   Camurri,   B.   Mazzarino,   G.   Volpe.  Analysis  of  Expressive  Gesture:  The  EyesWeb  Expressive   Gesture   Processing   Library   In   :  GESTURE-­‐BASED   COMMUNICATION   IN  HUMAN-­‐COMPUTER   INTERACTION   Lecture  Notes   in   Computer   Science,   2004,   Volume  2915/2004,  469-­‐470  

[34]  Timo  Partala  and  Veikko  Surakka.  2003.  Pupil  size  variation  as  an  indication  of  affective  processing.  Int.  J.  Hum.-­Comput.  Stud.  59,  1-­‐2  (July  2003),  185-­‐198.  

[35]  OpenEEG.  http://openeeg.sourceforge.net/doc/  

[36]   Erin   Treacy   Solovey,   Audrey   Girouard,  Krysta   Chauncey,   Leanne   M.   Hirshfield,  Angelo   Sassaroli,   Feng   Zheng,   Sergio   Fantini,  and  Robert  J.K.  Jacob.  2009.  Using  fNIRS  brain  sensing   in   realistic  HCI   settings:   experiments  and   guidelines.   In  Proceedings   of   the   22nd  annual   ACM   symposium   on   User   interface  software  and  technology  (UIST  '09).  ACM,  New  York,  NY,  USA,  157-­‐166.  

[37]  O.  A.  Schipor,  S.  G.  Pentiuc,  M.  D.  Schipor.  Towards   a   multimodal   emotion   recognition  framework   to   be   integrated   in   a   Computer  Based   Speech   Therapy   System.   In:   6th  Conference   on   Speech   Technology   and  Human-­‐Computer   Dialogue   (SpeD),  IEEE.Brasov.Romania.2011.  pp  1-­‐6.  

[38]   Eija   Haapalainen,   SeungJun   Kim,   Jodi   F.  Forlizzi,   and   Anind   K.   Dey.   2010.   Psycho-­‐physiological   measures   for   assessing  cognitive  load.  In  Proceedings  of  the  12th  ACM  international   conference   on   Ubiquitous  computing  (Ubicomp   '10).   ACM,   New   York,  NY,  USA,  301-­‐310.  

[39] Bertoncini,  M.  and  Cavazza,  M.,  2007.  Emotional  Multimodal  Interfaces  for  Digital  Media:  The  CALLAS  Challenge.  Proceedings  of  HCI  International  2007.

[40]: Marc Schröder. The SEMAINE API: Towards a Standards-Based Framework for Building Emotion-Oriented Systems. In: Advances in Human-Computer Interaction, Volume 2010 (2010), Article ID 319406, 21 pages.

[41]   J.  Wagner,   F.   Lingenfelser,   and  E.  Andre,  The   Social   Signal   Interpretation   Framework  (SSI)   for   Real   Time   Signal   Processing   and  Recognitions,"  in   Proceedings   of  INTERSPEECH  2011,  Florence,  Italy,  2011.  

[42] F.   Lavagetto   and   R.   Pockaj,   "The   Facial  Animation   Engine:   towards   a   high-­‐level  interface   for   the  design  of  MPEG-­‐4  compliant  animated   faces",   IEEE   Trans.   on   Circuits   and  Systems   for   Video   Technology,   Vol.   9,   n.2,  March  1999,  pp.277-­‐289

[43]  MPEG-­‐V  (Information  Exchange  with  Virtual  Worlds)  http://mpeg.chiariglione.org/working_documents.htm#MPEG-­‐V  

http://www.metaverse1.org/  

[44] EMMA: Extensible MultiModal Annotation markup language W3C Recommendation 10 February 2009 http://www.w3.org/TR/emma/

[45]  Emotion  Markup  Language  (EmotionML)  1.0.  W3C  Working  Draft  7  April  2011  http://www.w3.org/TR/emotionml/  

[46] : Marco   Grassi.   2009.   Developing   HEO  human   emotions   ontology.   In  Proceedings   of  the   2009   joint   COST   2101   and   2102  international   conference   on   Biometric   ID  management   and   multimodal  communication  (BioID_MultiComm'09),   Julian  Fierrez,   Javier   Ortega-­‐Garcia,   Anna   Esposito,  Andrzej  Drygajlo,  and  Marcos  Faundez-­‐Zanuy  (Eds.).   Springer-­‐Verlag,   Berlin,   Heidelberg,  244-­‐251.

[47]   S.   Kopp,   B.   Krenn,   S.   Marsella,   et   al.,  “Towards   a   common   framework   for  multimodal   generation:   the  behavior  markup  language,”   in   Proceedings   of   the   6th  International   Conference   on   Intelligent  Virtual  Agents  (IVA   ’06),  vol.  4133  of  Lecture  Notes   in   Computer   Science,   pp.   205–217,  2006.

[48]  HUMAINE.  http://emotion-­‐research.net/