
ORIGINAL ARTICLE

GeeAir: a universal multimodal remote control device for home appliances

Gang Pan · Jiahui Wu · Daqing Zhang · Zhaohui Wu · Yingchun Yang · Shijian Li

Received: 1 June 2009 / Accepted: 22 October 2009 / Published online: 10 March 2010
© Springer-Verlag London Limited 2010
Pers Ubiquit Comput (2010) 14:723-735. DOI 10.1007/s00779-010-0287-7

Abstract  In this paper, we present a handheld device called GeeAir for remotely controlling home appliances via a mixed modality of speech, gesture, joystick, button, and light. This solution is superior to existing universal remote controllers in that it can be used by users with physical and vision impairments in a natural manner. By combining diverse interaction techniques in a single device, GeeAir enables different user groups to control home appliances effectively, satisfying even the unmet needs of physically and vision-impaired users while maintaining high usability and reliability. The experiments demonstrate that the GeeAir prototype achieves prominent performance by standardizing a small set of verbal and gesture commands and introducing feedback mechanisms.

Keywords  Universal remote controller · Gesture recognition · Speech recognition · Smart home

G. Pan (✉) · J. Wu · Z. Wu · Y. Yang · S. Li (✉)
Department of Computer Science, Zhejiang University, Zhejiang, China
e-mail: [email protected]

D. Zhang
Handicom Lab, Institut TELECOM SudParis, Evry, France
e-mail: [email protected]

1 Introduction

Nowadays, it is almost impossible for home inhabitants to go for a day without interacting with home appliances. Although remote control of home appliances such as TVs, DVD players, windows, lights, etc. serves ordinary people well with acceptable physical or emotional comfort, it can do even more for the dignity, security, and well-being of elderly or disabled people [1]. One can imagine a situation where a person has lost some of his/her physical dexterity or mobility. In the absence of suitable controls, he/she would need a caregiver to assist with the operation of home appliances, with the attendant expense and loss of independence and privacy. But with adequate assistance, this person might be able to live independently in his/her own home.

Current home appliances are often equipped with remote controllers operating via infrared (IR) light signals. Each household is likely to own several remote controllers, which are often incompatible with each other and have different layouts. In order to reduce the number of remote controls, universal remote controllers (URCs) were introduced to merge the functions of individual controllers into one device [2-5]. A URC learns IR command sets from each appliance and operates the appliance selected by a user. There are two fundamental steps involved in the control procedure of a URC: target object selection and command issuing. To select a target object for operation, a user might press a button, turn a rotary wheel, or touch an icon, depending on how the panel of the URC is designed.

To issue a command, a user needs to point the controller at the target appliance and press a specific button on the controller. Subsequently, the controller emits the infrared signal to the selected appliance for the specified operation.

Although URCs combine the functions of remote controllers into one device, elderly and disabled home users may still have difficulties in using a URC for a number of reasons. First, a URC has too many buttons that need to be remembered, and several button presses may be needed to achieve a simple function. Second, the buttons on a URC may be too small for elderly, physically disabled, and vision-impaired people to use. Finally, button operation is just one modality for interacting with home appliances, and it may not be the most natural and efficient means of human-machine interaction.

Speech and gesture are two natural ways that people interact with each other. Much research has been done on using speech, gesture, or eye gaze to control home appliances. However, only limited success is reported in the literature on the deployment of these modalities, due to the constraints of each single modality. Controlling through spoken language or oral commands is indeed straightforward for expressing intentions, but the single modality of speech has the following limitations in real implementations. First, accurately extracting and recognizing control commands from daily continuous speech is still difficult due to the ambiguities of natural language, especially in noisy environments. Second, speech is not instant: some commands need complex phrases or sentences, which may take a long time to utter and process.

Using the single modality of gesture to control home appliances has also been explored. Since computer-vision-based gesture and eye-gaze control is highly dependent on the lighting conditions and camera facing angle, it turns out to be rather difficult to accurately recognize gestures under poor lighting conditions using a camera-based system. In addition, it is uncomfortable and inconvenient if the user is required to face the camera directly to complete a gesture. Different from the vision-based gesture recognition approach, accelerometer-based gesture interaction is an emerging technique that exploits the acceleration data of hand motion for recognition and control. No camera is required, only a wearable or portable accelerometer-equipped device used in daily life, such as a watch, a smart phone, or an MP3 player. These wireless-enabled portable/wearable devices provide new possibilities for interacting with a wide range of home appliances such as doors, window curtains, TVs, etc.

In this paper, we present a universal multimodal remote control device which unifies several interaction modalities, such as speech, gesture, button, joystick, and light, so that home inhabitants ranging from common users to elderly, physically disabled, and vision-impaired people are all able to interact with home appliances in the way they feel comfortable. Specifically, we develop a universal multimodal remote controller, called GeeAir, which not only provides comfort and convenience for common users in controlling home appliances, but also meets the special needs of physically and vision-impaired people in operating home appliances so that they can live independently and enjoy a better quality of life.

The paper is organized as follows. First, the related work on universal remote controllers and multimodal control systems is summarized in Sect. 2. Then an overview of the GeeAir system architecture is presented in Sect. 3. In Sect. 4, the key techniques for selecting the desired target appliance are described, followed by the introduction of feedback mechanisms ensuring reliable confirmation. Section 5 proposes a standard set of hand gestures for operating different home appliances and a novel algorithm for accelerometer-based gesture recognition. Section 6 reports the implementation details and the experimental results of the speech/gesture recognition algorithms compared to other existing algorithms. An initial evaluation of the GeeAir prototype with 10 users is also given in this section. Finally, we provide our conclusions on the design and testing of GeeAir and highlight some future research directions in Sect. 7.

2 Related work

In the consumer electronics market, several universal remote control products can be found in home electronics stores. These products can be roughly categorized into two groups, according to how the target appliance is selected: button-based URCs and screen-based URCs. The former group allocates a few buttons on the control panel of the URC for appliance selection, where one button corresponds to one appliance. For example, the Philips 4-in-1 URC has four buttons reserved on the panel to control TV/VCR/DVD/SAT, respectively; users select one of the four appliances by pressing the corresponding button [2]. Since the number of buttons on a URC control panel is fixed, the extensibility of button-based URCs is limited. Screen-based URCs overcome this limitation by putting a built-in mini-screen and a navigation button on the control panel. When users press the navigation button, the mini-screen shows the selectable home appliances one after another; when the target appliance appears on the screen, the user completes the device selection by releasing the button [3-5]. Apparently, both kinds of URCs support only button pressing as the single input modality, so people with limited motor skills, finger dexterity, or weak vision might not be able to use these remote controls.

In parallel to the efforts of consumer electronics manufacturers in developing universal remote controllers, there has been a lot of research on universal GUIs to enable mobile devices for home appliance control. Different approaches have been proposed to generate a universal graphical user interface on various mobile platforms [6, 7]. All those solutions assume that users can navigate the GUI on the tiny screen of a mobile device with a pen or button. Thus, they support only a single input modality and consequently cannot meet the needs of elders and those with certain physical or vision impairments.

Compared to single-modality solutions, multimodal control systems combine the strengths of multiple modalities, and thus increase the applicability and usability of human-machine interaction. To meet the different requirements of varied users and applications, various combinations of input and output modalities have been explored in previous projects. For example, the seminal work by Bolt [8] created the "Put-That-There" system, where people can use a pointing gesture to select an object from a virtual diagram of a room shown on a large-screen display and subsequently use speech to operate on the selected object. The EU HOME-AOM project [9, 10] applied the mixed modality of speech, gesture, and GUI to home appliance control for disabled people, in which speech and gesture were used to assist in the navigation of GUI commands. GWindows [11] operated Microsoft Windows applications by using speech to move/close/minimize/maximize/scroll and using motion gestures to determine the movement distance. Krum et al. [12] implemented a system that helps users navigate a whole-earth 3D visualization environment at a distance from the display; it employs the Gesture Pendant [13] for tracking simple hand motions and utilizes speech for navigation commands. Different from those projects, our work intends to provide a single, multimodal control device for a wider range of home users, including elders and those with physical or vision impairments besides ordinary users. Our solution supports a mixed modality of speech, gesture, button, joystick, and light as input and output, adapting to the different needs and interaction preferences of various user groups. In addition, we use an accelerometer-based gesture recognition approach instead of the camera-based one used previously, which allows users to move freely in a ubiquitous home environment and control the home appliances in any lighting condition.

The closest research to our work is by Kela et al. [14], who used several modalities to interact with a design studio environment. The modalities explored include speech input and output, gesture input, RFID tags, a laser-tracked pen, and a mobile device with a touch screen. Our work differs from theirs in the following aspects:

(1) While Kela et al.'s work uses diverse modalities in a studio environment, they deploy multiple devices to control multiple applications, whereas we focus on building a handy, single multimodal device for controlling multiple home appliances.

(2) Kela et al.'s work takes the design studio as the application environment, designers as the user group, and convenience and comfort as the design goal. Instead, our research aims at a different, and actually larger, user group. We not only provide ordinary home inhabitants with convenience and comfort, but also serve elders and those with physical and vision impairments. For example, we provide the joystick as one input modality, which is very useful for people with hand disabilities.

(3) In order to ensure the reliability and robustness of the multimodal remote controller for elders and disabled people, we introduce voice and light as feedback, so that the desired control object can be reliably identified even if speech recognition is not 100% accurate. In our GeeAir solution, users are allowed to use speech or the joystick to select a target appliance for operation and to use voice and light to get feedback. Such a solution can satisfy the needs of user groups with impairments in speaking, hearing, vision, and hand use.

(4) Although we also use an accelerometer-based approach for gesture control as Kela et al. did, we developed a novel and very different algorithm [15] which is more accurate than the algorithm used in Ref. [14]. While they adopted an HMM (hidden Markov model)-based approach for gesture recognition and processed the acceleration data in the time domain without feature extraction, we process the data in the frequency domain with feature extraction to reduce the noise and variation of the gesture data, thus significantly improving the recognition performance.

3 GeeAir: an overview

The design goal of GeeAir is to be a single universal remote controller which serves not only common users but also physically disabled and vision-impaired people. In the home environment illustrated in Fig. 1, GeeAir first takes inputs from the user to select a target appliance and then recognizes the user's predefined hand gestures to control the selected appliance. As described before, the mixed modalities of speech, joystick, light, and button are used for selecting a desired target appliance. In order to avoid any potential error during the selection, two feedback mechanisms are introduced in the GeeAir design: lighting feedback and voice echo.

The look and feel of the GeeAir prototype is shown in Fig. 2; the design is borrowed from the Nintendo Nunchuk. The key components of GeeAir and their functionalities are described as follows:

(1) A built-in three-axis accelerometer: to capture the user's 3-D hand gesture signals.
(2) An eight-orientation joystick: to select a target appliance efficiently.
(3) A built-in microphone: to acquire the user's speech commands.
(4) A speaker: to provide users with voice feedback and reminders.
(5) Buttons A and B: used to mark the beginning and end of speech and gesture commands. The two buttons are designed in different sizes and shapes in order to help users differentiate them by touch.
(6) A built-in digital signal processing unit: to handle the computation involved in processing the multimodal inputs and outputs.
(7) A built-in communication unit: to send and receive wireless signals.

Fig. 1  Illustration of GeeAir for remote control of home appliances

Fig. 2  Conceptual illustration of GeeAir's components for multimodal control. a A three-axis accelerometer, joystick, microphone, speaker, and two buttons are built into GeeAir; b the two buttons (Button A and Button B) in the front view of GeeAir

The workflow of using GeeAir consists of three main stages: appliance selection, feedback and confirmation, and operation command issuing, as shown in Fig. 3. At any moment, GeeAir has a current appliance for operation, indicated by the light signal or voice reminder. If a user intends to control an appliance other than the current one, he/she needs to select the desired one via the joystick or by speaking the target appliance's name. If speech is used, GeeAir obtains the name of the target appliance with speech recognition. The feedback for appliance selection has two options: a light signal (a controllable light attached to each appliance) and voice echo, which help users correct occasional errors in the speech recognition of the target appliance name. If the current appliance is exactly the one that the user wants to operate, the user can wave the GeeAir in the air for the follow-up operations. The gesture is then recognized by GeeAir and the corresponding command is issued to the current appliance wirelessly.
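To make the three-stage workflow concrete, the following minimal sketch shows how appliance selection, feedback, and command issuing could be sequenced. It is an illustration only: the callbacks (recognize_speech, recognize_gesture, send_command, give_feedback) and the event format are hypothetical placeholders, not part of GeeAir's actual firmware.

```python
# Minimal sketch of the three-stage GeeAir workflow (appliance selection,
# feedback/confirmation, operation command issuing). All callbacks are
# hypothetical placeholders, not a real GeeAir API.

APPLIANCES = ["television", "dvd", "radio", "speaker",
              "air conditioner", "lamp", "curtain"]

def control_loop(events, recognize_speech, recognize_gesture,
                 send_command, give_feedback, current=0):
    """events: iterable of (kind, payload) tuples coming from the handheld.

    kind == "speech"   : Button-A utterance, payload is the recorded audio
    kind == "joystick" : payload is +1 (clockwise) or -1 (counter-clockwise)
    kind == "gesture"  : Button-B motion, payload is the acceleration data
    """
    for kind, payload in events:
        if kind == "speech":
            name = recognize_speech(payload)            # isolated-word recognizer
            if name in APPLIANCES:
                current = APPLIANCES.index(name)
                give_feedback(APPLIANCES[current])      # voice echo / signal light
        elif kind == "joystick":
            current = (current + payload) % len(APPLIANCES)
            give_feedback(APPLIANCES[current])
        elif kind == "gesture":
            command = recognize_gesture(payload)        # e.g. "up", "down", "v"
            send_command(APPLIANCES[current], command)  # issued wirelessly
    return current
```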

4 Multimodal selection of a target appliance

4.1 Selecting via speech commands

Speech is one of the most natural ways for humans and machines to interact. However, for home appliance control it is still a great challenge to robustly extract and recognize control commands in real-life environments using user-independent, large-vocabulary continuous speech recognition technology. In contrast, small-vocabulary recognition of isolated words is quite reliable and accurate, as verified by many successful practical applications.

GeeAir provides the option of selecting a target appliance via speech commands. GeeAir records the user's utterance through the equipped microphone and then recognizes the appliance name. In this case, the vocabulary to be recognized is small because the number of home appliances is limited and their names are relatively fixed. In order to avoid having to segment the appliance name from a natural utterance, users are asked to press Button A on GeeAir before speaking the appliance name for object selection, and to release the button after speaking the appliance name.

For isolated word recognition, the commonly used techniques include VQ (vector quantization), DTW (dynamic time warping), and HMM (hidden Markov model) [16, 17]. For GeeAir, we build an isolated word recognition system based on the continuous density hidden Markov model (CDHMM) [18]. The whole recognition process consists of the following steps:

(1) Defining the lexicon: recording the words to be recognized by the system. Each word is recorded several times by each participant.
(2) Feature extraction: the MFCC (Mel frequency cepstrum coefficient) feature vectors [19] are computed, together with their first derivatives.
(3) Modeling words: for each word in the lexicon, a left-to-right CDHMM is built with a number of states. Each state is characterized by a Gaussian mixture model (GMM).
(4) Training the models: the parameters of the GMM distributions and the state transition probabilities within the CDHMMs are estimated using the Baum-Welch algorithm [17].
(5) Recognizing a word: first, we compute the observations (feature vectors) of the word, and then the probability of generating these observations is evaluated for each word's CDHMM using the Viterbi algorithm. The word is recognized as the one whose model has the highest probability.
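The authors' recognizer is built on HTK (Sect. 6.1.2: 3-state left-to-right CDHMMs, 8-component Gaussian mixtures, 13 MFCCs plus first derivatives). As a rough illustration of the same train-and-score pattern, the sketch below uses the open-source hmmlearn and python_speech_features packages as stand-ins; it does not enforce the left-to-right topology and scores with the forward likelihood rather than the Viterbi path, so it is an approximation of the described pipeline, not the authors' implementation.

```python
# Sketch of CDHMM-based isolated-word recognition, using open-source
# stand-ins (hmmlearn, python_speech_features) for the HTK toolchain.
import numpy as np
from hmmlearn.hmm import GMMHMM
from python_speech_features import mfcc, delta

def features(signal, rate=16000):
    """13 MFCCs plus first derivatives: 32 ms window, 16 ms step (Sect. 6.1.2)."""
    c = mfcc(signal, samplerate=rate, winlen=0.032, winstep=0.016,
             numcep=13, nfft=512)
    return np.hstack([c, delta(c, 2)])              # 26-dimensional vectors

def train_lexicon(recordings):
    """recordings: {word: [waveform, ...]} -> {word: trained CDHMM}."""
    models = {}
    for word, waves in recordings.items():
        feats = [features(w) for w in waves]
        X, lengths = np.vstack(feats), [len(f) for f in feats]
        m = GMMHMM(n_components=3, n_mix=8, covariance_type="diag", n_iter=6)
        models[word] = m.fit(X, lengths)            # Baum-Welch re-estimation
    return models

def recognize(models, signal):
    """Return the word whose model assigns the highest log-likelihood."""
    obs = features(signal)
    return max(models, key=lambda w: models[w].score(obs))
```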

4.2 Selecting via joystick

The second modality GeeAir provides for selecting a target appliance is the built-in joystick. The joystick is a traditional input device for machine control in trucks, CT scanners, and video games. It outperforms buttons in navigation because of its continuity, fast reaction, and the nearly absent relative movement between the hand and the stick during control. Thus, the joystick is a good choice for selecting objects that are arranged around the user in physical space.

The operation principle of the joystick is illustrated in Fig. 4. The accessible area is octagonal. Two states are defined for joystick operation: inactive and active. The inactive state indicates that the joystick is not pushed and stays in the middle of the octagon; the active state indicates that the joystick is pushed to the edge of the octagon at some angle. The eight valid joystick positions are: north, northeast, east, southeast, south, southwest, west, and northwest. Each position occupies 45 degrees.

Fig. 3  Workflow of GeeAir: begin → select a target appliance (rotate the joystick or speak its name) → feedback (signal light or voice echo) → if wrong, reselect; if right, operate the current appliance with gesture and command issuing → continue operating the current appliance?

A user can move the joystick along the octagon to select appliances in physical space. Intuitively, an octagonal joystick could be statically matched to eight appliances. However, to select the target appliance from the differing number of appliances in each household, GeeAir exploits a rule of dynamic, relative association between positions and appliances: a valid position is not necessarily associated with a fixed device. When a user intends to select an appliance, the initial position to which he/she first pushes the joystick is dynamically associated with the currently selected appliance. As the user rotates the joystick to a neighboring position, the current appliance also shifts to its neighboring appliance. Whether the nearest appliance to the left or to the right is selected depends on the user's rotating direction, i.e., counter-clockwise or clockwise. The dynamic association ensures flexibility when the number of appliances varies. Thus, any number of appliances can easily be navigated using the joystick.
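The dynamic, relative association amounts to a small amount of state: the octant first touched is bound to the currently selected appliance, and each step clockwise or counter-clockwise shifts the selection by one. A minimal sketch follows (a hypothetical class for illustration, not GeeAir firmware; the 45-degree octant mapping and the inactive/active threshold follow Fig. 4, while the dead-zone radius is an assumption).

```python
import math

class JoystickSelector:
    """Relative (dynamic) mapping of the eight joystick octants to an
    arbitrary-length appliance list, as described in Sect. 4.2."""

    def __init__(self, appliances, current=0):
        self.appliances = appliances
        self.current = current          # index of the selected appliance
        self.last_octant = None         # None while the stick is inactive

    @staticmethod
    def octant(x, y):
        """Map an active stick position to one of 8 positions (0..7), each
        covering 45 degrees; return None when the stick is near the centre."""
        if x * x + y * y < 0.5 ** 2:    # inactive: stick inside the dead zone
            return None
        angle = math.degrees(math.atan2(y, x)) % 360
        return int(((angle + 22.5) % 360) // 45)

    def update(self, x, y):
        """Feed a raw stick sample; return the currently selected appliance."""
        o = self.octant(x, y)
        if o is None:                                   # released: drop the binding
            self.last_octant = None
        elif self.last_octant is None:                  # first push: bind this octant
            self.last_octant = o                        # to the current appliance
        elif o != self.last_octant:                     # rotated to a neighbour
            step = 1 if (o - self.last_octant) % 8 == 1 else -1
            self.current = (self.current + step) % len(self.appliances)
            self.last_octant = o
        return self.appliances[self.current]
```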

4.3 Feedback mechanism

GeeAir has two kinds of feedback mechanisms available for confirmation purposes: voice echo and signal light. GeeAir has a built-in mini-speaker, which can replay the name of the appliance when it is selected by either speech or joystick. The voice echo informs the user whether the object recognized by the system is the one the user intended to select. If a controllable LED light is attached to each appliance, the lights can be used as feedback: the red LED light of the selected appliance is turned on for user confirmation while the other lights are kept off.

For joystick-based appliance selection, the light feedback occurs immediately as soon as the joystick changes position; that is, when the joystick moves from one position to another, the light signal also shifts from one appliance to the next. The instant lighting during joystick rotation is very helpful to the user because of the quick response of joystick operations. However, the voice echo cannot be produced for every covered position if the joystick rotates too fast, because there is not enough time to play it. For this reason, GeeAir sets a movement speed limit of one position per second for the voice echo: if the joystick stays in a position for less than one second, the voice echo of the appliance dynamically associated with that position is suppressed. Any voice echo can be interrupted by rotating the joystick to the next position when the user knows that the current one is not the desired one, which helps users speed up the selection process.
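The one-position-per-second rule amounts to a simple rate limiter: an echo is scheduled when the joystick settles on a position and cancelled if the selection changes within the dwell threshold. A hedged sketch, assuming hypothetical speak()/stop_speaking() audio hooks:

```python
import time

ECHO_DWELL = 1.0   # seconds a position must be held before it is echoed

class EchoFeedback:
    """Voice-echo rate limiting as described in Sect. 4.3: positions passed
    over in less than ECHO_DWELL seconds are silently skipped, and an
    ongoing echo is interrupted as soon as the selection moves on."""

    def __init__(self, speak, stop_speaking):
        self.speak = speak                  # hypothetical playback callbacks
        self.stop_speaking = stop_speaking
        self.pending = None                 # (appliance, time_selected)

    def on_selection(self, appliance):
        self.stop_speaking()                # interrupt any echo in progress
        self.pending = (appliance, time.monotonic())

    def tick(self):
        """Call periodically; fires the echo once the dwell time has passed."""
        if self.pending:
            appliance, t0 = self.pending
            if time.monotonic() - t0 >= ECHO_DWELL:
                self.speak(appliance)
                self.pending = None
```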

With the feedback mechanisms, if the user finds that the recognized object is not the desired one, he/she can correct it immediately by repeating the appliance selection. Thus, issuing a command to the wrong appliance can be avoided. Either of the two feedback mechanisms can be combined with either of the two selection schemes introduced previously, i.e., there are four combinations available: speech-voice, speech-light, joystick-voice, and joystick-light.

Both feedback modalities, voice and light, are suitable for motor-impaired people, and they also free users from reading on-screen prompts. The voice-based feedback is suitable for anyone with normal hearing. Although the signal light requires the user's vision, recognizing the binary states of a light, ON and OFF, is less demanding than reading the semantic information in text or pictures on a screen.

Fig. 4  Octagonal accessible area of the joystick. Each position covers 45 degrees. The joystick can be rotated either clockwise or counter-clockwise to change the position

5 Operating an appliance via gesture

After the target appliance is selected, GeeAir uses gesture commands to operate it. Gestures performed with GeeAir are recognized based on acceleration data acquired by the built-in three-axis accelerometer [15]. Compared to camera-based gesture recognition techniques [20], accelerometer-based gesture recognition does not rely on lighting conditions or the camera facing angle, and it does not require any deployment of devices in the environment. Similar to issuing speech commands, users begin a gesture by pressing Button B and end it by releasing the button, avoiding the accuracy degradation caused by gesture segmentation.

5.1 Gesture command definition

In order to enable effective gesture-based interaction, several requirements must be met when designing a set of gesture commands for home appliances:

(1) The semantic connection between gestures and commands should be natural, so that the meaning of a gesture is easy for users to learn and remember.
(2) Gestures should be simple and terse, avoiding those that require high precision over a long period of time. Moreover, they should be quick to perform and repeat, without causing fatigue over time.

(3) The gesture commands for different appliances should be consistent, i.e., similar operations of different appliances should be defined as the same gesture, to reduce the size of the gesture vocabulary that users have to learn.

Usually there are two different approaches to gesture command definition: user-dependent and user-independent. Previous work focuses more on user-dependent gesture recognition [21-23], where each user is required to perform a couple of gestures as training/template samples before using the system. In this case, users are requested to personalize a remote controller by mapping each operation to a gesture they find suitable and comfortable. However, the training process is still a burden for users, although some work [23, 24] has been done on optimizing recognition algorithms to reduce the size of the training sample set. GeeAir aims at user-independent gesture recognition and control: different users share a common set of gesture commands and do not need to train GeeAir person by person.

In this paper, we define a nine-gesture vocabulary to control the frequently used functions of seven categories of home appliances, as listed in Table 1. The Forward-Backward gesture is performed in the XY plane, and the other eight gestures are waved in the YZ plane.

(1) The Forward-Backward gesture is performed as if pushing an ON/OFF switch button on the control panel of an electronic appliance.
(2) The swinging gestures Up and Down are very natural for expressing the meaning of up and down, e.g. volume up/down or temperature up/down.
(3) Similarly, the two gestures Left and Right naturally represent the meaning of previous and next.
(4) The gestures Double-Left and Double-Right, denoting a fast move toward the left or right, suggest fast backward/fast forward to users.
(5) The gesture of the letter V, implying a tick or rising up, suggests a Play operation. Additionally, we follow the tradition that most current players use the same button for the Play and Pause operations.
(6) The Inverted-V gesture implies a decreasing trend, which we define as a Stop operation.

Note, however, that Up/Down and Double-Left/Double-Right are continuous commands rather than instant ones; for example, modulating the volume or adjusting curtains is a continuous operation. In order to avoid the user having to perform the same gesture repeatedly, when such a command is recognized GeeAir continuously issues the command at a certain interval until the user presses Button B or the setting reaches its maximum.
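The repeat-until-released behaviour for continuous commands can be sketched as a loop that keeps re-issuing the recognized command at a fixed interval until Button B is pressed again or a maximum count is reached. The interval, the maximum, and the helper callbacks below are assumptions for illustration, not values taken from the paper.

```python
import time

CONTINUOUS = {"up", "down", "double-left", "double-right"}

def issue(command, send, button_b_pressed, interval=0.5, max_repeats=20):
    """Send a recognized gesture command to the current appliance.

    Instant commands are sent once; continuous ones (volume, temperature,
    curtain adjustment, fast forward/backward) are repeated every `interval`
    seconds until Button B is pressed or `max_repeats` is reached.
    `send` and `button_b_pressed` are hypothetical device callbacks.
    """
    send(command)
    if command not in CONTINUOUS:
        return
    for _ in range(max_repeats - 1):
        if button_b_pressed():          # user stops the continuous operation
            break
        time.sleep(interval)
        send(command)
```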

5.2 Gesture recognition with FDSVM

GeeAir employs the FDSVM algorithm [15], proposed by the authors, to recognize gesture commands from acceleration data. FDSVM uses a frame-based descriptor to compactly represent a gesture, which reduces the noise and variation of the gesture data and thus significantly improves gesture recognition performance.

Table 1  Definition of gesture commands for appliances

Appliance       | Forward-backward | Up; Down             | Left; Right                 | Double-left; Double-right | V; Inverted-V
Television      | ON/OFF           | Vol. up; Vol. down   | Prev. channel; Next channel |                           |
DVD             | ON/OFF           | Vol. up; Vol. down   | Prev. track; Next track     | F Forward; F Backward     | Play/pause; Stop
Radio           | ON/OFF           | Vol. up; Vol. down   | Prev. channel; Next channel |                           |
Speaker         | ON/OFF           | Vol. up; Vol. down   |                             |                           |
Air conditioner | ON/OFF           | Temp. up; Temp. down |                             |                           |
Lamp            | ON/OFF           | Brtn. up; Brtn. down |                             |                           |
Curtain         | Open/Close       | Curt. up; Curt. down |                             |                           |

Vol, volume; F Forward, fast forward; F Backward, fast backward; Temp, temperature; Brtn, brightness; Curt, curtain

The FDSVM system has two main phases, training and recognizing, and four components: acceleration data acquisition, feature extraction, SVM training, and recognition by SVM, as shown in Fig. 5. The first two components are shared by the training and recognizing phases.

5.2.1 Feature extraction: frame-based gesture descriptor

The three-axis accelerometer built into GeeAir discretely senses the gestural acceleration along three spatially orthogonal axes. We denote a gesture command as

G = (a_x, a_y, a_z)

where a_x, a_y, a_z are the acceleration sequences from the three axes. We divide a gesture into N + 1 segments of identical length, and every two adjacent segments make up a frame with a segment-length overlap, as illustrated in Fig. 6.

We employ five features in both the frequency and spatial domains to characterize each frame.

In the frequency domain (discrete Fourier transform (DFT) on each frame per axis):

(1) mean: the DC component over the frame;
(2) energy: the sum of the squared DFT component magnitudes excluding the DC component, divided by the number of components for normalization;
(3) entropy: the normalized information entropy of the DFT component magnitudes with the DC component excluded.

In the spatial domain:

(4) standard deviation: indicates the amplitude variability of a gesture;
(5) correlation among the axes: indicates the strength of the linear relationship between each pair of axes.

We combine all the features extracted as described above to form a feature vector s, which represents the gesture command itself. Considering 5 features per frame per axis, 3 axes, and N frames per gesture, the dimension of the feature vector is d = 5 × 3 × N = 15N.
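A minimal NumPy sketch of the frame-based descriptor follows: the gesture is cut into N + 1 equal segments, adjacent segments form N half-overlapping frames, and each frame contributes the three frequency-domain features per axis, the per-axis standard deviation, and the three pairwise axis correlations, giving 15 values per frame (15N in total). The exact normalizations used in FDSVM [15] may differ; this is an illustration, not the authors' code.

```python
import numpy as np

def frame_descriptor(gesture, n_frames=5):
    """gesture: (T, 3) array of acceleration samples (ax, ay, az).
    Returns a 15*n_frames feature vector built from overlapping frames."""
    segments = np.array_split(gesture, n_frames + 1)      # N+1 equal segments
    features = []
    for i in range(n_frames):                             # two adjacent segments
        frame = np.vstack([segments[i], segments[i + 1]]) # overlap by one segment
        for axis in range(3):
            x = frame[:, axis]
            spec = np.abs(np.fft.rfft(x))
            rest = spec[1:]                                # DFT magnitudes, no DC
            mean = x.mean()                                # DC component (mean)
            energy = np.sum(rest ** 2) / len(rest)         # normalized energy
            p = rest / (np.sum(rest) + 1e-12)              # spectral distribution
            entropy = -np.sum(p * np.log2(p + 1e-12))      # spectral entropy
            features += [mean, energy, entropy, np.std(x)]
        for a, b in [(0, 1), (1, 2), (0, 2)]:              # pairwise correlations
            features.append(np.corrcoef(frame[:, a], frame[:, b])[0, 1])
    return np.asarray(features)
```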

5.2.2 Gesture classification: multiclass SVM

Suppose there are two types of gestures, GTR1 and GTR2, to be classified. We denote the training set with n samples as

{(s_i, g_i)}, i = 1, ..., n

where s_i ∈ R^d represents the feature vector of a gesture command and

g_i = +1 if s_i belongs to GTR1, and g_i = -1 if s_i belongs to GTR2.

The separating plane, written as

w · s + b = 0,

can be obtained by solving a dual convex quadratic programming problem [25].

The extension to the classification of multiple gestures is achieved by a multiclass SVM using a one-versus-one or one-versus-all strategy. The SVM is a method for dealing with highly non-linear classification and regression problems. Benefiting from the structural risk minimization principle and the avoidance of over-fitting through its soft margin, the SVM usually outperforms traditional parameter estimation methods based on the law of large numbers when only limited training data are available.
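As a concrete illustration of the one-versus-one multiclass strategy, the sketch below uses scikit-learn, whose SVC classifier trains pairwise binary SVMs internally; the authors' implementation actually uses the SVMmulticlass package [28], and the kernel choice here is an assumption not stated in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_gesture_svm(descriptors, labels):
    """descriptors: (n_samples, 15*N) frame-based feature vectors;
    labels: gesture class for each sample. SVC decomposes the multiclass
    problem into one-versus-one binary SVMs."""
    clf = make_pipeline(
        StandardScaler(),                                   # scale each feature
        SVC(kernel="rbf", C=1.0, decision_function_shape="ovo"))
    clf.fit(np.asarray(descriptors), labels)
    return clf

# Usage: predicted = train_gesture_svm(train_X, train_y).predict(test_X)
```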

6 Evaluations

6.1 Implementation

Fig. 5  Block diagram of the FDSVM gesture recognition system: acceleration data acquisition → feature extraction (frame segmentation, feature calculation) → SVM training / recognition by SVM

Fig. 6  Illustration of segments and frames for a gesture: the gesture is divided into segments 0 to N; frame i is formed by segments i and i+1, so frames 0 to N-1 overlap by one segment

We built a prototype of GeeAir, including the hardware and algorithm implementations, to verify the design and performance. Currently, GeeAir can acquire speech and gesture commands with the two buttons and perform joystick-based selection. The software, including the algorithms for speech recognition and gesture recognition, is still implemented on a PC rather than on GeeAir itself. We use Bluetooth to connect GeeAir and the PC.

6.1.1 Hardware setup

The GeeAir prototype is built on the Nintendo Wiimote for acceleration sensing and its Nunchuk expansion for joystick selection. It has a 3-D accelerometer, a joystick, and two buttons, Button A and Button B (inspired by Button C and Button Z of the Nunchuk). The built-in microphone and speaker of GeeAir are simply replaced with a Bluetooth wireless headset connected to a laptop computer. The Wiimote is also employed to provide the communication between the laptop computer and GeeAir.

GeeAir utilizes Bluetooth as its non-directional wireless communication channel. However, most current appliances adopt infrared remote controllers and are therefore unable to receive Bluetooth signals. We developed a Bluetooth-infrared adaptor (BI Adaptor) to convert the Bluetooth signals into infrared signals; it will become unnecessary once appliances are able to communicate via Bluetooth. The signal light for the feedback mechanism is also embedded on the BI Adaptor, as shown in Fig. 7.

6.1.2 Algorithms implementation

For the isolated word recognition in GeeAir, the lexicon has 12 words for seven categories of home appliances, shown in Table 2. The utterances are recorded with a 16 kHz sampling frequency and 16-bit resolution. A 26-dimensional MFCC feature vector (13 cepstrum coefficients and their first derivatives) is employed, computed with a window size of 32 ms and a step size of 16 ms. Each word is represented by a trained left-to-right CDHMM with 3 states, implemented on the basis of HTK (the Hidden Markov Model Toolkit) [26]. An eight-component Gaussian mixture distribution is used for modeling the states. We use 6 Baum-Welch re-estimation iterations.

Gesture recognition with FDSVM for GeeAir uses the open-source FFTW package [27] for the discrete Fourier transform. The five features (mean, energy, entropy, correlation, and standard deviation of each individual axis in one frame) are then calculated. The feature vector is eventually fed into a classifier, either to train an SVM model or to retrieve the recognized gesture type. The SVM component utilizes the SVMmulticlass package [28]. For further details, refer to [15].
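For reference, the frame arithmetic implied by these speech front-end parameters is summarized in the small sketch below (the helper is an illustration of the stated window/step sizes, not part of the HTK configuration used by the authors).

```python
# Frame-size arithmetic for the speech front end in Sect. 6.1.2
# (16 kHz sampling, 32 ms window, 16 ms step, 13 MFCCs + first derivatives).
SAMPLE_RATE = 16000                   # Hz
WINDOW = int(0.032 * SAMPLE_RATE)     # 512 samples per analysis window
STEP = int(0.016 * SAMPLE_RATE)       # 256 samples between successive windows
N_MFCC = 13
FEATURE_DIM = 2 * N_MFCC              # 26: cepstra plus first derivatives

def n_frames(n_samples):
    """Number of analysis frames for an utterance of n_samples samples."""
    return 1 + max(0, n_samples - WINDOW) // STEP

# Example: a one-second utterance yields 1 + (16000 - 512) // 256 = 61 frames.
```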

6.2 Data acquisition

Fig. 7  Components of the Bluetooth-infrared adaptor

Table 2  Speech vocabulary of twelve Chinese words for seven appliances

No. | Appliance       | Chinese words (pinyin)
1   | Television      | dian shi; dian shi ji
2   | DVD player      | DVD
3   | Radio           | shou yin; shou yin ji
4   | Speaker         | yin xiang; yin xiang
5   | Air conditioner | kong tiao
6   | Lamp            | dian deng; tai deng; ri guang deng
7   | Curtain         | chuang lian

To evaluate GeeAir's performance in oral command recognition and gesture recognition, we built a speech database with 7 appliance names and a gesture acceleration database with 9 gestures. Both databases were acquired from 10 persons, 5 males and 5 females. The collection procedure lasted 5 days.

The vocabulary in the speech database includes the 12 Chinese words for the 7 appliances listed in Table 2; some of the appliances have more than one name, depending on users' habits. Each user was required to record each word 4 times per day. Thus, each user has 20 samples for each Chinese word.

For the gesture acceleration database, each participant was asked to perform each gesture 6 times per day. Thus, there are 6 × 5 × 9 × 10 = 2,700 samples. The start and end of a gesture were labeled by pressing Button B on the Wiimote during data acquisition. Figure 8 illustrates the acquisition devices. We divided the 9 gestures into 3 groups, as listed in Table 3, for the purpose of evaluating usability for different potential appliances. For example, Group 1 is for the speaker, air conditioner, lamp, and curtain; Group 2 is for the television and radio.

We employed leave-one-day-out cross-validation for the user-dependent case and leave-one-person-out cross-validation for the user-independent case in the speech and gesture experiments. For leave-one-day-out cross-validation, we divide all the samples into five partitions, taking one day's samples as a partition (namely 60 samples per gesture per partition and 40 samples per word per partition). Each time, four of the five partitions are used for training and the remaining partition for testing; we repeat this five times and take the average recognition rate. For leave-one-person-out cross-validation, nine participants' data (out of ten) are used as the training set, and the data of the remaining participant are used as the testing set.
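Both protocols are instances of leave-one-group-out cross-validation, grouped by day or by person; a compact sketch is given below (the record fields and the train_and_test callback are hypothetical, used only to show the partitioning scheme).

```python
from statistics import mean

def grouped_cv_accuracy(samples, group_key, train_and_test):
    """Generic leave-one-group-out cross-validation.

    samples: list of records, e.g. {"day": 3, "person": 7, "x": ..., "y": ...}
    group_key: "day" for leave-one-day-out (user-dependent case) or
               "person" for leave-one-person-out (user-independent case).
    train_and_test(train, test): returns the accuracy on the held-out group.
    """
    groups = sorted({s[group_key] for s in samples})
    accuracies = []
    for g in groups:
        train = [s for s in samples if s[group_key] != g]   # all other groups
        test = [s for s in samples if s[group_key] == g]    # held-out group
        accuracies.append(train_and_test(train, test))
    return mean(accuracies)                                  # average over folds
```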

6.3 Speech recognition accuracy

Using the 12-word speech data described previously, the experimental results show that user-dependent speech recognition achieves an accuracy of 98.21%, while the user-independent recognition rate is 91.79%. Figure 9 illustrates the recognition performance over time in the user-dependent case.

Fig. 9  User-dependent speech recognition results varying over time (recognition rate for Day 1 through Day 5 and the average)

6.4 Gesture recognition accuracy

Fig. 8  Acquisition devices for the gesture acceleration data

Table 3  The nine gestures are divided into three groups for the gesture recognition experiments

No. | Size | Gestures
1   | 3    | Forward-backward, up, down
2   | 5    | Forward-backward, up, down, left, right
3   | 9    | Forward-backward, up, down, left, right, double-left, double-right, V, inverted-V

6.4.1 Experiment 1: effect of frame number N

The purpose of analyzing a gesture in frames rather than as a whole is to describe its local characteristics over time. The frame count N determines how precisely a gesture is described: intuitively, the more frames a gesture is broken into, the more details are known about it. However, too large a frame number N may lead to over-fitting; it also increases the dimension of the feature space, which increases the computational complexity. This experiment examines the effect of varying N.

Figure 10 shows the experimental results for varying the frame number N using the data set of Group 3. As can be seen, higher recognition rates occur at the center of both curves and lower rates at both ends.

This result supports our assumption that the features convey little discriminative information when N is too small, and that over-fitting occurs when N is too large. The recognition accuracy is clearly lower than the rest when N is 2, and the two curves are nearly flat when N is between 4 and 7. In the following experiments, we choose N = 5.

Fig. 10  Experimental results for various frame numbers N (user-dependent and user-independent recognition rates, N from 2 to 19)

6.4.2 Experiment 2: user-dependent gesture recognition

In this experiment, to demonstrate the performance of our method, we compare it with four other methods: the C4.5 decision tree, Naive Bayes, DTW, and an HMM-based algorithm. We employed Quinlan's implementation of C4.5 [29] for comparison purposes.

We carried out the experiments and comparison tests on the 3 groups of data sets, respectively. The comparison results are shown in Fig. 11. When recognizing the three gestures of Group 1, all five approaches obtain recognition rates of more than 90%, with our proposed FDSVM achieving 99.17% (slightly lower than DTW at 99.76%). When the number of gesture types increases, the performance of HMM and DTW decreases significantly. In contrast, our FDSVM method performs well even when recognizing all 9 gestures, with a recognition rate of 96.40%.

Fig. 11  Experimental results for the user-dependent case (recognition rates of FDSVM, Naive Bayes, C4.5, DTW, and HMM on the three gesture groups)

6.4.3 Experiment 3: user-independent gesture recognition

The user-independent case means that the system is fully trained before users use it; such an implementation spares users the effort of performing several gestures as training data. The results of the user-independent gesture recognition test and comparison are shown in Fig. 12. As expected, the recognition rate of user-independent gesture recognition is lower than that of the user-dependent case. Our FDSVM shows very stable recognition performance as the number of gesture types increases: it achieves a recognition rate of 94.17% for the 3 gestures of Group 1 and 91.07% for the 9 gestures of Group 3. DTW achieves recognition rates of 97.38% for Group 1 and 95.78% for Group 2, slightly outperforming our method; however, our FDSVM significantly outperforms DTW on the 9 gestures of Group 3. The result reveals that our FDSVM has good generalization capability with respect to the number of gesture types.

Fig. 12  Experimental results for the user-independent case (recognition rates of FDSVM, Naive Bayes, C4.5, DTW, and HMM on the three gesture groups)

6.5 Response time test

We set up 8 home appliances as control objects in the laboratory: a curtain, two lights, a TV, an air conditioner, a speaker, and a DVD player. We then recruited 10 graduate students from the laboratory for the experiments, none of whom had used the GeeAir before. A series of tasks was defined as follows in order to test each user one after another:

1. Use speech to select a target appliance (one of eight). After a red-light feedback from the system for confirmation, perform gestures to control the appliance.
2. Use the joystick to repeat the same task as Step 1.
3. Cover the eyes of the participant to simulate the situation of a blind person, and use speech to select a target appliance (one of eight). After a voice feedback from the system, perform gestures to control the appliance.
4. Use the joystick to repeat the same task as Step 3.

Table 4 shows the average response times of the different stages when the students used the GeeAir prototype. We can see that it is faster to select a target using the joystick than using speech, because selection by speech needs considerable time (about 1.4 s) to speak an appliance name. The computational cost of recognition, for both speech and gesture, is less than 0.5 s. For the user, the response time of feedback by light is nearly negligible (only 43 ms). For the gesture command procedure, including the gesture action and gesture recognition, the average time spent is 0.483 s.

Table 4  Average response time of different stages (unit: milliseconds)

Target selection                             | Feedback   | Gesture (action + recognition)
Joystick: 1266                               | Light: 43  | 426 + 57
Speech (speaking + recognition): 1397 + 406  | Voice: 736 |

7 Conclusions

We have developed a handheld, universal multimodal remote control device, called GeeAir, for controlling home appliances via a mixed modality of speech, gesture, joystick, button, and light. Compared to existing universal remote controllers, GeeAir enables even those with physical, hearing, and vision impairments to control home appliances in a natural manner. Compared to existing multimodal solutions for interacting with smart environments, GeeAir provides a handy, single-device solution, not only offering comfort and convenience for common users in controlling home appliances but also meeting the special needs of physically and vision-impaired people in operating them.

Each single modality, such as speech, gesture, joystick, button, or light, has its own strengths and weaknesses. By combining these diverse but complementary modalities and integrating them into a single device, different home user groups can always find a combination of modalities with which they feel comfortable interacting with the environment. GeeAir represents an interesting attempt toward bringing multimodal interaction techniques closer to the everyday life of home users, particularly those who need assistance for independent living.

Speech and gesture are the two most natural ways that people interact with each other. Even though continuous speech and gesture recognition techniques are still not mature enough to be deployed in real applications, we achieved very good performance in our work by standardizing a small set of easily learned verbal commands and gestures and by introducing feedback mechanisms.

Multimodal interaction devices are necessary for mobile and ubiquitous environments. The GeeAir prototype permits us to begin developing the design space for mapping interactions to multimodal commands. Such a space will be necessary for optimally supporting different home users in different contexts.

The initial test results show clear benefits of the multimodal GeeAir device over universal remote controllers and other single-modality solutions. In the future, we plan to conduct a series of formal evaluations of GeeAir with real home users, including elderly and disabled inhabitants. Hopefully, the study will shed light on the cognitive load of various combinations of modalities (speech-gesture, joystick-gesture, speech-button, and joystick-button) in order to further improve the future design of GeeAir.

Acknowledgments  The authors would like to thank the anonymous reviewers for their comments and suggestions. The laboratory students' participation in the experiments is greatly appreciated. This work is supported in part by the National High-Tech Research and Development (863) Program of China (No. 2008AA01Z132, 2009AA011900), the Natural Science Fund of China (No. 60525202, 60533040), and the France ICT-Asia I-CROSS program. Dr. Shijian Li is the corresponding author.

References

1. Campbell LW (1997) A more universal remote control. http://web.media.mit.edu/~lieber/Teaching/Collab97/Collab-Projects/remote.html
2. http://www.consumer.philips.com/consumer/en/gb/consumer/cc/_categoryid_3000_SERIES_REMOTE_CONTROL_SU_GB_CONSUMER/ [4-in-1 TV/VCR/DVD/SAT]
3. http://www.oneforall.co.uk/en_UK/product/1/universal-remotes/3/advanced/25/digital-12
4. http://www.logitech.com/index.cfm/remotes/universal_remotes/devices/3898&cl=us,en
5. http://www.universalremote.com/product_detail.php?model=158
6. Lee L, Johnson T (2006) URCousin: universal remote control user interface. In: Proceedings of the Human Interface Technologies Conference, April 2006
7. Niezen G, Hancke GP (2008) Gesture recognition as ubiquitous input for mobile phones. In: International Workshop on Devices that Alter Perception (DAP'08), in conjunction with Ubicomp'08, 2008
8. Bolt RA (1980) "Put-that-there": voice and gesture at the graphics interface. In: SIGGRAPH'80, pp 262-270

9. Machate J, Burmester M, Bekiaris E (1997) Towards an intelligent multimodal and multimedia user interface providing a new dimension of natural HMI in the teleoperation of all home appliances by E&D users. In: 6th International Conference on Man-Machine Interactions Intelligent Systems in Business, Montpellier, May 1997, pp 226-229
10. Machate J (1999) Being natural: on the use of multimodal interaction concepts in smart homes. In: Proceedings of HCI International '99, pp 937-941
11. Wilson A, Oliver N (2003) GWindows: robust stereo vision for gesture-based control of windows. In: Proceedings of the 5th International Conference on Multimodal Interfaces, New York, NY, USA, pp 211-218
12. Krum DM, Omoteso O, Ribarsky W, Starner T, Hodges LF (2002) Speech and gesture multimodal control of a whole earth 3D visualization environment. In: Proceedings of the Symposium on Data Visualization, Barcelona, Spain, pp 195-200
13. Starner T, Auxier J, Ashbrook D, Gandy M (2000) The gesture pendant: a self-illuminating, wearable, infrared computer vision system for home automation control and medical monitoring. In: International Symposium on Wearable Computers (ISWC'00), pp 87-95
14. Kela J, Korpipaa P, Mantyjarvi J, Kallio S, Savino G, Jozzo L, Marca D (2006) Accelerometer-based gesture control for a design environment. Pers Ubiquitous Comput 10:285-299
15. Wu J, Pan G, Li S, Zhang D (2009) Gesture recognition with a 3D accelerometer. In: The Sixth International Conference on Ubiquitous Intelligence and Computing (UIC-09), Brisbane, Australia, 7-9 July 2009
16. Rabiner L, Levinson L (1981) Isolated and connected word recognition: theory and selected applications. IEEE Trans Commun 29(5):621-659
17. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77:257-286
18. Lee C-H, Lin C-H, Juang B-H (1991) A study on speaker adaptation of the parameters of continuous density hidden Markov models. IEEE Trans Signal Process 39(4):806-814
19. Davis SB, Mermelstein P (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process 28:357-366
20. Mitra S, Acharya T (2007) Gesture recognition: a survey. IEEE Trans Syst Man Cybern Part C 37(3):311-324
21. Schlomer T, Poppinga B, Henze N, Boll S (2008) Gesture recognition with a Wii controller. In: International Conference on Tangible and Embedded Interaction (TEI'08), Bonn, Germany, Feb 18-20, 2008, pp 11-14
22. Mantyla V-M (2001) Discrete hidden Markov models with application to isolated user-dependent hand gesture recognition. VTT Publications
23. Liu J, Wang Z, Zhong L, Wickramasuriya J, Vasudevan V (2009) uWave: accelerometer-based personalized gesture recognition and its applications. In: IEEE PerCom'09, 2009
24. Mantyjarvi J, Kela J, Korpipaa P, Kallio S (2004) Enabling fast and effortless customization in accelerometer based gesture interaction. In: Proceedings of the 3rd International Conference on Mobile and Ubiquitous Multimedia (MUM'04), ACM Press, October 27-29, pp 25-31
25. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based methods. Cambridge University Press, Cambridge
26. HTK: http://htk.eng.cam.ac.uk/
27. Frigo M, Johnson SG (2005) The design and implementation of FFTW3. Proc IEEE 93(2)
28. Joachims T (1999) Making large-scale SVM learning practical. In: Scholkopf B, Burges C, Smola A (eds) Advances in kernel methods: support vector learning. MIT Press
29. Quinlan JR (1996) Improved use of continuous attributes in C4.5. J Artif Intell Res 4:77-90
