Hands and Speech in Space


DESCRIPTION

Talk given by Mark Billinghurst at the AWE 2014 conference on using multimodal speech and gesture interaction in Augmented Reality applications. Presented on May 28th, 2014.

TRANSCRIPT

Page 1: Hands and Speech in Space

Hands and Speech in Space

Mark Billinghurst

[email protected]

The HIT Lab NZ, University of Canterbury

May 28th 2014

Page 2: Hands and Speech in Space

2010 – Iron Man 2

Page 3: Hands and Speech in Space

To Make the Vision Real...
• Hardware/software requirements
  – Contact lens displays
  – Free-space hand/body tracking
  – Speech/gesture recognition
  – Etc.
• Most importantly
  – Usability/user experience

Page 4: Hands and Speech in Space

Natural Hand Interaction

• Using bare hands to interact with AR content
• MS Kinect depth sensing
• Real-time hand tracking
• Physics-based simulation model
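As a rough illustration of the depth-sensing step on this slide, the sketch below segments a hand candidate from a Kinect-style depth frame by keeping only pixels inside a near-interaction depth band. The thresholds and function names are illustrative assumptions, not the HIT Lab NZ implementation.

    import numpy as np

    def segment_hand(depth_mm, near=400, far=800):
        # Keep pixels inside a near-interaction depth band (values in mm).
        # Kinect reports 0 where it has no reading, so treat 0 as invalid.
        valid = depth_mm > 0
        return valid & (depth_mm >= near) & (depth_mm <= far)

    # Toy 480x640 frame with a synthetic "hand" blob at 600 mm
    frame = np.zeros((480, 640), dtype=np.uint16)
    frame[200:280, 300:360] = 600
    print(segment_hand(frame).sum(), "candidate hand pixels")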

Page 5: Hands and Speech in Space

Pros and Cons of Gesture-Only Input
• Gesture-only input is good for
  – Direct manipulation
  – Selection, motion
  – Rapid expressiveness
• Limitations
  – Descriptions (e.g. temporal information)
  – Operating on large numbers of objects
  – Indirect manipulation, delayed actions

Page 6: Hands and Speech in Space

Multimodal Interaction
• Combined speech and gesture input
• Gesture and speech are complementary
  – Speech: modal commands, quantities
  – Gesture: selection, motion, qualities
• Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
• However, there are few multimodal AR interfaces

Page 7: Hands and Speech in Space

Wizard of Oz Study
• What speech and gesture input would people like to use?
• Wizard
  – Performs speech recognition
  – Command interpretation
• Domain
  – 3D object interaction/modelling

Lee, M., & Billinghurst, M. (2008, October). A Wizard of Oz study for an AR multimodal interface. In Proceedings of the 10th international conference on Multimodal interfaces (pp. 249-256). ACM.

Page 8: Hands and Speech in Space

System Architecture

Page 9: Hands and Speech in Space

System Setup

Page 10: Hands and Speech in Space

Key Results
• Most commands were multimodal
  – Multimodal (63%), gesture (34%), speech (4%)
• Most spoken phrases were short
  – 74% of phrases averaged 1.25 words
  – Sentences (26%) averaged 3 words
• Main gestures were deictic (65%) and metaphoric (35%)
• In multimodal commands, gesture was issued first
  – Gesture began before speech 94% of the time

Page 11: Hands and Speech in Space

Free-Hand Multimodal Input
• Use the free hand to interact with AR content
• Recognize simple gestures
  – Open hand, closed hand, pointing
• Gestures: Point, Move, Pick/Drop

Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
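A toy version of the gesture classification described on this slide, assuming the hand tracker reports a count of extended fingers. The published recognizer works on segmented depth data, so this mapping is only a sketch:

    def classify_gesture(extended_fingers):
        # Map a finger count to the three gestures used in the study:
        # closed hand -> pick/drop, pointing -> point, open hand -> move.
        if extended_fingers == 0:
            return "closed"
        if extended_fingers == 1:
            return "point"
        return "open"

    for n in (0, 1, 5):
        print(n, "->", classify_gesture(n))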

Page 12: Hands and Speech in Space

Speech Input
• MS Speech + MS SAPI (> 90% accuracy)
• Single-word speech commands
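The single-word command vocabulary might be handled along these lines. The actual system used MS Speech/SAPI, so the vocabulary, threshold, and function below are assumptions for illustration only:

    # Hypothetical vocabulary; the study's commands covered shape,
    # colour, and position changes.
    COMMANDS = {"red", "green", "blue", "cube", "sphere", "move"}

    def on_recognized(word, confidence, threshold=0.9):
        # Accept only in-vocabulary words with high recognizer confidence,
        # in the spirit of the slide's > 90% recognition accuracy.
        word = word.lower()
        if confidence >= threshold and word in COMMANDS:
            return word          # forward to the multimodal fusion module
        return None              # drop out-of-grammar or uncertain input

    print(on_recognized("Red", 0.95))    # -> 'red'
    print(on_recognized("hello", 0.99))  # -> None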

Page 13: Hands and Speech in Space

Multimodal Architecture

Page 14: Hands and Speech in Space

Multimodal Fusion
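The slide itself carries no detail, but one fusion strategy consistent with the Wizard of Oz finding (gesture begins before speech 94% of the time) is to pair each speech command with the most recent gesture inside a short time window. The sketch below is an assumed design, not the system's actual fusion module; the class name and window size are invented:

    import time

    class FusionModule:
        def __init__(self, window_s=2.0):   # window size is a guess
            self.window_s = window_s
            self.last_gesture = None        # (gesture, target, timestamp)

        def on_gesture(self, gesture, target):
            # Cache the latest gesture; gesture usually precedes speech.
            self.last_gesture = (gesture, target, time.time())

        def on_speech(self, command):
            # Fuse with the cached gesture if it is recent enough,
            # otherwise fall back to a speech-only command.
            if self.last_gesture:
                gesture, target, t = self.last_gesture
                if time.time() - t <= self.window_s:
                    return {"command": command, "target": target,
                            "gesture": gesture}
            return {"command": command, "target": None}

    fusion = FusionModule()
    fusion.on_gesture("point", "object_3")
    print(fusion.on_speech("red"))   # colours the pointed-at object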

Page 15: Hands and Speech in Space

Hand Occlusion

Page 16: Hands and Speech in Space

Experimental Setup

Change object shape and colour

Page 17: Hands and Speech in Space

User Evaluation

• Change object shape, colour and position
• Conditions
  – (1) Speech only, (2) gesture only, (3) multimodal
• Measures
  – Performance time, errors, subjective survey

Page 18: Hands and Speech in Space

Results - Performance

• Average performance time
  – Gesture: 15.44 s
  – Speech: 12.38 s
  – Multimodal: 11.78 s
• Significant difference across conditions (p < 0.01)
  – Difference between gesture and speech/MMI

Page 19: Hands and Speech in Space

Subjective Results (Likert 1-7)

• User subjective survey
  – Gesture significantly worse; MMI and speech the same
  – MMI perceived as most efficient
• Preference
  – 70% MMI, 25% speech only, 5% gesture only

                 Gesture   Speech   MMI
Naturalness        4.60     5.60    5.80
Ease of Use        4.00     5.90    6.00
Efficiency         4.45     5.15    6.05
Physical Effort    4.75     3.15    3.85

Page 20: Hands and Speech in Space

Observations
• Significant difference in the number of commands used
  – Gesture (6.14), Speech (5.23), MMI (4.93)
• MMI: simultaneous vs. sequential commands
  – 79% sequential, 21% simultaneous
• Reaction to system errors
  – Users almost always repeated the same command
  – In MMI, users rarely changed modalities

Page 21: Hands and Speech in Space

Lessons Learned
• Multimodal interaction significantly better than gesture alone in AR interfaces for 3D tasks
  – Shorter task time, more efficient
• Multimodal input was more natural, easier, and more effective than gesture or speech alone
  – Simultaneous input rarely used
• More studies need to be conducted
  – What gesture/speech patterns? Richer input

Page 22: Hands and Speech in Space

3D Gesture Tracking

• 3Gear Systems
• Kinect/PrimeSense sensor
• Two-hand tracking
• http://www.threegear.com

Page 23: Hands and Speech in Space

Skeleton Interaction + AR

• HMD AR view
  – Viewpoint tracking
• Two-hand input
  – Skeleton interaction, occlusion
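Occlusion here means real hands hiding virtual content when they are closer to the camera. A minimal sketch of the per-pixel depth test idea, using NumPy with illustrative array names (not the actual renderer):

    import numpy as np

    def composite(camera_rgb, hand_depth, virtual_rgb, virtual_depth):
        # Draw a virtual pixel only where it is closer than the tracked
        # hand, so real fingers correctly occlude AR objects behind them.
        # Depths are in mm; 0 means "no data" and is treated as far away.
        hand = np.where(hand_depth > 0, hand_depth, np.inf)
        virt = np.where(virtual_depth > 0, virtual_depth, np.inf)
        out = camera_rgb.copy()
        in_front = virt < hand
        out[in_front] = virtual_rgb[in_front]
        return out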

Page 24: Hands and Speech in Space

AR Rift Display

Page 25: Hands and Speech in Space
Page 26: Hands and Speech in Space
Page 27: Hands and Speech in Space

Conclusions
• AR experiences need new interaction methods
• Combined speech and gesture is more powerful
  – Complementary input modalities
• Natural user interfaces are possible
  – Free-hand gesture, speech, intelligent interfaces
• Important research directions for the future
  – What gesture/speech commands should be used?
  – What is the relationship between speech and gesture?

Page 28: Hands and Speech in Space

More Information

• Mark Billinghurst
  – Email: [email protected]
  – Twitter: @marknb00
• Website
  – http://www.hitlabnz.org/