Hands and Speech in Space
DESCRIPTION
Speech given by Mark Billinghurst at the AWE 2014 conference on how to use multimodal speech and gesture interaction with Augmented Reality applications. Talk given on May 28th, 2014.

TRANSCRIPT
Hands and Speech in Space
Mark Billinghurst
The HIT Lab NZ, University of Canterbury
May 28th 2014
2012 – Iron Man 2
To Make the Vision Real: Hardware/software requirements
Contact lens displays
Free-space hand/body tracking
Speech/gesture recognition
Etc.
Most importantly: Usability/User Experience
Natural Hand Interaction
Using bare hands to interact with AR content
MS Kinect depth sensing
Real-time hand tracking
Physics-based simulation model
Pros and Cons of Gesture-Only Input
Gesture-only input is good for:
Direct manipulation, selection, motion
Rapid expressiveness
Limitations:
Descriptions (e.g. temporal information)
Operations on large numbers of objects
Indirect manipulation, delayed actions
Multimodal Interaction
Combined speech and gesture input
Gesture and speech are complementary:
Speech: modal commands, quantities
Gesture: selection, motion, qualities
Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction; however, few multimodal AR interfaces exist
Wizard of Oz Study
What speech and gesture input would people like to use?
Wizard: performs speech recognition and command interpretation
Domain: 3D object interaction/modelling
Lee, M., & Billinghurst, M. (2008, October). A Wizard of Oz study for an AR multimodal interface. In Proceedings of the 10th international conference on Multimodal interfaces (pp. 249-256). ACM.
System Architecture
System Set Up
Key Results
Most commands were multimodal: Multimodal (63%), Gesture (34%), Speech (4%)
Most spoken phrases were short: 74% of phrases averaged 1.25 words; sentences (26%) averaged 3 words
Main gestures were deictic (65%) and metaphoric (35%)
In multimodal commands gesture was issued first: gesture began before speech 94% of the time
Free Hand Multimodal Input
Use the free hand to interact with AR content
Recognize simple gestures: open hand, closed hand, pointing
Commands: Point, Move, Pick/Drop
Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
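The three simple poses above (open hand, closed hand, pointing) can be distinguished with very coarse features. A minimal sketch of the idea, assuming a hand tracker that reports the number of extended fingers — the feature, thresholds, and command names here are illustrative assumptions, not the study's actual implementation:

```python
# Minimal sketch: map a tracked hand pose to one of three gestures
# (open hand, closed hand, pointing), then to a Point/Move/Pick/Drop
# command. The extended-finger count is a hypothetical feature; a real
# system would derive it from Kinect depth data.

def classify_pose(extended_fingers: int) -> str:
    """Classify a hand pose from the number of extended fingers."""
    if extended_fingers == 0:
        return "closed"      # fist
    if extended_fingers == 1:
        return "pointing"    # index finger extended
    return "open"            # open palm

def pose_to_command(pose: str, holding: bool) -> str:
    """Map a pose (plus grab state) to a command (assumed mapping)."""
    if pose == "pointing":
        return "point"
    if pose == "closed":
        return "move" if holding else "pick"
    return "drop" if holding else "idle"
```

The grab state makes the closed hand do double duty: closing the hand picks an object up, keeping it closed drags it, and opening it releases.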
Speech Input
MS Speech + MS SAPI (>90% accuracy)
Single-word speech commands
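Restricting speech to single-word commands keeps recognition accuracy high, because the recognizer only has to match against a small closed vocabulary. A minimal sketch of such a command table — the word list and property names are illustrative, not the study's actual grammar:

```python
# Minimal sketch of a single-word command vocabulary. A small closed
# grammar is what makes >90% recognition accuracy achievable; the
# words below are illustrative assumptions, not the study's grammar.

VOCABULARY = {
    # shape commands
    "cube":   ("shape", "cube"),
    "sphere": ("shape", "sphere"),
    "cone":   ("shape", "cone"),
    # colour commands
    "red":    ("colour", "red"),
    "green":  ("colour", "green"),
    "blue":   ("colour", "blue"),
}

def interpret(word: str):
    """Map a recognized word to a (property, value) command, or None."""
    return VOCABULARY.get(word.lower().strip())
```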
Multimodal Architecture
Multimodal Fusion
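The Wizard of Oz results (gesture usually issued first, commands mostly sequential) suggest a simple fusion strategy: remember the most recent gesture selection and pair it with the next speech command if it arrives within a time window. A minimal sketch under those assumptions — the window length, event fields, and class names are hypothetical, not taken from the talk:

```python
# Minimal time-window fusion sketch: pair each speech command with the
# most recent gesture selection. Follows the observation that gesture
# tends to precede speech; the 3-second window is an assumed value.

FUSION_WINDOW = 3.0  # seconds a gesture selection stays valid (assumed)

class Fusion:
    def __init__(self):
        self.last_selection = None   # (object_id, timestamp)

    def on_gesture(self, object_id: str, t: float) -> None:
        """Record that a deictic gesture selected an object at time t."""
        self.last_selection = (object_id, t)

    def on_speech(self, command, t: float):
        """Fuse a (property, value) speech command with the selection."""
        if self.last_selection is None:
            return None                  # nothing selected yet
        obj, t_gesture = self.last_selection
        if t - t_gesture > FUSION_WINDOW:
            return None                  # selection too old: ignore
        prop, value = command
        return (obj, prop, value)
```

For example, pointing at an object and then saying "red" within the window yields a fused command that changes that object's colour.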
Hand Occlusion
Experimental Setup
Change object shape and colour
User Evaluation
Change object shape, colour and position
Conditions: (1) speech only, (2) gesture only, (3) multimodal
Measures: performance time, errors, subjective survey
Results - Performance
Average performance time
Gesture: 15.44s, Speech: 12.38s, Multimodal: 11.78s
Significant difference across conditions (p < 0.01) Difference between gesture and speech/MMI
Subjective Results (Likert 1-7)
User subjective survey Gesture significantly worse, MMI and Speech same MMI perceived as most efficient
Preference 70% MMI, 25% speech only, 5% gesture only
                 Gesture  Speech  MMI
Naturalness        4.60    5.60   5.80
Ease of Use        4.00    5.90   6.00
Efficiency         4.45    5.15   6.05
Physical Effort    4.75    3.15   3.85
Observations Significant difference in number of commands
Gesture (6.14), Speech (5.23), MMI (4.93)
MMI Simultaneous vs. Sequential commands 79% sequential, 21% simultaneous
Reaction to system errors: users almost always repeated the same command; in MMI they rarely changed modalities
Lessons Learned
Multimodal interaction was significantly better than gesture alone in AR interfaces for 3D tasks: shorter task time, more efficient
Multimodal input was more natural, easier, and more effective than gesture or speech only
Simultaneous input was rarely used
More studies need to be conducted What gesture/speech patterns? Richer input
3D Gesture Tracking
3Gear Systems
Kinect/PrimeSense sensor
Two-hand tracking
http://www.threegear.com
Skeleton Interaction + AR
HMD AR View Viewpoint tracking
Two hand input Skeleton interaction, occlusion
AR Rift Display
Conclusions
AR experiences need new interaction methods
Combined speech and gesture is more powerful: complementary input modalities
Natural user interfaces are possible: free-hand gesture, speech, intelligent interfaces
Important research directions for the future: What gesture/speech commands should be used? What is the relationship between speech and gesture?
More Information
• Mark Billinghurst – Email: [email protected]
– Twitter: @marknb00
• Website – http://www.hitlabnz.org/