SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution
DESCRIPTION
Presentation given at AVI 2012, International Working Conference on Advanced Visual Interfaces, Capri Island, Italy, May 2012.
ABSTRACT: We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with a gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft's Kinect sensor is used for the hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions with a clear distinction between a detection and correction phase, our innovative SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates for a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG has been perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
Paper: http://vub.academia.edu/BeatSigner/Papers/1484787/SpeeG_A_Multimodal_Speech-_and_Gesture-based_Text_Input_Solution
TRANSCRIPT
SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution
Lode Hoste, Bruno Dumas and Beat Signer
SpeeG - Lode Hoste, Vrije Universiteit Brussel
Text-input for set-top boxes
Dasher
8Pen
SwiftKey
Speech Dasher
SpeeG
EdgeWriter
1D Keyboard for Kinect
Virtual Keyboard for Xbox
Chatpad Controller
Virtual keyboard
Kinect 1D keyboard
Dasher
- Continuous input
- Joystick / Gaze / ...
- Open vocabulary
- Allows imprecise navigation
Goals:
- Controller-free
- Text input
- Without training
Used technologies:
- Kinect
- CMU Sphinx
- Dasher
SpeeG
SpeeG Architecture
[Diagram: User, Speech Recogniser (CMU Sphinx 4), Hand Tracking (Microsoft Kinect and NITE) and GUI (JDasher), connected by data flows numbered 1 to 5]
Evaluation
- SpeeG
- Speech-only
- Virtual Keyboard
- Kinect Keyboard
- 7 (male) users, aged 23-31
- Sentences 1-3 (DARPA's TIMIT): “this was easy for us”, “he will allow a rare lie”, “did you eat yet”
- Sentences 4-6 (MacKenzie and Soukoreff): “my watch fell in the water”, “the world is a stage”, “peek out the window”
- Performed a quantitative (words per minute and number of errors) and qualitative (feedback and preference) evaluation
show 2 about ‘expertise of users’
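The words-per-minute figure used in the quantitative evaluation can be sketched as follows. The slides do not state how words were counted, so this minimal Python sketch assumes the common text-entry convention of five characters (spaces included) per word. The function name `words_per_minute` and the 40-second timing in the example are hypothetical and not taken from the study.

```python
def words_per_minute(transcribed: str, seconds: float, chars_per_word: int = 5) -> float:
    """Return the entry speed in words per minute.

    Assumption (not stated in the slides): one "word" is
    chars_per_word characters of the transcribed text, spaces included.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    words = len(transcribed) / chars_per_word
    return words * 60.0 / seconds


# Test sentence 5 from the evaluation; the 40 s duration is made up.
sentence = "the world is a stage"          # 20 characters -> 4 words
print(words_per_minute(sentence, 40.0))    # prints 6.0
```

Counting characters rather than whitespace-separated tokens keeps the measure comparable across sentences with short and long words, which is why it is a frequent choice in text-entry studies.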
Virtual keyboard
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7]
6.3 WPM
Kinect Keyboard
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7]
1.83 WPM
Speech-only
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7]
11 WPM
SpeeG
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7]
5.8 WPM
SpeeG
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7]
Annotated values: 2.6 and 7.8 WPM
Mean WPM per sentence and input device
[Chart: mean WPM per sentence (S1 to S6) for Controller, Speech only, Kinect only and SpeeG]
Errors per sentence and input device
[Chart: mean number of errors per sentence (S1 to S6) for Controller, Speech only, Kinect only and SpeeG]
Future work
- Other visualisations
- Smaller gestures
- Dedicated commands (gesture / voice)
SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution
Lode Hoste, Bruno Dumas, Beat Signer
Kinect
- Controller-free text input
- Real-time correction
- Dasher, zoomable interface
  - probabilities
  - alphabetic order
  - character-level
Speech
- Non-native speakers
- Untrained voice recogniser
- 6-12 WPM
- Perceived fastest
- Game-like character
- Novice and experts
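The character-level, alphabetically ordered probabilities listed for the Dasher interface can be illustrated with a small sketch. This is not SpeeG's actual implementation: it only assumes that the speech recogniser yields a few candidate words with relative weights, and shows one way such weights could be turned into next-character probabilities for a zoomable interface. The function name `next_char_probabilities` and the example weights are hypothetical.

```python
from collections import defaultdict


def next_char_probabilities(hypotheses, typed_prefix):
    """Turn word hypotheses into next-character probabilities.

    hypotheses: dict mapping a candidate word to a relative weight
                (illustrative numbers, not real recogniser output).
    typed_prefix: the characters already selected by the user.
    Returns (character, probability) pairs in alphabetic order.
    """
    weights = defaultdict(float)
    for word, weight in hypotheses.items():
        # Only hypotheses that still match the selected prefix contribute.
        if word.startswith(typed_prefix) and len(word) > len(typed_prefix):
            weights[word[len(typed_prefix)]] += weight
    total = sum(weights.values())
    if total == 0:
        return []
    return [(ch, weights[ch] / total) for ch in sorted(weights)]


# Made-up weights for three candidate words.
candidates = {"watch": 0.375, "wash": 0.125, "match": 0.5}
print(next_char_probabilities(candidates, "wa"))  # [('s', 0.25), ('t', 0.75)]
```

In a Dasher-style layout, each returned character would receive a box whose size is proportional to its probability, so likely continuations are easier to hit with an imprecise hand gesture.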
Special thanks to Jorn De Baerdenmaeker and Keith Vertanen