SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution. Lode Hoste, Bruno Dumas and Beat Signer

Uploaded by beat-signer on 07-Nov-2014


DESCRIPTION

Presentation given at AVI 2012, International Working Conference on Advanced Visual Interfaces, Capri Island, Italy, May 2012

ABSTRACT: We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft’s Kinect sensor is used for hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions, which clearly separate a detection phase from a correction phase, our SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates can be achieved at a text input speed of about six words per minute after a minimal learning phase. Moreover, in a user study SpeeG was perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.

Paper: http://vub.academia.edu/BeatSigner/Papers/1484787/SpeeG_A_Multimodal_Speech-_and_Gesture-based_Text_Input_Solution

TRANSCRIPT

Page 1: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution

Lode Hoste, Bruno Dumas and Beat Signer

Page 2

SpeeG - Lode Hoste, Vrije Universiteit Brussel

Text input for set-top boxes

Page 3

Page 4

Page 5

Text input for set-top boxes

Page 6

Dasher

8Pen, SwiftKey

Speech Dasher, SpeeG

EdgeWrite

1D Keyboard for Kinect, Virtual Keyboard for Xbox, Chatpad, Controller

Page 7

Virtual keyboard

Page 8

Kinect 1D keyboard

Page 9

Kinect 1D keyboard

Page 10

Page 11

Page 12

Dasher

- Continuous input
- Joystick / Gaze / ...
- Open vocabulary
- Allows imprecise navigation
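Dasher is steered by a continuous 2D pointer, which is why a joystick, a gaze tracker or, in SpeeG, a tracked hand can drive it. The following is a minimal sketch of that kind of steering, assuming a normalised pointer position whose horizontal offset sets the zoom rate and whose vertical offset picks the point being zoomed towards. The `View` class, the `step` function and the `max_rate` parameter are hypothetical illustrations, not SpeeG's or Dasher's actual code; Dasher's own control loop is more elaborate.

```python
import math
from dataclasses import dataclass


@dataclass
class View:
    """Visible part of Dasher's unit interval [0, 1): from top to top + height."""
    top: float = 0.0
    height: float = 1.0


def step(view: View, x: float, y: float, dt: float, max_rate: float = 2.0) -> View:
    """Advance the view by one frame of duration dt (seconds).

    Hypothetical mapping: x in [-1, 1] is the horizontal pointer offset
    (x > 0 zooms in, x < 0 zooms out); y in [-1, 1] is the vertical offset
    (-1 = top of the screen, +1 = bottom) and marks the point zoomed towards.
    """
    x = max(-1.0, min(1.0, x))
    y = max(-1.0, min(1.0, y))
    # Exponential zoom: the visible height is scaled by exp(-max_rate * x * dt),
    # but never grows beyond the whole interval.
    new_height = min(1.0, view.height * math.exp(-max_rate * x * dt))
    # The interval position under the pointer keeps its place on screen.
    rel = (y + 1.0) / 2.0
    anchor = view.top + rel * view.height
    new_top = anchor - rel * new_height
    # Never show anything outside the unit interval.
    new_top = max(0.0, min(1.0 - new_height, new_top))
    return View(new_top, new_height)
```

Because each frame only nudges the view, an imprecise hand position merely slows down or briefly misdirects the zoom rather than committing a wrong character outright, which is one way to read the "allows imprecise navigation" point on this slide.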

Page 13

Dasher

Page 14

Goals:
- Controller-free
- Text input
- Without training

Used technologies:
- Kinect
- CMU Sphinx
- Dasher

Page 15

SpeeG

Page 16

Page 17

SpeeG Architecture

Architecture diagram: User; GUI (JDasher); Speech Recogniser (CMU Sphinx 4); Hand Tracking (Microsoft Kinect and NITE); connected by data flows numbered 1 to 5
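The architecture feeds the speech recogniser's output into the Dasher-based GUI. One plausible way to do this, in the spirit of Speech Dasher, is to turn weighted N-best recognition hypotheses into next-character probabilities and mix in a small uniform floor so that every character stays reachable when the recogniser is wrong. This is an assumption for illustration only: the slides do not specify SpeeG's fusion scheme, and `next_char_probs`, `ALPHABET` and `floor` are hypothetical names.

```python
import string

# Hypothetical symbol set: lowercase letters plus space.
ALPHABET = string.ascii_lowercase + " "


def next_char_probs(prefix: str, nbest: list[tuple[str, float]],
                    floor: float = 0.05) -> dict[str, float]:
    """Next-character distribution given the (lowercase) text entered so far.

    nbest holds (hypothesis, weight) pairs, e.g. N-best entries from a speech
    recogniser with normalised scores. A share `floor` of the probability mass
    is spread uniformly over ALPHABET so that corrections outside the N-best
    list remain possible (hypothetical smoothing scheme).
    """
    mass = {c: 0.0 for c in ALPHABET}
    for text, weight in nbest:
        text = text.lower()
        # Only hypotheses consistent with what the user already entered count.
        if text.startswith(prefix) and len(text) > len(prefix):
            c = text[len(prefix)]
            if c in mass:
                mass[c] += weight
    total = sum(mass.values())
    uniform = 1.0 / len(ALPHABET)
    if total == 0.0:
        # No hypothesis matches the prefix: fall back to a uniform distribution.
        return {c: uniform for c in ALPHABET}
    return {c: floor * uniform + (1.0 - floor) * mass[c] / total for c in ALPHABET}
```

For instance, with the hypotheses "did you eat yet" and "did you meet yet" after the prefix "did you ", most of the mass goes to "e" and "m", while every other character keeps a small non-zero share that the user can still steer into with the hand.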

Page 18

Evaluation

Evaluated interfaces: SpeeG (shown with its architecture diagram), Speech-only, Virtual Keyboard, Kinect Keyboard

Page 19

Evaluation

- 7 (male) users, aged 23-31
- Quantitative (words per minute and number of errors) and qualitative (feedback and preference) evaluation
- Sentences 1-3 from DARPA’s TIMIT: “this was easy for us”, “he will allow a rare lie”, “did you eat yet”
- Sentences 4-6 from MacKenzie and Soukoreff: “my watch fell in the water”, “the world is a stage”, “peek out the window”

show 2 about ‘expertise of users’
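The slides report words per minute (WPM) and numbers of errors without stating the exact definitions. A common convention in text entry research, following MacKenzie and Soukoreff, counts a "word" as five characters and measures errors with the minimum string distance (Levenshtein distance) between the presented and the transcribed phrase. The sketch below implements that convention as an assumption; it is not necessarily how the SpeeG study computed its figures, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def wpm(transcribed: str, seconds: float) -> float:
    """Words per minute, with one 'word' defined as 5 characters.

    The first character is not counted because timing is assumed to start
    when it is entered.
    """
    return (len(transcribed) - 1) / seconds * 60.0 / 5.0


def msd_error_rate(presented: str, transcribed: str) -> float:
    """Minimum string distance error rate in [0, 1]."""
    longest = max(len(presented), len(transcribed))
    return levenshtein(presented, transcribed) / longest if longest else 0.0
```

For example, transcribing "the world is a stage" (20 characters) in 30 seconds gives (20 - 1) / 30 * 60 / 5 = 7.6 WPM, and typing "peek out the widow" for "peek out the window" is one edit over 19 characters.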

Page 20

Chart: Virtual keyboard, WPM per sentence (S1-S6) for users 1-7 (y-axis 0-10 WPM); highlighted value: 6.3 WPM

Page 21

Chart: Kinect Keyboard, WPM per sentence (S1-S6) for users 1-7 (y-axis 0-3.5 WPM), with an asterisk (*) annotation; highlighted value: 1.83 WPM

Page 22

Chart: Speech-only, WPM per sentence (S1-S6) for users 1-7 (y-axis 0-40 WPM), shown with the SpeeG architecture diagram; highlighted value: 11 WPM

Page 23

Chart: SpeeG, WPM per sentence (S1-S6) for users 1-7 (y-axis 0-10 WPM); highlighted value: 5.8 WPM

Page 24

Chart: SpeeG, WPM per sentence (S1-S6) for users 1-7 (y-axis 0-10 WPM); highlighted values: 2.6 and 7.8 WPM

Page 25

Chart: Mean WPM per sentence and input device, sentences S1-S6 (y-axis 0-25 WPM); series: Controller, Speech only, Kinect only, SpeeG; legend icons: SpeeG (architecture diagram), 1D Keyboard for Xbox, Virtual Keyboard for Xbox, Speech-only

Page 26

Chart: Errors per sentence and input device, mean number of errors for sentences S1-S6 (y-axis 0-10); series: Controller, Speech only, Kinect only, SpeeG; legend icons: SpeeG (architecture diagram), 1D Keyboard for Xbox, Virtual Keyboard for Xbox, Speech-only

Page 27

Page 28

Future work

- Other visualisations
- Smaller gestures
- Dedicated commands (gesture / voice)

Page 29

Page 30

Kinect

- Controller-free text input
- Real-time correction
- Dasher, zoomable interface:
  - probabilities
  - alphabetic order
  - character-level
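The Dasher bullets on this slide (probabilities, alphabetic order, character-level) describe how the zoomable interface lays out its targets: each possible next character gets a box, in alphabetic order, whose height is proportional to its probability, and the box of the chosen character is subdivided the same way for the following character. A minimal sketch of that layout step, with hypothetical function names `layout` and `char_at` (not Dasher's or SpeeG's actual code):

```python
from typing import Optional


def layout(probs: dict[str, float], top: float = 0.0,
           height: float = 1.0) -> dict[str, tuple[float, float]]:
    """Split [top, top + height) into one interval per character.

    Characters are placed in sorted() order (alphabetic; note that this puts
    a space before 'a'), and each interval's size is proportional to the
    character's probability.
    """
    total = sum(probs.values())
    boxes = {}
    pos = top
    for c in sorted(probs):
        size = height * probs[c] / total
        boxes[c] = (pos, pos + size)
        pos += size
    return boxes


def char_at(boxes: dict[str, tuple[float, float]], y: float) -> Optional[str]:
    """Return the character whose interval contains position y, if any."""
    for c, (start, end) in boxes.items():
        if start <= y < end:
            return c
    return None
```

To go one character deeper, the chosen box is laid out again with the next distribution, e.g. `start, end = boxes["a"]` followed by `layout(next_probs, start, end - start)`, which is what makes the interface character-level while likely characters stay large and easy to hit.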


Speech

- Non-native speakers
- Untrained voice recogniser
- 6-12 WPM
- Perceived fastest
- Game-like character
- Novices and experts

Special thanks to Jorn De Baerdenmaeker and Keith Vertanen