SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution

Lode Hoste, Bruno Dumas and Beat Signer

SpeeG - Lode Hoste, Vrije Universiteit Brussel

Text-input for set-top boxes


- Dasher
- 8Pen
- SwiftKey
- Speech Dasher
- SpeeG
- EdgeWriter
- 1D Keyboard for Kinect
- Virtual Keyboard for Xbox
- Chatpad Controller


Virtual keyboard


Kinect 1D keyboard





Dasher

- Continuous input (joystick, gaze, ...)
- Open vocabulary
- Allows imprecise navigation
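
Dasher lays out the candidate next characters in alphabetic order and gives each one a box whose size is proportional to its probability, which is why imprecise pointing still tends to land on the likely letter. A minimal sketch of that layout follows; the function names `layout_boxes` and `pick` and the example probabilities are hypothetical and are not taken from the Dasher or JDasher code.

```python
from typing import Dict, List, Tuple

Box = Tuple[str, float, float]  # (character, top, bottom)


def layout_boxes(probs: Dict[str, float], lo: float = 0.0, hi: float = 1.0) -> List[Box]:
    """Split the interval [lo, hi) into one box per character.

    Characters are placed in alphabetic order and each box height is
    proportional to the character's probability (illustrative only).
    """
    total = sum(probs.values())
    boxes: List[Box] = []
    start = lo
    for ch in sorted(probs):
        height = (hi - lo) * probs[ch] / total
        boxes.append((ch, start, start + height))
        start += height
    return boxes


def pick(boxes: List[Box], y: float) -> str:
    """Return the character whose box contains the pointer position y."""
    for ch, top, bottom in boxes:
        if top <= y < bottom:
            return ch
    return boxes[-1][0]  # y at or beyond the last boundary


# Hypothetical next-character probabilities from a language model.
boxes = layout_boxes({"b": 0.25, "a": 0.5, "c": 0.25})
selected = pick(boxes, 0.3)  # anywhere in the top half selects "a"
```

Because "a" occupies half of the interval in this example, any pointer position between 0.0 and 0.5 selects it, while the less likely letters need a more precise movement.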


SpeeG

Goals:
- Controller-free
- Text input
- Without training

Used technologies:
- Kinect
- CMU Sphinx
- Dasher


SpeeG Architecture

[Diagram: User, Speech Recogniser (CMU Sphinx 4), Hand Tracking (Microsoft Kinect and NITE) and GUI (JDasher), connected by steps numbered 1 to 5]
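
One way the speech recogniser and the character-level Dasher interface could be connected is sketched below. It assumes, purely for illustration, that the recogniser yields weighted word hypotheses and that these are reduced to next-character probabilities for the text typed so far. The function `next_char_probs`, the hypothesis words and their weights are hypothetical; this is not the SpeeG implementation or the CMU Sphinx API.

```python
from collections import defaultdict
from typing import Dict


def next_char_probs(hypotheses: Dict[str, float], prefix: str) -> Dict[str, float]:
    """Turn weighted word hypotheses into next-character probabilities.

    Only hypotheses that extend the already selected prefix contribute.
    Returns an empty dict when no hypothesis matches the prefix.
    """
    weights: Dict[str, float] = defaultdict(float)
    for word, score in hypotheses.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            weights[word[len(prefix)]] += score
    total = sum(weights.values())
    if total == 0:
        return {}
    return {ch: w / total for ch, w in weights.items()}


# Hypothetical recogniser output for one spoken word.
hypotheses = {"water": 0.5, "waiter": 0.5, "later": 1.0}
probs = next_char_probs(hypotheses, "wa")  # {"t": 0.5, "i": 0.5}
```

In this sketch the hand movement would then choose between the remaining candidates ("t" or "i"), which is one way to read the real-time correction idea: speech narrows the options and the gesture resolves the ambiguity.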


Evaluation

Compared input methods:
- SpeeG
- Speech-only
- Virtual Keyboard
- Kinect Keyboard


Participants: 7 (male) users, aged 23-31

Sentences 1-3 (DARPA's TIMIT):
- "this was easy for us"
- "he will allow a rare lie"
- "did you eat yet"

Sentences 4-6 (MacKenzie and Soukoreff):
- "my watch fell in the water"
- "the world is a stage"
- "peek out the window"

Performed a quantitative (words per minute and number of errors) and qualitative (feedback and preference) evaluation
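
The slides do not state how words per minute and the number of errors were computed. As a rough sketch, the example below assumes the common text-entry convention of five characters per word and counts errors as the character-level edit distance between the target sentence and the entered text. Both choices, the function names and the 30-second timing are assumptions for illustration.

```python
def words_per_minute(entered: str, seconds: float) -> float:
    """WPM assuming one 'word' equals five characters (including spaces)."""
    words = len(entered) / 5.0
    minutes = seconds / 60.0
    return words / minutes


def edit_distance(target: str, entered: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(entered) + 1))
    for i, tc in enumerate(target, 1):
        curr = [i]
        for j, ec in enumerate(entered, 1):
            curr.append(min(
                prev[j] + 1,               # delete a target character
                curr[j - 1] + 1,           # insert an entered character
                prev[j - 1] + (tc != ec),  # substitute, free on a match
            ))
        prev = curr
    return prev[-1]


# Hypothetical trial: sentence 3 entered in 30 seconds with one wrong letter.
wpm = words_per_minute("did you eat yes", 30.0)
errors = edit_distance("did you eat yet", "did you eat yes")
```

Under these assumptions the 15-character sentence counts as three words, so entering it in half a minute gives 6 WPM with one error.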

show 2 about ‘expertise of users’


Virtual keyboard: 6.3 WPM

[Chart: WPM (axis 0 to 10) per sentence (S1 to S6), one series per user (User 1 to User 7)]


Kinect Keyboard: 1.83 WPM

[Chart: WPM (axis 0.00 to 3.50) per sentence (S1 to S6), one series per user (User 1 to User 7)]


Speech-only: 11 WPM

[Chart: WPM (axis 0 to 40) per sentence (S1 to S6), one series per user (User 1 to User 7)]


SpeeG: 5.8 WPM

[Chart: WPM (axis 0 to 10) per sentence (S1 to S6), one series per user (User 1 to User 7)]


SpeeG: 2.6 and 7.8 WPM

[Chart: WPM (axis 0 to 10) per sentence (S1 to S6), one series per user (User 1 to User 7), with the values 2.6 and 7.8 WPM marked]


Mean WPM per sentence and input device

[Chart: mean WPM (axis 0 to 25) per sentence (S1 to S6); series: Controller, Speech only, Kinect only, SpeeG]


Errors per sentence and input device

[Chart: mean number of errors (axis 0 to 10) per sentence (S1 to S6); series: Controller, Speech only, Kinect only, SpeeG]


Future work

- Other visualisations
- Smaller gestures
- Dedicated commands (gesture / voice)


Kinect

- Controller-free text input
- Real-time correction
- Dasher, zoomable interface
  - probabilities
  - alphabetic order
  - character-level


Speech

- Non-native speakers
- Untrained voice recogniser
- 6-12 WPM
- Perceived fastest
- Game-like character
- Novice and experts

Special thanks to Jorn De Baerdenmaeker and Keith Vertaenen
