Design and Implementation of the Note-taking Style Haptic Voice Recognition for Mobile Devices

Seungwhan Moon, Franklin W. Olin College of Engineering, 1000 Olin Way, Needham, MA, U.S.A. [email protected]

Khe Chai Sim, National University of Singapore, Computing 1, 13 Computing Drive, Singapore 117417. [email protected]

14th ACM International Conference on Multimodal Interaction, DoubleTree Suites, Santa Monica, California. October 22-26th, 2012



Page 2: Design and Implementation of the Note-taking Style Haptic Voice

Introduction

• Haptic Voice Recognition (HVR)

Haptic Input

Speech Input


Page 3: Design and Implementation of the Note-taking Style Haptic Voice

Introduction

• Haptic Voice Recognition (HVR)

• Boundary of Sentence (BoS)
• Boundary of Word (BoW)
• First Letter of Word (FLoW)

…

• Synchronous
• Asynchronous

Page 4: Design and Implementation of the Note-taking Style Haptic Voice

Note‐taking Style HVR

Motivation

Lecture Note

Haptic voice recognition
‐ combines speech and touch
‐ increases accuracy

Semantically meaningful keywords: natural to write and take notes

Page 5: Design and Implementation of the Note-taking Style Haptic Voice

Note‐taking Style HVR

[Figure: the utterance "meeting tom at 6 pm" given as speech input, together with its haptic note sequence given as haptic input]

Page 6: Design and Implementation of the Note-taking Style Haptic Voice

Note‐taking Style HVR

1. An element in a haptic note sequence refers to a partially or fully spelled word in the decoded word sequence.

2. The number and the order of keywords in a haptic note sequence do not need to match those of words in the actual word sequence.

3. The exact time at which a haptic event occurs is ignored.
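The three rules above can be sketched as a simple consistency check between a haptic note sequence and a candidate word sequence. This is only an illustration: it assumes that "partially spelled" means a prefix of the word, and the function names and example hypotheses are made up for the sketch.

```python
def note_matches(note, word):
    """Rule 1: a note refers to a word it spells partially or fully.

    Assumption: a partial spelling is a non-empty prefix of the word.
    """
    note, word = note.lower(), word.lower()
    return bool(note) and word.startswith(note)


def consistent(notes, words):
    """True if every note refers to some word of the hypothesis.

    Rule 2: the order and the number of notes are not compared with
    those of the words. Rule 3: no timestamps are used anywhere.
    """
    return all(any(note_matches(n, w) for w in words) for n in notes)


# Hypothetical recognizer hypotheses for one utterance.
hypotheses = [
    "meeting tom at 6 pm".split(),
    "meeting tome at 6 pm".split(),
    "eating tom at 6 pm".split(),
]
notes = ["tom", "mee"]  # out of order and partially spelled

survivors = [" ".join(h) for h in hypotheses if consistent(notes, h)]
print(survivors)
```

The notes rule out the hypothesis starting with "eating" but cannot separate "tom" from "tome", which is why the haptic input is combined with the speech scores rather than used as a hard filter on its own.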


Page 7: Design and Implementation of the Note-taking Style Haptic Voice

Note‐taking Style HVR

3 Types of Haptic Input Methods

1. Longhand Handwriting

2. Shorthand Handwriting

3. Virtual Keyboard

[Figure: the word "note" entered letter by letter (n o t e / N O T E)]

Page 8: Design and Implementation of the Note-taking Style Haptic Voice

Note‐taking Style HVR

(Adapted) Gregg Shorthand Handwriting Recognition

1. Facilitates much faster and more effective input

2. Introduces ambiguity between letters that are phonetically similar

3. Adapted to HVR – uses isolated letters to spell a word.
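One way to picture the ambiguity mentioned in point 2 is to treat phonetically similar letters as interchangeable when a shorthand note is compared with a word. The confusion groups here are hypothetical and chosen only for illustration; they are not the recognizer's actual table.

```python
# Hypothetical confusion groups: letters assumed to be easy to mix up
# because their shorthand strokes (and sounds) are similar.
CONFUSION_GROUPS = [set("pb"), set("fv"), set("td"), set("kg"), set("sz")]


def alternatives(letter):
    """Return every letter the written letter might stand for."""
    for group in CONFUSION_GROUPS:
        if letter in group:
            return group
    return {letter}


def note_matches_word(note, word):
    """True if the isolated letters of `note` can spell a prefix of `word`."""
    note, word = note.lower(), word.lower()
    if len(note) > len(word):
        return False
    return all(w in alternatives(n) for n, w in zip(note, word))


print(note_matches_word("dom", "tom"))  # d and t share a group
print(note_matches_word("mom", "tom"))  # m and t do not
```

A note then constrains the word sequence less tightly than an unambiguous spelling would, which is the price paid for the faster input.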


Page 9: Design and Implementation of the Note-taking Style Haptic Voice

Note‐taking Style HVR


Page 10: Design and Implementation of the Note-taking Style Haptic Voice

Demo


Page 11: Design and Implementation of the Note-taking Style Haptic Voice

Algorithm Design

W: Word sequence

L: PLI (partial lexical information) sequence

O: Sequence of observed acoustic features

H: Sequence of observed haptic features

Haptic Voice Recognition → Finding the joint optimal solution for W, L given O, H.
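The joint optimization over W and L given O and H can be written in the usual maximum a posteriori form. This is a sketch: the exact form of Eq (1) is assumed, and the second line holds only under the added assumption that O depends on W alone and H depends on L alone.

```latex
\begin{align}
(\hat{W}, \hat{L}) &= \arg\max_{W, L} P(W, L \mid O, H) \\
                   &= \arg\max_{W, L} \; p(O \mid W)\, p(H \mid L)\, P(W, L)
\end{align}
```

The second line follows from Bayes' rule, since the evidence term p(O, H) does not depend on W or L and can be dropped from the maximization.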

Page 12: Design and Implementation of the Note-taking Style Haptic Voice

Algorithm Design

‐ Lattice of multiple word sequence hypotheses

‐ PLI model

‐ Lattice of permutations of haptic note sequence

Shortest path of Eq (2) → Optimal solution for Eq (1)

Weighted Finite State Transducer (WFST)
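As a rough illustration of how a shortest path can combine the word lattice with the haptic notes, the toy decoder here enumerates every path through a small lattice and adds a penalty for each note that no word explains. It is a brute-force stand-in, not a WFST composition; the lattice, the costs, and the penalty are all invented, and the prefix test plays the role of the PLI model. Because the haptic cost ignores note order, it covers what a lattice of note permutations would provide.

```python
# Toy word lattice as a DAG: state -> list of (next_state, word, cost).
# Costs stand in for negative log speech recognition scores (made up).
LATTICE = {
    0: [(1, "meeting", 1.0), (1, "eating", 0.8)],
    1: [(2, "tom", 1.2), (2, "tong", 1.0)],
    2: [(3, "at", 0.1)],
    3: [(4, "6", 0.2)],
    4: [(5, "pm", 0.3), (5, "am", 0.4)],
}
FINAL = 5
MISS_PENALTY = 2.0  # hypothetical cost for a note that no word explains


def paths(state=0, words=(), cost=0.0):
    """Enumerate every (word sequence, speech cost) path in the lattice."""
    if state == FINAL:
        yield list(words), cost
        return
    for nxt, word, c in LATTICE[state]:
        yield from paths(nxt, words + (word,), cost + c)


def haptic_cost(notes, words):
    """Toy PLI model: each note should be a prefix of some word."""
    missed = sum(not any(w.startswith(n) for w in words) for n in notes)
    return MISS_PENALTY * missed


def decode(notes):
    """Return the cheapest path under speech cost plus haptic cost."""
    return min(
        ((w, c + haptic_cost(notes, w)) for w, c in paths()),
        key=lambda item: item[1],
    )


print(decode([])[0])              # speech only
print(decode(["tom", "mee"])[0])  # speech plus haptic notes
```

With no notes the cheapest path is the acoustically preferred "eating tong at 6 pm"; with the two notes the penalty moves the shortest path to "meeting tom at 6 pm".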

Page 13: Design and Implementation of the Note-taking Style Haptic Voice

Algorithm Design

Using OpenFst …

fstcompile

fstcompose

fstshortestpath
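The three tools named on this slide are typically chained as compile, compose, shortest path. The sketch here only builds the command lines and prints them; it does not run OpenFst. All file names are hypothetical, the order of composition is an assumption, and the `fstarcsort` steps are added because `fstcompose` expects one operand to be arc-sorted.

```python
import shlex


def hvr_commands(lattice="lattice", pli="pli", notes="notes"):
    """Return OpenFst command lines for compile -> compose -> shortest path.

    Assumes text-format FSTs (<name>.txt) and symbol tables
    (<name>.isyms, <name>.osyms) already exist; all names are made up.
    """
    cmds = []
    for name in (lattice, pli, notes):
        cmds.append([
            "fstcompile",
            f"--isymbols={name}.isyms",
            f"--osymbols={name}.osyms",
            f"{name}.txt",
            f"{name}.fst",
        ])
    cmds += [
        # Sort output labels of the left operand before each composition.
        ["fstarcsort", "--sort_type=olabel", f"{lattice}.fst", "left1.fst"],
        ["fstcompose", "left1.fst", f"{pli}.fst", "lattice_pli.fst"],
        ["fstarcsort", "--sort_type=olabel", "lattice_pli.fst", "left2.fst"],
        ["fstcompose", "left2.fst", f"{notes}.fst", "composed.fst"],
        ["fstshortestpath", "composed.fst", "best.fst"],
    ]
    return cmds


for cmd in hvr_commands():
    print(shlex.join(cmd))
```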

Page 14: Design and Implementation of the Note-taking Style Haptic Voice

Experimental Results
(1) Simulation

‐ Single user, 72 sentences, 100 iterations.

‐ N words (partially / fully spelled) are randomly chosen (artificial haptic events): N‐W3L / N‐W

‐ Under two Signal‐to‐Noise Ratio (SNR) conditions: clean, 15 dB (artificially corrupted)

‐ Compared with FLoW, Oracle Error Rate
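The artificial haptic events of the simulation can be pictured as random draws of N words from each reference sentence, kept whole (N‐W) or cut to their first three letters (N‐W3L). This is a minimal sketch; the function name, the example sentence, and the seed are invented, and the sampling scheme is assumed to be uniform without replacement.

```python
import random


def simulate_notes(sentence, n, letters=None, rng=random):
    """Pick n random words and optionally keep only their first `letters`.

    letters=None keeps the full spelling (N-W); letters=3 gives N-W3L.
    """
    words = sentence.split()
    chosen = rng.sample(words, min(n, len(words)))
    return [w[:letters] for w in chosen]


rng = random.Random(0)  # fixed seed so a run can be repeated
sentence = "the meeting with tom starts at six tonight"

w3l = simulate_notes(sentence, 3, letters=3, rng=rng)  # one N-W3L draw
full = simulate_notes(sentence, 3, rng=rng)            # one N-W draw
print(w3l, full)
```

Repeating the draw many times per sentence, as in the 100 iterations above, gives a spread of error rates for each N.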

Page 15: Design and Implementation of the Note-taking Style Haptic Voice

Experimental Results
(1) Simulation

Figure: Simulation results (a) when performed without any additional noise and (b) when performed with artificial noise at SNR = 15 dB. The x‐axis denotes the number of randomly chosen keywords (N), whereas the y‐axis denotes the word error rate (WER). The red and the blue lines refer, respectively, to the Note‐taking‐style HVR performance with the first 3 letters of N randomly chosen words (N‐W3L) and with N fully‐spelled words (N‐W). The error bars indicate the standard deviations over the 100 iterations.

Page 16: Design and Implementation of the Note-taking Style Haptic Voice

Experimental Results
(1) Simulation

‐ Notable improvement in the Word Error Rate (WER) for both N‐W3L and N‐W in both SNR conditions.

‐ Greater improvement for larger N, with a diminishing rate of improvement.

‐ Bottleneck at the Oracle Error Rate: performance depends on the quality of the speech recognizer.

‐ Large standard deviation of WER: the choice of keywords significantly affects the performance.
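The bottleneck can be made concrete with a small sketch of WER and the oracle error rate, taken here in its usual sense as the lowest WER among the hypotheses the recognizer kept. Haptic input can only pick among those hypotheses, so it cannot do better than the oracle. The example sentences are invented.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate of one hypothesis against the reference."""
    return edit_distance(ref, hyp) / len(ref)


def oracle_wer(ref, hypotheses):
    """Best WER achievable by choosing among the given hypotheses."""
    return min(wer(ref, h) for h in hypotheses)


ref = "meeting tom at 6 pm".split()
hyps = ["eating tong at 6 pm".split(), "meeting tong at 6 pm".split()]

print(wer(ref, hyps[0]), oracle_wer(ref, hyps))
```

In this toy case the correct word "tom" is in neither hypothesis, so no choice of keywords can bring the error below the oracle value.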

Page 17: Design and Implementation of the Note-taking Style Haptic Voice

Experimental Results
(2) Preliminary User Studies

‐ Single user (72 sentences for each)

‐ 3 keywords (partially spelled: only the first 3 characters) are chosen: 3W3L

‐ 3 different input methods: Shorthand / Longhand / Keyboard

‐ Compared with BoS and FLoW

Page 18: Design and Implementation of the Note-taking Style Haptic Voice

Experimental Results
(2) Preliminary User Studies

Table: Five haptic methods were applied in this experiment: Boundary of Sentences (BoS), 3 Words and 3 Letters (3W3L) via Shorthand, Longhand, and Keyboard input, and First Letter of Words (FLoW). The table reports the Word Error Rate (WER), the Keyword Error Rate (KER), and the absolute improvement in the error rate from the Automatic Speech Recognition (ASR) results to the Haptic Voice Recognition (HVR) results.
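The quantities named in the table caption can be illustrated with a short sketch. The slides do not define KER, so the definition here (the share of reference keywords missing from the recognized words) is an assumption, and the example sentences are invented rather than taken from the experiment.

```python
def keyword_error_rate(keywords, hypothesis):
    """Assumed definition: fraction of reference keywords not recognized."""
    recognized = set(hypothesis)
    missed = sum(k not in recognized for k in keywords)
    return missed / len(keywords)


def absolute_improvement(asr_error, hvr_error):
    """Error rate removed by adding haptic input, in absolute terms."""
    return asr_error - hvr_error


keywords = ["meeting", "tom", "pm"]            # hypothetical keywords
asr_hyp = "eating tong at 6 pm".split()        # hypothetical ASR output
hvr_hyp = "meeting tom at 6 pm".split()        # hypothetical HVR output

asr_ker = keyword_error_rate(keywords, asr_hyp)
hvr_ker = keyword_error_rate(keywords, hvr_hyp)
print(asr_ker, hvr_ker, absolute_improvement(asr_ker, hvr_ker))
```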

Page 19: Design and Implementation of the Note-taking Style Haptic Voice

Experimental Results
(2) Preliminary User Studies

‐ Notable improvement in the Word Error Rate (WER) and the Keyword Error Rate (KER)

‐ Greater improvement in KER can enhance the user experience with the speech recognition system.

‐ Increased duration of speech: minimized by the use of partially spelled words and Gregg shorthand.

Page 20: Design and Implementation of the Note-taking Style Haptic Voice

Conclusion

• Summary
– Improvement in WER & particularly in KER
– Smaller increase in the duration of speech (Gregg Shorthand, partial spelling)
– Large standard deviations of WER

• Future Work
– HVR API
– Application in Spoken Document Retrieval (for Online Lectures, e‐Learning, Conferences, etc.)

Page 21: Design and Implementation of the Note-taking Style Haptic Voice


‐ The End ‐

