
Design and Implementation of the Note‐taking Style Haptic Voice Recognition for Mobile Devices

Seungwhan Moon, Franklin W. Olin College of Engineering

1000 Olin Way, Needham, MA

Khe Chai Sim, National University of Singapore, Computing 1, 13 Computing Drive, Singapore 117417

14th ACM International Conference on Multimodal Interaction, DoubleTree Suites, Santa Monica, California

October 22–26, 2012

Introduction

• Haptic Voice Recognition (HVR)

[Figure: haptic input and speech input]

Introduction

• Haptic Voice Recognition (HVR)

• Boundary of Sentence (BoS)
• Boundary of Word (BoW)
• First Letter of Word (FLoW)

…

• Synchronous
• Asynchronous

Note‐taking Style HVR


Motivation

[Figure: lecture note]

Haptic voice recognition: combines speech and touch input to increase accuracy

Semantically meaningful keywords: natural to write, as when taking notes

Example: meeting tom at 6 pm

[Figure: haptic note sequence, showing haptic input alongside speech input]


1. An element in a haptic note sequence refers to a partially or fully spelled word in the decoded word sequence.

2. The number and the order of keywords in a haptic note sequence do not need to match those of words in the actual word sequence.

3. The exact time at which a haptic event occurs is ignored.
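The three properties above can be illustrated with a small Python sketch. It makes a simplifying assumption that does not come from the slides: a keyword "refers to" a word when it is a case-insensitive prefix of that word. The names `matches` and `consistent` are hypothetical. Because order and timing are ignored, the check only asks whether each keyword can be assigned to a distinct word, searching by backtracking.

```python
def matches(keyword: str, word: str) -> bool:
    """Assumed rule: a keyword is a partial or full prefix of the word."""
    return word.lower().startswith(keyword.lower())


def consistent(notes: list[str], words: list[str]) -> bool:
    """Return True if every keyword can be assigned to a distinct word.

    Each keyword refers to one word (property 1), keyword order is
    ignored (property 2), and no timing information is used (property 3).
    """
    used = [False] * len(words)

    def assign(i: int) -> bool:
        if i == len(notes):
            return True
        for j, word in enumerate(words):
            if not used[j] and matches(notes[i], word):
                used[j] = True
                if assign(i + 1):
                    return True
                used[j] = False  # backtrack and try another word
        return False

    return assign(0)


words = "i am meeting tom at 6 pm".split()  # hypothetical utterance
print(consistent(["tom", "mee"], words))  # True: order does not matter
print(consistent(["tom", "tom"], words))  # False: only one word "tom"
```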



3 Types of Haptic Input Methods

1. Longhand Handwriting

2. Shorthand Handwriting

3. Virtual Keyboard

[Figure: example inputs "n o t e" and "N O T E"]


(Adapted) Gregg Shorthand Handwriting Recognition

1. Facilitates much faster and more effective input

2. Adds ambiguity between letters that are phonetically similar

3. Adapted to HVR: uses isolated letters to spell a word
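The ambiguity mentioned in point 2 can be modeled by letting one recognized letter stand for a small set of letters. The sketch below uses hypothetical confusion sets of phonetically similar pairs (p/b, t/d, k/g, f/v, s/z). These sets are chosen only for illustration and are not the recognizer's actual confusions. An isolated-letter keyword then matches any word whose prefix agrees letter by letter up to these sets.

```python
# Hypothetical confusion sets: letters assumed to be hard to tell apart
# in the adapted shorthand because they are phonetically similar.
CONFUSION_SETS = [set("pb"), set("td"), set("kg"), set("fv"), set("sz")]


def letter_class(ch: str) -> frozenset[str]:
    """Return the set of letters a recognized shorthand letter may stand for."""
    ch = ch.lower()
    for group in CONFUSION_SETS:
        if ch in group:
            return frozenset(group)
    return frozenset(ch)


def ambiguous_prefix_match(keyword: str, word: str) -> bool:
    """True if `word` starts with some letter-by-letter reading of `keyword`."""
    word = word.lower()
    if len(keyword) > len(word):
        return False
    return all(w in letter_class(k) for k, w in zip(keyword, word))


vocabulary = ["tom", "dome", "time", "bat", "pat"]
print([w for w in vocabulary if ambiguous_prefix_match("dom", w)])  # ['tom', 'dome']
```

The trade-off is visible even in this toy: faster shorthand input widens each keyword's set of candidate words, and the speech evidence must then resolve the remaining ambiguity.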

Demo

Algorithm Design


W: Word sequence

L: PLI (partial lexical information) sequence

O: Sequence of observed acoustic features

H: Sequence of observed haptic features

Haptic Voice Recognition: finding the joint optimal solution for W and L given O and H.
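Written out, the joint objective on this slide can take the following form, which appears to be what the next slide calls Eq (1). The factorization in the second line is a sketch built on two assumptions the slide does not state: the acoustic features O depend only on W, and the haptic features H depend only on L. It follows from Bayes' rule, since P(O, H) is constant with respect to W and L.

$$
\begin{aligned}
(\hat{W}, \hat{L}) &= \arg\max_{W, L} P(W, L \mid O, H) \\
&= \arg\max_{W, L} P(O \mid W)\, P(H \mid L)\, P(L \mid W)\, P(W)
\end{aligned}
$$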

Algorithm Design


: Lattice of multiple word sequence hypotheses

: PLI model

: Lattice of permutations of haptic note sequence

The shortest path of Eq (2) gives the optimal solution for Eq (1).

Weighted Finite State Transducer (WFST)
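The lattice of permutations of the haptic note sequence lets the keywords match the word sequence in any order. The Python sketch below shows one compact way to build such an acceptor. It is an assumed construction, not necessarily the authors': each state records the subset of keywords already consumed, so N keywords need 2^N states instead of N! separate paths. The (source, label, destination) arc format is hypothetical.

```python
from itertools import permutations


def permutation_acceptor(keywords: list[str]):
    """Build an acceptor whose paths spell every ordering of `keywords`.

    States are bitmasks recording which keywords have been consumed;
    state 0 is the start and the all-ones mask is the only final state.
    Returns (arcs, start, final) with arcs as (src, label, dst) tuples.
    """
    n = len(keywords)
    arcs = []
    for mask in range(1 << n):
        for i, kw in enumerate(keywords):
            if not mask & (1 << i):  # keyword i not used yet
                arcs.append((mask, kw, mask | (1 << i)))
    return arcs, 0, (1 << n) - 1


def enumerate_paths(arcs, start, final):
    """List the label sequences of all start-to-final paths (for checking)."""
    out = {}
    for src, label, dst in arcs:
        out.setdefault(src, []).append((label, dst))
    paths = []

    def walk(state, labels):
        if state == final:
            paths.append(tuple(labels))
            return
        for label, dst in out.get(state, []):
            walk(dst, labels + [label])

    walk(start, [])
    return paths


keywords = ["mee", "tom", "6"]  # hypothetical haptic notes
arcs, start, final = permutation_acceptor(keywords)
paths = enumerate_paths(arcs, start, final)
print(len(arcs), len(paths))  # prints: 12 6
assert sorted(paths) == sorted(permutations(keywords))
```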

Algorithm Design

Using OpenFst …

fstcompile → fstcompose → fstshortestpath
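The OpenFst tools above compile the component transducers, compose them, and extract the lowest-cost path. The Python sketch below imitates the last two steps on tiny epsilon-free transducers in the tropical semiring, where weights are costs such as negative log probabilities and are added along a path. It is a toy re-implementation for illustration only, not OpenFst's API. The dictionary format, the word lattice, and the "haptic" costs are all hypothetical.

```python
import heapq

# A transducer is (start, finals, arcs): finals maps state -> final cost,
# arcs maps state -> list of (input label, output label, cost, next state).
_FINAL = object()  # sentinel "super-final" node used by shortest_path


def compose(t1, t2):
    """Epsilon-free composition: match t1's output labels to t2's inputs."""
    (s1, f1, a1), (s2, f2, a2) = t1, t2
    start = (s1, s2)
    arcs, finals = {}, {}
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        q1, q2 = state
        if q1 in f1 and q2 in f2:
            finals[state] = f1[q1] + f2[q2]
        for i1, o1, w1, n1 in a1.get(q1, []):
            for i2, o2, w2, n2 in a2.get(q2, []):
                if o1 == i2:
                    nxt = (n1, n2)
                    arcs.setdefault(state, []).append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return start, finals, arcs


def shortest_path(t):
    """Dijkstra over non-negative costs; returns (cost, inputs, outputs)."""
    start, finals, arcs = t
    tie = 0  # unique tie-breaker so heapq never compares states
    heap = [(0.0, tie, start, (), ())]
    settled = set()
    while heap:
        cost, _, state, ins, outs = heapq.heappop(heap)
        if state is _FINAL:
            return cost, list(ins), list(outs)
        if state in settled:
            continue
        settled.add(state)
        if state in finals:  # route final states to the super-final node
            tie += 1
            heapq.heappush(heap, (cost + finals[state], tie, _FINAL, ins, outs))
        for i, o, w, nxt in arcs.get(state, []):
            if nxt not in settled:
                tie += 1
                heapq.heappush(heap, (cost + w, tie, nxt, ins + (i,), outs + (o,)))
    return None


word_lattice = (0, {2: 0.0}, {
    0: [("meeting", "meeting", 1.0, 1), ("eating", "eating", 0.5, 1)],
    1: [("tom", "tom", 0.25, 2)],
})
haptic_costs = (0, {0: 0.0}, {  # hypothetical: the note "mee" penalizes "eating"
    0: [("meeting", "meeting", 0.0, 0), ("eating", "eating", 2.0, 0),
        ("tom", "tom", 0.0, 0)],
})
print(shortest_path(word_lattice))  # (0.75, ['eating', 'tom'], ['eating', 'tom'])
print(shortest_path(compose(word_lattice, haptic_costs)))  # (1.25, ['meeting', 'tom'], ['meeting', 'tom'])
```

In this toy, the speech lattice alone prefers "eating tom". Composing it with the haptic costs changes the best path to "meeting tom", which is the kind of correction the note-taking keywords are meant to provide.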

Experimental Results (1): Simulation

‐ Single user, 72 sentences, 100 iterations.

‐ N words (partially or fully spelled) are randomly chosen as artificial haptic events: N‐W3L / N‐W

‐ Two signal-to-noise ratio (SNR) conditions: clean and 15 dB (artificially corrupted)

‐ Compared with FLoW and with the oracle error rate
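The simulation setup can be sketched in Python, assuming the reference transcript is available as a word list. The helper names are hypothetical. The noise-mixing function uses the standard definition SNR = 10·log10(P_signal / P_noise). It only illustrates what "artificially corrupted at 15 dB" means and is not the authors' corruption procedure.

```python
import math
import random


def artificial_haptic_events(words, n, first_letters=None, rng=None):
    """Randomly choose n words from the reference as simulated haptic notes.

    first_letters=3 mimics the N-W3L condition (first 3 letters only);
    first_letters=None mimics N-W (fully spelled words).
    """
    if rng is None:
        rng = random.Random()
    chosen = rng.sample(words, min(n, len(words)))
    if first_letters is None:
        return chosen
    return [w[:first_letters] for w in chosen]


def add_noise_at_snr(speech, noise, snr_db):
    """Add `noise` scaled so that 10*log10(P_speech / P_noise) equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(speech, noise)]


rng = random.Random(0)
reference = "the meeting with tom starts at six pm".split()  # hypothetical
print(artificial_haptic_events(reference, 3, first_letters=3, rng=rng))
```

Repeating the random draw many times (100 iterations on the slide) is what produces the error bars in the figure, because different keyword choices help the recognizer by different amounts.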


Experimental Results (1): Simulation

Figure: Simulation results (a) without any additional noise and (b) with artificial noise at SNR = 15 dB. The x‐axis denotes the number of randomly chosen keywords (N), and the y‐axis denotes the word error rate (WER). The red and blue lines show the Note‐taking‐style HVR performance with the first 3 letters of N randomly chosen words (N‐W3L) and with N fully‐spelled words (N‐W), respectively. The error bars indicate the standard deviations over the 100 iterations.

Experimental Results (1): Simulation

‐ Notable improvement in the Word Error Rate (WER) for both N‐W3L and N‐W under both SNR conditions.

‐ Larger improvement for larger N, but with a decreasing rate of improvement.

‐ Performance is bounded by the oracle error rate, which depends on the quality of the speech recognizer.

‐ Large standard deviation of WER: the choice of keywords significantly affects performance.
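The WER metric and the oracle-error-rate bottleneck can be made concrete with a short Python sketch. Here WER is computed per sentence as the word-level Levenshtein distance divided by the reference length. The oracle is approximated over an N-best list rather than a full lattice, and the example hypotheses are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Sentence-level word error rate."""
    return edit_distance(ref, hyp) / len(ref)


def oracle_wer(ref, hypotheses):
    """Lowest WER reachable by picking the best hypothesis from the list."""
    return min(wer(ref, h) for h in hypotheses)


reference = "meeting tom at six pm".split()
n_best = [h.split() for h in ("eating tom at six pm", "eating tom at sick pm")]
print([wer(reference, h) for h in n_best])  # [0.2, 0.4]
print(oracle_wer(reference, n_best))        # 0.2
```

If "meeting" never appears among the recognizer's hypotheses, no haptic keyword can recover it when the search is restricted to them. The oracle error rate is therefore a floor set by the speech recognizer.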

Experimental Results (2): Preliminary User Studies

‐ Single user (72 sentences for each input method)

‐ 3 keywords are chosen, partially spelled with only their first 3 characters: 3W3L

‐ 3 different input methods: shorthand / longhand / keyboard

‐ Compared with BoS and FLoW


Table: Five haptic methods were applied in this experiment: Boundary of Sentence (BoS); 3 Words and 3 Letters (3W3L) via shorthand, longhand, and keyboard input; and First Letter of Word (FLoW). The table reports the Word Error Rate (WER), the Keyword Error Rate (KER), and the absolute improvement in error rate from the Automatic Speech Recognition (ASR) results to the Haptic Voice Recognition (HVR) results.


‐ Notable improvement in the Word Error Rate (WER) and the Keyword Error Rate (KER)

‐ Greater improvement in KER can enhance the user experience with the speech recognition system.

‐ Increased duration of speech, minimized by the use of partially spelled words and Gregg shorthand.

Conclusion

• Summary
– Improvement in WER, and particularly in KER
– Smaller increase in speech duration (Gregg shorthand, partial spelling)
– Large standard deviations of WER

• Future Work
– HVR API
– Application in Spoken Document Retrieval (for online lectures, e‐Learning, conferences, etc.)

‐ The End ‐

