Dialog Design Speech/Natural Language Pen & Gesture

Post on 21-Dec-2015


Page 1: Dialog Design Speech/Natural Language Pen & Gesture

Dialog Design

Speech/Natural Language

Pen & Gesture

Page 2: Dialog Design Speech/Natural Language Pen & Gesture

Dialog Styles

1. Command languages

2. WIMP - Window, Icon, Menu, Pointer

3. Direct manipulation

4. Speech/Natural language

5. Gesture, pen

Page 3: Dialog Design Speech/Natural Language Pen & Gesture

Natural input

Universal design
Take advantage of familiarity, existing knowledge
Alternative input & output
Multi-modal interfaces
Getting “off the desktop”

Page 4: Dialog Design Speech/Natural Language Pen & Gesture

Agenda

Speech
Natural Language
Other audio
PDAs & Pen input styles
Gesture

Page 5: Dialog Design Speech/Natural Language Pen & Gesture

When to Use Speech

Hands busy
Mobility required
Eyes occupied
Conditions preclude use of keyboard
Visual impairment
Physical limitation

Page 6: Dialog Design Speech/Natural Language Pen & Gesture

Waveform & Spectrogram

Speech does not equal written language

Page 7: Dialog Design Speech/Natural Language Pen & Gesture

Parsing Sentences

"I told him to go back where he came from, but he wouldn't listen."

Page 8: Dialog Design Speech/Natural Language Pen & Gesture

Speech Input

Speaker recognition
Speech recognition
Natural language understanding

Page 9: Dialog Design Speech/Natural Language Pen & Gesture

Speaker Recognition

Tell which person it is (voice print)

Could also be important for monitoring meetings, determining speaker

Page 10: Dialog Design Speech/Natural Language Pen & Gesture

Speech Recognition

Primarily identifying words

Improving all the time
Commercial systems: IBM ViaVoice, Dragon Dictate, ...

Page 11: Dialog Design Speech/Natural Language Pen & Gesture

Recognition Dimensions

Speaker dependent/independent
  • Parametric patterns are sensitive to speaker
  • With training (dependent) can get better
Vocabulary
  • Some have 50,000+ words
Isolated word vs. continuous speech
  • Continuous: where words stop & begin
  • Typically a pattern match, no context used
  • “Did you” vs. “Didja”

Page 12: Dialog Design Speech/Natural Language Pen & Gesture

Recognition Systems

Typical system has 5 components:
  • Speech capture device - Has analog -> digital converter
  • Digital Signal Processor - Gets word boundaries, scales, filters, cuts out extra stuff
  • Preprocessed signal storage - Processed speech buffered for recognition algorithm
  • Reference speech patterns - Stored templates or generative speech models for comparisons
  • Pattern matching algorithm - Goodness of fit from templates/model to user’s speech
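The last two components (stored templates and a goodness-of-fit match) can be illustrated with a toy template matcher. This is only a sketch under stated assumptions: it uses dynamic time warping as one possible matching technique (the slides do not name one), and the one-dimensional "feature" sequences and the `TEMPLATES` table are made up for illustration.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(signal, templates):
    """Return (word, distance) for the stored template that fits best."""
    scored = ((word, dtw_distance(signal, t)) for word, t in templates.items())
    return min(scored, key=lambda pair: pair[1])

# Hypothetical reference patterns; real systems store spectral features.
TEMPLATES = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 1, 1]}

if __name__ == "__main__":
    # A slower utterance of "yes" still matches its template after warping.
    print(recognize([1, 1, 3, 4, 4, 3, 1], TEMPLATES))
```

The warping step is what lets a slow or fast utterance line up with a template of a different length, which matters for speaker-independent matching.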

Page 13: Dialog Design Speech/Natural Language Pen & Gesture

Errors

Systems make four types of errors:
  • Substitution - one for another
  • Rejection - detected, but not recognized
  • Insertion - added
  • Deletion - not detected

Problems with recovery
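Three of these error types can be counted by aligning what was said against what was recognized. The sketch below assumes a standard word-level edit-distance alignment; `count_errors` is a hypothetical helper, not part of any recognizer. Rejections are not counted because they have to be reported by the recognizer itself and cannot be recovered from two transcripts.

```python
def count_errors(reference, hypothesis):
    """Align two transcripts word by word and count error types."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(diag, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Walk back through the table to label each edit.
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                counts["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1   # word spoken but missing from output
            i -= 1
        else:
            counts["insertion"] += 1  # word in output that was never spoken
            j -= 1
    return counts

if __name__ == "__main__":
    print(count_errors("on the SAAB 340 turboprop", "on the sound 340 turboprop"))
```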

Page 14: Dialog Design Speech/Natural Language Pen & Gesture

Recognition Example:

My flight experience uh, I started off as a private pilot and then I spent 20 years flying as a navigator on C130s in the air force and I currently fly as a first officer on the SAAB 340 turboprop.

my flight experience a and I started off as a private pilot and I spent 20 years flying is a navigator on C-130s in the Air Force and I currently fly as a first officer on the sound 340 turboprop.

Page 15: Dialog Design Speech/Natural Language Pen & Gesture

And another example

Here here's one of the thoughts I'd throw in on that. I'd agree with Howard that the combination of checklists and flows is the way way to manage things best.

hair and here's one of the outside troodon and an eye to agree with power that the domination checklists and flows is the way that managed things best

Page 16: Dialog Design Speech/Natural Language Pen & Gesture

Natural Language Understanding

Putting meaning to the words

Input might be spoken or typed

Holy grail of Artificial Intelligence problems

Page 17: Dialog Design Speech/Natural Language Pen & Gesture

Eliza - a “Faked” Understanding

Person : Men are all alike.
ELIZA : In what way?
Person : They're always bugging us about something or other.
ELIZA : Can you think of a specific example?
Person : Well, my boyfriend made me come here.
ELIZA : Your boyfriend made you come here?
Person : He says I'm depressed much of the time.
ELIZA : I am sorry to hear that you are depressed.
Person : It's true. I am unhappy.
ELIZA : Do you think coming here will help you not to be unhappy?
Person : I need some help, that much seems certain.
ELIZA : What would it mean to you if you got some help?
Person : Perhaps I could learn to get along with my mother.
ELIZA : Tell me more about your family

http://www-ai.ijs.si/eliza/eliza.html

Weizenbaum, J., "ELIZA -- A computer program for the study of natural language communication between man and machine", Communications of the ACM 9(1):36-45, 1966
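The "faked" understanding in the dialogue above comes from keyword patterns and pronoun swapping, not from meaning. A minimal sketch of that idea follows; the rules and the `REFLECTIONS` table are invented for illustration and are not Weizenbaum's original script.

```python
import re

# Pronoun swaps applied when echoing the user's own words back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your",
               "am": "are", "you": "I", "your": "my"}

# (pattern, response template) pairs tried in order; illustrative only.
RULES = [
    (re.compile(r".*\bi am (.*)", re.I), "I am sorry to hear that you are {0}."),
    (re.compile(r".*\bmy (mother|father|family)\b.*", re.I), "Tell me more about your family."),
    (re.compile(r".*\bmy (.*)", re.I), "Your {0}?"),
    (re.compile(r".*\balike\b.*", re.I), "In what way?"),
]

def reflect(fragment):
    """Lowercase a captured fragment, drop end punctuation, swap pronouns."""
    words = fragment.lower().rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(w, w) for w in words)

def respond(utterance):
    """Return the response of the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.match(utterance)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # fallback when no keyword is found

if __name__ == "__main__":
    for line in ["Men are all alike.",
                 "Well, my boyfriend made me come here.",
                 "It's true. I am unhappy."]:
        print("Person :", line)
        print("ELIZA  :", respond(line))
```

Nothing here models what a boyfriend or unhappiness is; the program only reuses the user's words, which is why the slide calls the understanding "faked".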

Page 18: Dialog Design Speech/Natural Language Pen & Gesture

Natural Language Domains

Conceptual: total set of objects/actions
  • Knows about managers and salaries
Functional: what can be expressed
  • What is the salary of Joe’s manager? Who is Mary’s manager?
Syntactic: variety of forms
  • What is the salary of the manager of Joe?
Lexical: word meanings, synonyms, vocabulary
  • What were the earnings of Joe’s boss

Page 19: Dialog Design Speech/Natural Language Pen & Gesture

NL Factors/Terms

Syntactic
  • Grammar or structure
Prosodic
  • Inflection, stress, pitch, timing
Pragmatic
  • Situated context of utterance, location, time
Semantic
  • Meaning of words

Page 20: Dialog Design Speech/Natural Language Pen & Gesture

SR/NLU Advantages

Easy to learn and remember
Powerful
Fast, efficient (not always)
Little screen real estate

Page 21: Dialog Design Speech/Natural Language Pen & Gesture

SR/NLU Disadvantages

Assumes domain knowledge
Doesn’t work well enough yet
  • Requires confirmation
  • And recognition will always be error-prone
Expensive to implement
Unrealistic expectations
Generate mistrust/anger

Page 22: Dialog Design Speech/Natural Language Pen & Gesture

Speech Output

Tradeoffs in speed, naturalness and understandability

Male or female voice?
  • Technical issues (freq. response of phone)
  • User preference (depends on the application)
Rate of speech
  • Technically up to 550 wpm!
  • Depends on listener
Synthesized or Pre-recorded?
  • Synthesized: Better coverage, flexibility
  • Recorded: Better quality, acceptance

Page 23: Dialog Design Speech/Natural Language Pen & Gesture

Speech Output

Synthesis
  • Quality depends on software ($$)
  • Influence of vocabulary and phrase choices
  • http://www.research.att.com/projects/tts/demo.html
Recorded segments
  • Store tones, then put them together
  • The transitions are difficult (e.g., numbers)
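Putting recorded segments together for numbers can be sketched as a lookup that turns a value into an ordered list of clips to play. The clip names below are hypothetical, and the sketch only chooses the segments; it does nothing about the hard part the slide mentions, namely making the transitions between clips sound natural.

```python
# Hypothetical names of pre-recorded clips, one per spoken word.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_segments(n):
    """Return the recorded clips to play, in order, for 0 <= n <= 999."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is covered in this sketch")
    if n < 20:
        return [ONES[n]]
    segments = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        segments += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        segments.append(TENS[tens])
        if ones:
            segments.append(ONES[ones])
    elif rest:
        segments.append(ONES[rest])
    return segments

if __name__ == "__main__":
    print(number_to_segments(342))
```

A production system would typically record several intonations of each clip (for example, a phrase-final "two") to smooth the joins.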

Page 24: Dialog Design Speech/Natural Language Pen & Gesture

Designing the Interaction

Constrain vocabulary
  • Limit valid commands
  • Structure questions wisely (Yes/No)
  • Manage the interaction
  • Examples?
Slow speech rate, but concise phrases
Design for failsafe error recovery
Visual record of input/output
Design for the user – Wizard of Oz
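One way to picture a constrained vocabulary with yes/no confirmation: map whatever the recognizer returns onto a small set of valid commands, and ask for confirmation whenever the match is not exact. The command list and the `interpret` helper are assumptions for illustration; the string matching uses Python's standard `difflib` module.

```python
import difflib

# Hypothetical command set for a phone-style voice interface.
VALID_COMMANDS = ["call home", "check voicemail", "read messages", "hang up"]

def interpret(recognized, cutoff=0.6):
    """Map recognizer output onto the constrained command set.

    Returns (command, needs_confirmation). The command is None when
    nothing in the vocabulary is close enough to the recognized text.
    """
    text = recognized.lower().strip()
    if text in VALID_COMMANDS:
        return text, False
    close = difflib.get_close_matches(text, VALID_COMMANDS, n=1, cutoff=cutoff)
    if close:
        # Near miss: follow up with a yes/no question before acting.
        return close[0], True
    return None, False

if __name__ == "__main__":
    command, confirm = interpret("check voice mail")
    if command and confirm:
        print(f"Did you say '{command}'? (yes/no)")
```

Keeping the follow-up question to yes/no means the second recognition step is itself working with the smallest possible vocabulary.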

Page 25: Dialog Design Speech/Natural Language Pen & Gesture

Speech Tools/Toolkits

Java Speech SDK
  • FreeTTS 1.1.1
  • http://freetts.sourceforge.net/docs/index.php
IBM JavaBeans for speech
Microsoft speech SDK (Visual Basic, etc.)
OS capabilities (speech recognition and synthesis built in to OS) (TextEdit)
VoiceXML

Page 26: Dialog Design Speech/Natural Language Pen & Gesture

General Issues in Choosing Dialogue Style

Who is in control - user or computer
Initial training required
Learning time to become proficient
Speed of use
Generality/flexibility/power
Special skills - typing
Gulf of evaluation / gulf of execution
Screen space required
Computational resources required

Page 27: Dialog Design Speech/Natural Language Pen & Gesture

Non-speech audio

Traditionally used for warnings, alarms or status information

Sounds provide information that helps reduce errors, e.g., typing, video games

Multi-modal interfaces

Page 28: Dialog Design Speech/Natural Language Pen & Gesture

Additional benefits of non-speech audio

Good for indicating changes, since we ignore continuous sounds

Provides secondary representation
  • Supports visual interface

Tradeoff in using natural (real) sounds vs. synthesized noises.

Page 29: Dialog Design Speech/Natural Language Pen & Gesture

Non-speech audio examples

Error ding
Info beep
Email arriving ding
Recycle
Battery critical
Logoff
Logon
Others?

Page 30: Dialog Design Speech/Natural Language Pen & Gesture

Pen and Gesture

Page 31: Dialog Design Speech/Natural Language Pen & Gesture

PDAs

Becoming more common and widely used

Smaller display (160x160), (320x240)
Few buttons, interact through pen
Estimate: 14 million shipped by 2004
Improvements
  • Wireless, color, more memory, better CPU, better OS

Palmtop versus Handheld

Page 32: Dialog Design Speech/Natural Language Pen & Gesture

No Shredder…

Page 33: Dialog Design Speech/Natural Language Pen & Gesture

http://sonyelectronics.sonystyle.com/micros/clie/

http://www.vaio.sony.co.jp/Products/VGN-U50/

http://www.oqo.com/

http://www.blackberry.com/

Page 34: Dialog Design Speech/Natural Language Pen & Gesture

Input

Pen is dominant form for PDA and Tablet
Main techniques
  • Free-form ink
  • Soft keyboard
  • Numeric keyboard => text
  • Stroke recognition - strokes not in the shape of characters
  • Hand printing / writing recognition
Sometimes can connect keyboard or has modified keyboard (e.g. cell phones)

Page 35: Dialog Design Speech/Natural Language Pen & Gesture

Soft Keyboards

Common on PDAs and mobile devices
  • Tap on buttons on screen

Page 36: Dialog Design Speech/Natural Language Pen & Gesture

Soft Keyboard

Presents a small diagram of keyboard
You click on buttons/keys with pen
QWERTY vs. alphabetical
  • Tradeoffs?
  • Alternatives?

Page 37: Dialog Design Speech/Natural Language Pen & Gesture

Numeric Keypad - T9

Developed by Tegic Communications
You press out letters of your word, it matches the most likely word, then gives optional choices
Faster than multiple presses per key
Used in mobile phones
http://www.t9.com/
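The matching step can be sketched as a lookup from a digit sequence to dictionary words, ordered by how likely each word is. The keypad layout is the standard phone one; the word list and frequencies are made up, and this is an illustration of the idea rather than Tegic's actual implementation.

```python
# Standard phone keypad letter groups.
KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYS.items() for ch in letters}

# Hypothetical word frequencies (higher = more likely).
WORD_FREQ = {"good": 50, "home": 40, "gone": 30, "hood": 5, "hello": 20}

def to_digits(word):
    """Digit sequence a user would press for this word, one press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())

def build_index(word_freq):
    """Group words by digit sequence, most frequent word first."""
    index = {}
    for word in word_freq:
        index.setdefault(to_digits(word), []).append(word)
    for words in index.values():
        words.sort(key=lambda w: -word_freq[w])
    return index

def candidates(digits, index):
    """First entry is the default guess; the rest are the optional choices."""
    return index.get(digits, [])

if __name__ == "__main__":
    index = build_index(WORD_FREQ)
    print(candidates("4663", index))
```

The speed advantage over multi-tap comes from pressing each key once per letter; the cost is that several words can share one digit sequence, which is why the optional choices are needed.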

Page 38: Dialog Design Speech/Natural Language Pen & Gesture

Cirrin - Stroke Recognition

Developed by Jen Mankoff (GT -> Berkeley CS Faculty -> CMU CS Faculty)

Word-level unistroke technique
UIST ‘98 paper
Use stylus to go from one letter to the next

Page 39: Dialog Design Speech/Natural Language Pen & Gesture

Quikwriting - Stroke Recognition

Developed by Ken Perlin

Page 40: Dialog Design Speech/Natural Language Pen & Gesture

Quikwriting Example

http://mrl.nyu.edu/~perlin/demos/Quikwrite2_0.html

Said to be as fast as Graffiti, but have to learn more

Page 41: Dialog Design Speech/Natural Language Pen & Gesture

Hand Printing / Writing Recognition

Recognizing letters and numbers and special symbols
Lots of systems (commercial too)
English, kanji, etc.
Not perfect, but people aren’t either!
  • People - 96% handprinted single characters
  • Computer - >97% is really good

OCR (Optical Character Recognition)

Page 42: Dialog Design Speech/Natural Language Pen & Gesture

Recognition Issues

Off-line vs. On-line
  • Off-line: After all writing is done, speed not an issue, only quality.
      • Work with either a bit map or vector sequence
  • On-line: Must respond in real-time - but have richer set of features - acceleration, velocity, pressure
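The richer on-line features can be derived directly from time-stamped pen samples, which an off-line bitmap does not have. The sketch below assumes a hypothetical sample format of `(time, x, y, pressure)` tuples with strictly increasing times; real digitizers report their own formats and units.

```python
import math

def stroke_features(samples):
    """Compute simple on-line features for one stroke.

    samples: list of (t_seconds, x, y, pressure) tuples in time order.
    Returns (speeds, accelerations, mean_pressure).
    """
    # Speed over each segment between consecutive samples.
    speeds = []
    for (t0, x0, y0, _), (t1, x1, y1, _) in zip(samples, samples[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
    # Acceleration: change in speed between neighbouring segments,
    # divided by the time between the two segment midpoints.
    accelerations = []
    for i in range(len(speeds) - 1):
        dt_mid = (samples[i + 2][0] - samples[i][0]) / 2
        accelerations.append((speeds[i + 1] - speeds[i]) / dt_mid)
    mean_pressure = sum(s[3] for s in samples) / len(samples)
    return speeds, accelerations, mean_pressure

if __name__ == "__main__":
    stroke = [(0.0, 0.0, 0.0, 0.5), (1.0, 1.0, 0.0, 0.5), (2.0, 3.0, 0.0, 0.5)]
    print(stroke_features(stroke))
```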

Page 43: Dialog Design Speech/Natural Language Pen & Gesture

More Issues

Boxed vs. Free-Form input
  • Sometimes encounter boxes on forms
Printed vs. Cursive
  • Cursive is much more difficult to impossible
Letters vs. Words
  • Cursive is easier to do in words vs individual letters, as words create more context

Page 44: Dialog Design Speech/Natural Language Pen & Gesture

More Issues

Using context & words can help
  • Usually requires existence of a dictionary
  • Check to see if word exists
  • Consider 1 vs. I vs. l

Training - Many systems improve a lot with training data
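The 1 vs. I vs. l case shows how a dictionary check resolves characters that look alike in isolation. The sketch below expands each ambiguous character into its look-alikes and keeps only the spellings found in a word list; the confusion sets and the tiny dictionary are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical look-alike sets for characters a recognizer may confuse.
CONFUSABLE = {"1": "1Il", "I": "I1l", "l": "l1I", "0": "0O", "O": "O0"}

# Stand-in for a real word list.
DICTIONARY = {"Illinois", "lid", "10", "All"}

def dictionary_candidates(raw, dictionary=DICTIONARY):
    """Return every dictionary word reachable by swapping look-alike characters."""
    options = [CONFUSABLE.get(ch, ch) for ch in raw]
    spellings = ("".join(chars) for chars in product(*options))
    return [word for word in spellings if word in dictionary]

if __name__ == "__main__":
    # Raw recognizer output with a digit where a letter was written.
    print(dictionary_candidates("1llinois"))
```

Enumerating every spelling grows exponentially with the number of ambiguous characters, so a practical system would prune the search, for example by walking a trie of the dictionary.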

Page 45: Dialog Design Speech/Natural Language Pen & Gesture

Special Alphabets

Graffiti - Unistroke alphabet on Palm PDA
  • What are your experiences with Graffiti?
Other alphabets or purposes
  • Gestures for commands

Page 46: Dialog Design Speech/Natural Language Pen & Gesture

Pen Gesture Commands

-Might mean delete

-Insert

-Paragraph

Define a series of (hopefully) simple drawing gestures that mean different commands in a system

Page 47: Dialog Design Speech/Natural Language Pen & Gesture

Pen Use Modes

Often, want a mix of free-form drawing and special commands

How does user switch modes?
  • Mode icon on screen
  • Button on pen
  • Button on device

Page 48: Dialog Design Speech/Natural Language Pen & Gesture

Error Correction

Having to correct errors can slow input tremendously

Strategies
  • Erase and try again (repetition)
  • When uncertain, system shows list of best guesses (n-best list)
  • Others??
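The n-best strategy can be sketched as a small decision rule: commit to the top guess when it is clearly ahead, otherwise show the user a short list to pick from. The confidence scores, the `margin` threshold, and the `correction_prompt` name are all assumptions; real recognizers expose scores in their own ways.

```python
def correction_prompt(scored_guesses, n=3, margin=0.15):
    """Decide what to show the user after recognition.

    scored_guesses: list of (text, confidence) pairs from a recognizer.
    Returns a one-item list when the best guess is clearly ahead,
    otherwise the n best guesses for the user to choose from.
    """
    if not scored_guesses:
        return []
    ranked = sorted(scored_guesses, key=lambda g: g[1], reverse=True)
    if len(ranked) < 2 or ranked[0][1] - ranked[1][1] >= margin:
        return [ranked[0][0]]            # confident: no list needed
    return [text for text, _ in ranked[:n]]  # uncertain: n-best list

if __name__ == "__main__":
    print(correction_prompt([("clear", 0.5), ("dear", 0.45), ("cleat", 0.3), ("char", 0.1)]))
```

Picking from a list avoids the repetition strategy's main weakness, where writing or saying the same thing again often produces the same error.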

Page 49: Dialog Design Speech/Natural Language Pen & Gesture

More ink applications

Signature verification
Notetaking
Electronic whiteboards and large-scale displays
Sketching

Page 50: Dialog Design Speech/Natural Language Pen & Gesture

Free-form Ink

Ink is the data, take as is

Human is responsible for understanding and interpretation

Like a sketch pad
Often time-stamped

Page 51: Dialog Design Speech/Natural Language Pen & Gesture

Audio Notebook

Stifelman, MIT

affordances of paper

notetaking activity

indexing

scanning

Page 52: Dialog Design Speech/Natural Language Pen & Gesture

Meeting Support

Natural Input

Indexing

Tivoli

Domain objects

Page 53: Dialog Design Speech/Natural Language Pen & Gesture

eClass

Ubiquitous Access

Page 54: Dialog Design Speech/Natural Language Pen & Gesture

Flatland

Support individual whiteboard use

Use study
Persistence
Segments
Informal

Mynatt, E.D., Igarashi, T., Edwards, W.K., and LaMarca, A. (1999). "Flatland: New dimensions in office whiteboards." In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 1999; Pittsburgh, Pennsylvania). New York: ACM Press, pp. 346-353.

Page 55: Dialog Design Speech/Natural Language Pen & Gesture

Example

DENIM – Landay, Berkeley

Page 56: Dialog Design Speech/Natural Language Pen & Gesture

General Issues in Choosing Dialogue Style

Who is in control - user or computer
Initial training required
Learning time to become proficient
Speed of use
Generality/flexibility/power
Special skills - typing
Gulf of evaluation / gulf of execution
Screen space required
Computational resources required

Page 57: Dialog Design Speech/Natural Language Pen & Gesture

Gesture Recognition

Tracking 3D hand-arm gestures
  • fiber optic, e.g. dataglove
  • magnetic tracker, e.g. Polhemus
Perceptual user interfaces
  • emerging area
  • mainly computer vision researchers

Page 58: Dialog Design Speech/Natural Language Pen & Gesture

Other interesting interactions

3D interaction
  • Stereoscopic displays
Virtual reality
  • Immersive displays such as glasses, caves
Augmented reality
  • Head trackers and vision based tracking