dialog design speech/natural language pen & gesture

Post on 21-Dec-2015

225 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Dialog Design

Speech/Natural Language

Pen & Gesture

Dialog Styles

1. Command languages

2. WIMP - Window, Icon, Menu, Pointer

3. Direct manipulation

4. Speech/Natural language

5. Gesture, pen

Natural input

Universal design Take advantage of familiarity,

existing knowledge Alternative input & output Multi-modal interfaces Getting “off the desktop”

Agenda

Speech Natural Language Other audio

PDAs & Pen input styles Gesture

When to Use Speech

Hands busy Mobility required Eyes occupied Conditions preclude use of keyboard Visual impairment Physical limitation

Waveform & Spectrogram

Speech does not equal written language

Parsing Sentences

"I told him to go back where he came from, but he wouldn't listen."

Speech Input

Speaker recognition Speech recognition Natural language understanding

Speaker Recognition

Tell which person it is (voice print)

Could also be important for monitoring meetings, determining speaker

Speech Recognition

Primarily identifying words

Improving all the time Commercial systems:

IBM ViaVoice, Dragon Dictate, ...

Recognition Dimensions

Speaker dependent/independent Parametric patterns are sensitive to speaker With training (dependent) can get better

Vocabulary Some have 50,000+ words

Isolated word vs. continuous speech Continuous: where words stop & begin Typically a pattern match, no context used

Did youvs.Didja

Recognition Systems

Typical system has 5 components: Speech capture device - Has analog -> digital

converter Digital Signal Processor - Gets word

boundaries, scales, filters, cuts out extra stuff Preprocessed signal storage - Processed

speech buffered for recognition algorithm Reference speech patterns - Stored templates

or generative speech models for comparisons Pattern matching algorithm - Goodness of fit

from templates/model to user’s speech

Errors

Systems make four types of errors:Substitution - one for anotherRejection - detected, but not

recognizedInsertion - addedDeletion - not detected

Problems with recovery

Recognition Example:

My flight experience uh, I started off as a private pilot and then I spent 20 years flying as a navigator on C130s in the air force and I currently fly as a first officer on the SAAB 340 turboprop.

my flight experience a and I started off as a private pilot and I spent 20 years flying is a navigator on C-130s in the Air Force and I currently fly as a first officer on the sound 340 turboprop.

And another example

Here here's one of the

thoughts I'd throw in on

that. I'd agree with

Howard that the

combination of checklists

and flows is the way way

to manage things best.

hair and here's one of the

outside troodon and

an eye to agree with

power that the

domination checklists

and flows is the way

that managed things best

Natural Language Understanding

Putting meaning to the words

Input might be spoken or typed

Holy grail of Artificial Intelligence problems

Eliza - a “Faked” Understanding Person : Men are all alike.

ELIZA : In what way?Person : They're always bugging us about something or other.ELIZA : Can you think of a specific example?Person : Well, my boyfriend made me come here.ELIZA : Your boyfriend made you come here?Person : He says I'm depressed much of the time.ELIZA : I am sorry to hear that you are depressed.Person : It's true. I am unhappy.ELIZA : Do you think coming here will help you not to be unhappy?Person : I need some help, that much seems certain.ELIZA : What would it mean to you if you got some help?Person : Perhaps I could learn to get along with my mother.ELIZA : Tell me more about your family

http://www-ai.ijs.si/eliza/eliza.html Weizenbaum, J., "ELIZA -- A computer program for the study of

natural language communication between man and machine", Communications of the ACM 9(1):36-45, 1966

Natural Language Domains

Conceptual: total set of objects/actions• Knows about managers and salaries

Functional: what can be expressed• What is the salary of Joe’s manager? Who is Mary’s

manager? Syntactic: variety of forms

• What is the salary of the manager of Joe? Lexical: word meanings, synonyms,

vocabulary• What were the earnings of Joe’s boss

NL Factors/Terms

SyntacticGrammar or structure

ProsodicInflection, stress, pitch, timing

PragmaticSituated context of utterance, location,

time Semantic

Meaning of words

SR/NLU Advantages

Easy to learn and remember Powerful Fast, efficient (not always) Little screen real estate

SR/NLU Disadvantages

Assumes domain knowledge Doesn’t work well enough yet

Requires confirmationAnd recognition will always be error-

prone Expensive to implement Unrealistic expectations Generate mistrust/anger

Speech Output

Tradeoffs in speed, naturalness and understandability

Male or female voice? Technical issues (freq. response of phone) User preference (depends on the application)

Rate of speech Technically up to 550 wpm! Depends on listener

Synthesized or Pre-recorded? Synthesized: Better coverage, flexibility Recorded: Better quality, acceptance

Speech Output

Synthesis Quality depends on software ($$) Influence of vocabulary and phrase choices http://www.research.att.com/projects/tts/demo.html

Recorded segments Store tones, then put them together The transitions are difficult (e.g., numbers)

Designing the Interaction

Constrain vocabularyLimit valid commandsStructure questions wisely (Yes/No)Manage the interactionExamples?

Slow speech rate, but concise phrases Design for failsafe error recovery Visual record of input/output Design for the user – Wizard of Oz

Speech Tools/Toolkits

Java Speech SDK FreeTTS 1.1.1

http://freetts.sourceforge.net/docs/index.php

IBM JavaBeans for speech Microsoft speech SDK (Visual Basic, etc.) OS capabilities (speech recognition and

synthesis built in to OS) (TextEdit) VoiceXML

General Issues in Choosing Dialogue Style Who is in control - user or computer Initial training required Learning time to become proficient Speed of use Generality/flexibility/power Special skills - typing Gulf of evaluation / gulf of execution Screen space required Computational resources required

Non-speech audio

Traditionally used for warnings, alarms or status information

Sounds provide information that help reduce error. Eg: typing, video games

Multi-modal interfaces

Additional benefits of non-speech audio

Good for indicating changes, since we ignore continuous sounds

Provides secondary representationSupports visual interface

Tradeoff in using natural (real) sounds vs. synthesized noises.

Non-speech audio examples

Error ding Info beep Email arriving ding Recycle Battery critical Logoff LogonOthers?

Pen and Gesture

PDAs

Becoming more common and widely used

Smaller display (160x160), (320x240) Few buttons, interact through pen Estimate: 14 million shipped by 2004 Improvements

Wireless, color, more memory, better CPU, better OS

Palmtop versus Handheld

No Shredder…

http://sonyelectronics.sonystyle.com/micros/clie/

http://www.vaio.sony.co.jp/Products/VGN-U50/

http://www.oqo.com/

http://www.blackberry.com/

Input

Pen is dominant form for PDA and Tablet Main techniques

Free-form ink Soft keyboard Numeric keyboard => text Stroke recognition - strokes not in the shape

of characters Hand printing / writing recognition

Sometimes can connect keyboard or has modified keyboard (e.g. cell phones)

Soft Keyboards

Common on PDAs and mobile devicesTap on buttons on screen

Soft Keyboard

Presents a small diagram of keyboard You click on buttons/keys with pen QWERTY vs. alphabetical

Tradeoffs?Alternatives?

Numeric Keypad -T9

Tegic Communications developed You press out letters of your word, it matches the

most likely word, then gives optional choices Faster than multiple presses per key Used in mobile phones http://www.t9.com/

Cirrin - Stroke Recognition

Developed by Jen Mankoff (GT -> Berkeley CS Faculty -> CMU CS Faculty)

Word-level unistroke technique UIST ‘98 paper Use stylus to go

from one letterto the next ->

Quikwriting - Stroke Recogntion

Developed by Ken Perlin

Quikwriting Example

p l

http://mrl.nyu.edu/~perlin/demos/Quikwrite2_0.html

e

Said to be as fast as graffiti, but have to learn more

Hand Printing / Writing Recognition Recognizing letters and numbers and

special symbols Lots of systems (commercial too) English, kanji, etc. Not perfect, but people aren’t either!

People - 96% handprinted single characters Computer - >97% is really good

OCR (Optical Character Recognition)

Recognition Issues

Off-line vs. On-lineOff-line: After all writing is done,

speed not an issue, only quality.• Work with either a bit map or vector

sequence

On-line: Must respond in real-time - but have richer set of features - acceleration, velocity, pressure

More Issues

Boxed vs. Free-Form inputSometimes encounter boxes on forms

Printed vs. CursiveCursive is much more difficult to impossible

Letters vs. WordsCursive is easier to do in words vs

individual letters, as words create more context

More Issues

Using context & words can helpUsually requires existence of a

dictionaryCheck to see if word existsConsider 1 vs. I vs. l

Training - Many systems improve a lot with training data

Special Alphabets

Graffiti - Unistroke alphabet on Palm PDAWhat are your

experienceswith Graffiti?

Other alphabets or purposesGestures for commands

Pen Gesture Commands

-Might mean delete

-Insert

-Paragraph

Define a series of (hopefully) simple drawing gesturesthat mean different commands in a system

Pen Use Modes

Often, want a mix of free-form drawing and special commands

How does user switch modes?Mode icon on screenButton on penButton on device

Error Correction

Having to correct errors can slow input tremendously

StrategiesErase and try again (repetition)When uncertain, system shows list of

best guesses (n-best list)Others??

More ink applications

Signature verification Notetaking Electronic whiteboards and large-

scale displays Sketching

Free-form Ink

Ink is the data, take as is

Human is responsible forunderstanding andinterpretation

Like a sketch pad Often time-stamped

Audio Notebook

Stifelman, MIT

affordances of paper

notetaking activity

indexing

scanning

Meeting Support

Natural Input

Indexing

Tivoli Domain

objects

eClass

Ubiquitous Access

Flatland

Supportindividualwhiteboarduse

Use study persistence segments Informal

Mynatt, E.D., Igarashi, T., Edward, W.K., and LaMarca, A. (1999). "Flatland: New dimensions in office whiteboards." In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 1999; Pittsburgh, Pennsylvania). New York: ACM Press, pp. 346-353.

Example

DENIM – Landay, Berkeley

General Issues in Choosing Dialogue Style Who is in control - user or computer Initial training required Learning time to become proficient Speed of use Generality/flexibility/power Special skills - typing Gulf of evaluation / gulf of execution Screen space required Computational resources required

Gesture Recognition

Tracking 3D hand-arm gesturesfiber optic, e.g. dataglovemagnetic tracker, e.g. Polhemus

Perceptual user interfacesemerging areamainly computer vision researchers

Other interesting interactions

3D interactionStereoscopic displays

Virtual realityImmersive displays such as glasses,

caves Augmented reality

Head trackers and vision based tracking

top related