Copyright © 2017. All rights reserved.
Pardon My French
And Other Adventures on the Road to Enterprise Virtual Assistants
Editt Gonen-Friedman
Oracle Voice & Emerging [email protected]
Voice Interaction
“Voice-based technologies are the most important area of growth for mobile user interfaces… hands-free use and always-on interfaces will drive increased use of speech recognition… enterprise application developers will need to accommodate new ways of accepting input”. - Intelligence report, May 2015, Tractica
“Enterprises are going to be affected by a worker’s need to do more than type, click and swipe” - ITWC
“2016 will be the year of Conversational Commerce” – Chris Messina on Medium
What Does it Take to Build an Enterprise Virtual Assistant?
• Multiple technologies must come together to build it:
– Automatic Speech Recognition (ASR)
– Voice UI
– Dialog Management
– Natural Language Understanding (NLU)
• Needs speech recognition (SR)
• Build your own. Advantages:
– Build a massive language corpus (Google)
– Handle surround sound, priority by proximity (Amazon’s Alexa)
– Use voice biometrics to identify speaker (Alexa, Nuance)
• Or use 3rd party services
A Mobile Enterprise VA
Image source: itpro.co.uk
• Speech service considerations:
– Footprint: local install (Sensory) or cloud service
– Security: enterprise data is sensitive
– WER (word error rate)
– Device support
– Languages (global enterprise)
– Vocab customization: ability to add recurring entity names and industry jargon
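WER is conventionally the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and a reference transcript, divided by the number of reference words. A minimal Python sketch for comparing candidate speech services; the function name and the whitespace tokenization are assumptions for illustration, not part of any vendor's tooling:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the output "go to leeds" against the reference "go to leads" counts one substitution over three reference words.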
Automatic Speech Recognition
Mobile Enterprise VA
• Compared to a general purpose VA
– Supported actions are limited
– Context is limited
• Is it easier?
• As a rule there’s less ambiguity, but sometimes the assistant needs to resolve to a less popular meaning
• Example:
– The user says: “Leads” or “Go to leads”
– Intent: navigate to the leads page in my speech-enabled mobile app for sales
Speech Considerations: Vocabulary
– Result: “Go to Leeds”. A general purpose VA might understand this to mean “bring up the map for Leeds, England”
Leeds, Northern England
– In this case we had to add “Leads” to the ASR custom vocabulary with an increased ‘weight’ of 50% instead of 10%
• Could also be solved at the NLP step, with full NLP that resolves ambiguity
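How a speech service applies a custom-vocabulary weight is vendor-specific. The toy sketch below only illustrates the idea: candidates containing a custom word get a score bonus, so a larger weight can flip the result from “Leeds” to “leads”. The function name, the candidate scores and the additive bonus are all assumptions made up for illustration.

```python
def pick_transcript(candidates, vocabulary_weights):
    """Return the candidate text with the highest boosted score.

    candidates: list of (text, score) pairs from a hypothetical recognizer.
    vocabulary_weights: maps a custom word to the bonus added when it appears.
    """
    def boosted(item):
        text, score = item
        words = text.lower().split()
        bonus = sum(w for word, w in vocabulary_weights.items() if word in words)
        return score + bonus

    return max(candidates, key=boosted)[0]


# Made-up scores: the recognizer slightly prefers the city name.
n_best = [("go to Leeds", 0.55), ("go to leads", 0.40)]

low = pick_transcript(n_best, {"leads": 0.10})   # bonus too small to win
high = pick_transcript(n_best, {"leads": 0.50})  # bonus large enough to win
```

With these made-up numbers the 10% weight still yields the city, while the 50% weight yields the sales term.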
A Mobile Enterprise VA
Diagram: Automatic Speech Recognition, Voice UI
• Needs voice interaction design
– How to make it look like a speech app?
– How to deal with command discoverability?
– Can you ‘wake it’ with a keyword?
– In which cases do you allow a touch and voice combination?
– How to indicate ‘listening’?
A Mobile Enterprise VA
– Here are some attempts to answer those questions in a dedicated speech app, Oracle Voice
A Mobile Enterprise VA
– And this is a UI change to a regular app, Oracle Sales Cloud Mobile, where speech capabilities have been added
A Mobile Enterprise VA
• Needs dialog management
– One-step response: gives you a simple answer or link, or navigates you to another page
– Multi-step dialog: manages a back-and-forth dialog in which context is retained
– Perhaps add more useful interactions
• Such as business content reading (news, emails, app listings)
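The difference between a one-step response and a multi-step dialog can be sketched as a small state holder. This is a minimal illustration, not how Oracle Voice is implemented; the class name, the phrases and the meeting scenario are hypothetical.

```python
class DialogManager:
    """Toy dialog manager: one-step answers plus one multi-step flow."""

    def __init__(self):
        self.context = {}  # retained between turns

    def handle(self, utterance: str) -> str:
        text = utterance.strip().lower()
        prefix = "schedule a meeting with "

        # Multi-step: an earlier turn is still waiting for the meeting time.
        if self.context.get("pending") == "meeting_time":
            contact = self.context.pop("contact")
            self.context.pop("pending")
            return f"Meeting with {contact} scheduled for {text}."

        # One-step: navigate immediately, no context needed.
        if text in ("leads", "go to leads"):
            return "Opening the Leads page."

        # First turn of a multi-step dialog: remember who, then ask when.
        if text.startswith(prefix):
            self.context["contact"] = utterance.strip()[len(prefix):]
            self.context["pending"] = "meeting_time"
            return "What time?"

        return "Sorry, I did not understand that."
```

A possible exchange: "Go to leads" is answered in one step, while "Schedule a meeting with Dana" prompts "What time?" and the next utterance is interpreted using the retained context.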
Diagram: Automatic Speech Recognition, VUI, Dialog Management
A Mobile Enterprise VA
• Needs NLU
• Many 3rd party solutions available
– It’s possible to start with a basic solution that understands a number of meanings and intents and can follow up with specific actions and taskflows
– Soon you’ll run into the need for full language and context intelligence
Diagram: Automatic Speech Recognition, VUI, Dialog Management, Natural Language Understanding
A Mobile Enterprise VA
• Robust NLU needs to resolve ambiguity in context
– Leads vs. Leeds is a simple example
– ‘Diversification’ means ‘investment variety’ in Finance, but ‘getting rid of assets’ in Marketing
Image source: Oracle Intelligent UX
A Mobile Enterprise VA
• Adding an NLU solution to the mobile app is no simple task
– Test performance
– Word error rate
– Intent error rate
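The slide does not define intent error rate. A common reading, assumed here, is the fraction of test utterances whose predicted intent differs from the expected one. The function name and intent labels below are hypothetical.

```python
def intent_error_rate(expected, predicted):
    """Fraction of utterances whose predicted intent is not the expected one."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("at least one test utterance is required")
    errors = sum(1 for e, p in zip(expected, predicted) if e != p)
    return errors / len(expected)


# Hypothetical test set of four utterances with one misclassified intent.
expected_intents = ["navigate_leads", "show_map", "navigate_leads", "read_email"]
predicted_intents = ["navigate_leads", "show_map", "show_map", "read_email"]
rate = intent_error_rate(expected_intents, predicted_intents)
```

Tracking this alongside word error rate helps separate recognition problems from understanding problems: an utterance can be transcribed perfectly and still be mapped to the wrong intent.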
Image source: Right Now Intent Guide
Are We Done Yet?
• Users want language support
Diagram: Automatic Speech Recognition, VUI, Dialog Management, Natural Language Understanding, Languages
Languages
Source: Technology Review
– Your speech service recognizes in 40 languages – why doesn’t your app?
Diagram: “How are you doing?” (speech) → Speech Engine → “How are you doing?” (text) → “The user asked how I’m doing” → “Respond that I’m doing well”
Languages
Source: Technology Review
Diagram: “Comment allez-vous?” (speech) → Speech Engine → “Comment allez-vous?” (text) → “I have no idea what that means” → Error handling
– A user speaks French. SR output is French text.
Languages
• A middle step is missing: a translation or a mapping
• You could translate the text to English before further processing, or
• You could add NLP in other languages
– When adding NLP in other languages you also essentially add a mapping between key words in English that associate intent with actions, and the corresponding words in the other supported languages.
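The mapping idea can be sketched as per-language keyword tables that all point to the same intent names, so the action logic is written once. The tables, intent names and function below are hypothetical and far simpler than a real NLP layer.

```python
# Hypothetical keyword tables: each language maps its own key words
# to a shared, language-independent intent name.
KEYWORDS = {
    "en": {"how are you doing": "ask_wellbeing", "hello": "greet"},
    "fr": {"comment allez-vous": "ask_wellbeing", "bonjour": "greet"},
}

# Actions are defined once, per intent, regardless of language.
ACTIONS = {
    "ask_wellbeing": "respond that I'm doing well",
    "greet": "respond with a greeting",
}


def resolve_action(text: str, language: str) -> str:
    """Map recognized text to an action using the table for its language."""
    table = KEYWORDS.get(language, {})
    normalized = text.lower().strip(" ?!.")
    for keyword, intent in table.items():
        if keyword in normalized:
            return ACTIONS[intent]
    return "error handling"
```

With these tables, French text resolved against the English table falls through to error handling, while the French table leads to the same action as the English phrase.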
Source: Technology Review
Languages
Source: Technology Review
• Translation services work differently, using statistics on many translated examples
• In a late 2016 blog post, Googlers implied that Google’s AI translation tool seems to have invented its own secret internal language: an internal representation, a machine-initiated mapping
– The tool was trained to translate between English and Korean, and between English and Japanese
– The team found that the tool has spontaneously acquired the ability to translate between Korean and Japanese
– Science fiction? Read here: https://techcrunch.com/2016/11/22/googles-ai-translation-tool-seems-to-have-invented-its-own-secret-internal-language/
Languages
Source: Technology Review
A visualization of the translation system’s memory when translating a single sentence in multiple directions.
Are We Done Yet?
• Users want AI
Source: Technology Review
Analytics=AI
Diagram: Automatic Speech Recognition, VUI Design, Dialog Management, Natural Language Understanding, Languages, Analytics
• Users want AI
• What they are really asking for is analytics
• Simple analytics gives you hindsight about what happened
What Does It Take to Build an Enterprise Virtual Assistant?
Diagram: Automatic Speech Recognition, VUI, Dialog Management, Natural Language Understanding, Languages, Descriptive Analytics, Predictive Analytics, Internet of Things, Prescriptive Analytics, Machine Learning
• Descriptive analytics allows answering more complex questions and gives you insight about what’s happening
• Predictive analytics gives you foresight about what will happen. It should also pull data from the real world
• Prescriptive analytics tells you what to do to get specific outcomes
• Machine learning makes sure the system gets better and smarter with every interaction
That’s What It Takes to Build an Enterprise Virtual Assistant
Diagram: Automatic Speech Recognition, VUI, Dialog Management, Natural Language Understanding, Languages, Descriptive Analytics, Predictive Analytics, Internet of Things, Prescriptive Analytics, Machine Learning
When will you be done?
Source: http://theegeek.com/artificial-intelligence/
Editt Gonen-Friedman
[email protected] [email protected]
https://www.linkedin.com/in/editt