DEVIEW 2014 [2C3] Developing context-aware applications
Anticipatory computing is transforming the way we find information
Today, we find information
Tomorrow, information finds us
Set reminder
- Launch calendar app
- Create new reminder
- Enter flight details
Check flight status
- Launch web browser
- Go to airline site
- Enter flight number
Check traffic
- Launch web browser
- Go to map / traffic site
- Enter current location
- Enter airport address
Anticipatory computing relies on context awareness
Source: “Entourage” by HBO
Source: Our Mobile Planet by Google
Smartphone adoption (2013)
Mobile devices capture context via many sensors
Source: Samsung
Source: funf.org
- Cameras
- Microphones
- Cellular receiver
- Wi-Fi receiver
- GPS receiver
- Gyroscope
- Thermometer
- Barometer
- …
Backend systems infer user situation, activity, intent, mood
Sources: GigaOM, Robin Labs
Responsive design
Source: Mashable
Contextual design
Recent technology advances
Speech Recognition
- Deep learning (deep / recurrent ANNs)
- Ultra large language models
- Dynamic speaker adaptation
- Massive datasets (10^8s of users)
Computer Vision
- Deep learning
- Massive datasets
Language Understanding
- Deep learning
- Knowledge graphs
Source: Facebook
Source: Stanford University
Knowledge graphs
From disembodied strings to grounded entities
• Yahoo!: 10 M entities, 30 M properties, 10 M connections
• Microsoft: 300 M entities, 800 M connections
• Google: 570 M entities, 18 B properties and connections
• Wikipedia: 4 M entities
• Freebase: 40 M topics, 2 B facts
• Factual: 66 M local businesses and POIs in 50 countries
• LinkedIn: 225 M people
• Facebook: 1.15 B people
Cf.
• Cyc: 239 K concepts, 2 M facts
• OpenCyc: 6 K concepts, 60 K facts
Source: Yahoo
Dynamic activation of the knowledge graph
Continuous user context (over time)
Hayes Valley, Palo Alto, North Beach, Cow Hollow
“I really want to see that new movie with Ben Affleck.”
“It is the one about the Iran Hostage Crisis.”
“I am going to meet Raymond at Goat Hill Pizza at noon.”
“You have to see that video of the Today Show doing the Harlem Shake.”
“It is near the Comstock Saloon.”
“The Black Keys were on the Colbert Report last night.”
“I am planning to go whitewater rafting in the Grand Canyon.”
Rolling Context Window
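The rolling context window can be pictured as a time-bounded buffer of recent utterances. The following is a minimal sketch under stated assumptions, not the platform's actual implementation: the class name `RollingContextWindow`, the `Entry` record, and the 300-second span are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    timestamp: float  # seconds since the session started
    text: str


class RollingContextWindow:
    """Keeps only the utterances observed within the last `span` seconds."""

    def __init__(self, span: float = 300.0):  # 300 s is an assumed default
        self.span = span
        self.entries = deque()

    def add(self, timestamp: float, text: str) -> None:
        # Append the new utterance, then expire anything that is too old.
        self.entries.append(Entry(timestamp, text))
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Entries arrive in time order, so the oldest is always on the left.
        while self.entries and now - self.entries[0].timestamp > self.span:
            self.entries.popleft()

    def texts(self) -> list:
        return [entry.text for entry in self.entries]


window = RollingContextWindow(span=300.0)
window.add(0.0, "I really want to see that new movie with Ben Affleck")
window.add(120.0, "I am going to meet Raymond at Goat Hill Pizza at noon")
window.add(400.0, "It is near the Comstock Saloon")
print(window.texts())
```

By the time the third utterance arrives, the first one is more than 300 seconds old and has slid out of the window, so only the two most recent utterances remain available for activating entities.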
Dynamic entity graph (~10M entities)
- things I recently wrote or said
- restaurants near North Beach
- places in the Bay area
- topics related to things I recently read
- current events
- my friends, colleagues and recent contacts
- links that my friends have recently shared
Human Knowledge (~50B entities)
- 5B people
- 1B places
- 1B products
- 100M interests
- 100M events
- 1B media
- 5B domain-specific
2008 (1M entities)
2010 (10M entities)
2014 (500M entities)
2016 (10B entities)
The knowledge graph enables anchored NLP
“I saw the man on the hill with the telescope”
Source: Deniz Yuret
Voice
“10% of Baidu search queries are done with voice today. In five years, it’ll be 50%.”
Andrew Ng
Types of voice-driven applications
Question & Answer
- "What is the capital of California?"
- "Who directed Citizen Kane?"
Command & Control
- "Call Jenny's work phone."
- "Turn up the heat to 72 degrees."
Content Discovery
- "Is there a good Japanese restaurant near Union Square?"
- "Show me all the James Bond movies with Roger Moore."
Performing Tasks
- "Make a reservation for two at Kama tomorrow at 8pm."
- "Book me on a flight to JFK on Saturday afternoon."
Dictation
- "Send a text to Jenny saying…"
- "Send the following email to Joe…"
Passive Listening
- "…have you seen that video of the Russian meteor…"
- "…I’m thinking of getting a pair of red Kobe 9 sneakers…"
Anatomy of a voice interaction
1. Speech recognition
2. Natural language understanding
3. Search ranking & filtering
4. Real-time visualization of results
“It’d be nice to find an inexpensive Italian restaurant in San Francisco that is good for kids.”
type: restaurant
category: Italian
location: San Francisco
cost: $, $$
filter: good for kids
Candidate 1: Buca di Beppo [confidence: 0.91]
Candidate 2: La Traviata [confidence: 0.82]
Candidate 3: Ragazza [confidence: 0.80]
Candidate 4: Sotto Mare [confidence: 0.76]
…
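The hand-off between steps 2 and 3 can be sketched as a structured query followed by a confidence-ordered candidate list. This is an illustrative sketch only: the `StructuredQuery` and `Candidate` types, the `rank` helper, and the 0.80 cut-off are hypothetical. The field values, candidate names, and confidence scores are copied from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredQuery:
    """Assumed shape of the NLU output (step 2) for the slide's utterance."""
    type: str
    category: str
    location: str
    cost: list
    filters: list = field(default_factory=list)


@dataclass
class Candidate:
    name: str
    confidence: float


def rank(candidates, min_confidence=0.0):
    """Step 3: keep candidates at or above the threshold, best first."""
    kept = [c for c in candidates if c.confidence >= min_confidence]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


query = StructuredQuery(
    type="restaurant",
    category="Italian",
    location="San Francisco",
    cost=["$", "$$"],
    filters=["good for kids"],
)

# Names and scores come from the slide; a real system would obtain them by
# running `query` against a document index rather than hard-coding them.
candidates = [
    Candidate("Ragazza", 0.80),
    Candidate("Buca di Beppo", 0.91),
    Candidate("Sotto Mare", 0.76),
    Candidate("La Traviata", 0.82),
]

ranked = rank(candidates, min_confidence=0.80)
for c in ranked:
    print(f"{c.name} [confidence: {c.confidence:.2f}]")
```

With the assumed 0.80 threshold, Sotto Mare is filtered out and the remaining three candidates are listed from highest to lowest confidence, ready for step 4 to display.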
The MindMeld platform
1. Passively analyze multiple concurrent data streams for each user in real time (voice, GPS, video, updates, …)
2. Generate a continuously changing model of user intent based on long-running context
3. Proactively find, correlate and rank relevant information; display to the user as appropriate
The MindMeld API
Step 1
We will automatically index any document collection.
Step 2
Use our API to continuously track contextual signals for your users.
curl -X POST \
  -H "X-MindMeld-Access-Token: mindmeld-access-token" \
  -H "Content-Type: application/json" \
  -d '{ "text": "I was thinking we could go to Muir Woods or Stinson Beach", "type": "speech", "weight": 0.5 }' \
  "https://mindmeld.expectlabs.com/session/:sessionid/textentries"
Step 3
Display context-driven search results and recommendations.
curl "https://mindmeld.expectlabs.com/session/:sessionid/documents"
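The same two calls can be expressed from application code. The sketch below only builds the HTTP requests with Python's standard library and does not send them. The host, paths, header name, and JSON fields are taken from the curl examples; the function names, the placeholder session id and token, and passing the access token on the documents request are assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "https://mindmeld.expectlabs.com"  # host from the curl examples


def text_entry_request(session_id, token, text, entry_type="speech", weight=0.5):
    """Build (but do not send) the POST that adds a text entry to a session."""
    body = json.dumps({"text": text, "type": entry_type, "weight": weight})
    return urllib.request.Request(
        url=f"{BASE_URL}/session/{session_id}/textentries",
        data=body.encode("utf-8"),
        headers={
            "X-MindMeld-Access-Token": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def documents_request(session_id, token):
    """Build the GET that fetches context-driven documents for a session.

    Sending the token here is an assumption; the slide's curl omits it.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/session/{session_id}/documents",
        headers={"X-MindMeld-Access-Token": token},
        method="GET",
    )


# "SESSION_ID" and "ACCESS_TOKEN" are placeholders, not real credentials.
post = text_entry_request(
    "SESSION_ID",
    "ACCESS_TOKEN",
    "I was thinking we could go to Muir Woods or Stinson Beach",
)
get = documents_request("SESSION_ID", "ACCESS_TOKEN")
print(post.get_method(), post.full_url)
print(get.get_method(), get.full_url)
```

Passing either object to `urllib.request.urlopen` would issue the call. Keeping request construction separate from sending makes the payload easy to inspect and test offline.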
Speech transcription
Syntactic analysis
Keyphrase & entity extraction
Utterance classification & conversation modeling
Correlation & extrapolation
Doc ranking
Utterance propagation
Technology choice: Speech recognition
Google
• Google’s server-side implementation of HTML5’s webkitSpeechRecognition for Chrome and Android
• Fast, interim results, free
• 79 languages
• No SLA, no iOS support
Nuance
• Nuance NDEV Mobile speech-to-text service
• Custom vocabularies
• Embedded engine, cloud-based API, and combination
• 40 languages
• SLA
AT&T
• AT&T Speech API
• iOS, Android SDK clients to cloud-based API
• 19 languages
• SLA
Build own
• Start with open-source speech recognition engine such as Sphinx or Kaldi
• Be ready to invest $$$
• Full control
Technology choice: Natural language understanding
Stanford NLP
• PoS tagging, parsing, entity extraction, co-reference resolution
• Java-based SDK
• Free (GNU General Public License v2)
NLTK
• PoS tagging, parsing, entity extraction
• Python-based SDK
• Free (Apache License v2)
AlchemyAPI
• Sentiment analysis, entity / keyword extraction, language detection
• Cloud-based API
TextRazor
• Entity recognition, topic tagging, dependency parsing
• Cloud-based API
OpenCalais
• Extraction of named entities, facts, and events
• Cloud-based API
Build own
• Combine, extend functionality
• Full control
• Requires maintenance
Technology choice: Machine learning
Scikit-learn
• Classification, regression, clustering
• SVM, logistic regression, random forests, …
• Python-based, open-source library
Weka
• Data analysis, predictive modeling
• Wide array of machine learning classifiers
• Java-based, free library (GNU GPL)
Google
• Google Prediction API
• Pattern matching, classifiers, recommender systems
• Freemium pricing
GraphLab
• Topic modeling, graph analytics, clustering, collaborative filtering, computer vision
• Parallel programming
• C++ core, Python interface
PredictionIO
• Predictive modeling, recommender systems
• Open-source service
Build own
• Combine, extend functionality
• Optimize runtime for each model
Technology choice: Development & operations
GitHub
• Code repos management
Chef
• Server provisioning
Nginx
• Web server
Nagios
• Operations monitoring & alerting
Circle CI
• Test & build
Pivotal Tracker
• Project management
Amazon Web Services
• Cloud-based servers, storage, load balancers
Nitrous.IO
• Dev box in seconds with browser-based IDE
Challenges
Functionality
Accuracy
Scalability
• Users
• Applications
• Domains
Latency
• ASR
• NLU
• IR
• Visualization
ASR
• Word error rate
• Accented speech
• Noisy environment
• Distant speaker
NLU / IR
• Precision & recall
• Word sense disambiguation
• Anaphora resolution
• Conversation modeling
• Interruptability
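Two of the metrics named above, word error rate for ASR and precision and recall for NLU / IR, have standard definitions that are easy to compute. The sketch below uses those textbook definitions; the function names, the sample transcripts, and the document ids are made up for illustration and are not measurements from any real system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level edit distance via dynamic programming, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical recognizer output: one substitution plus one insertion.
wer = word_error_rate("call jenny's work phone", "call jenny work phone please")

# Hypothetical relevance judgments for four returned documents.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])

print(f"WER={wer:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Note that word error rate can exceed 1.0 when the hypothesis contains many insertions, which is why it is usually reported alongside the reference length.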
MindMeld API: Powerful yet easy to use
developer.expectlabs.com
real-time location on any device
speech recognition on any device
Android SDK
iOS SDK
JavaScript SDK
sample code
turnkey HTML5 widgets
push events
open graph support
entity extraction
customizable ranking
keyphrase detection
topic detection
natural language processing
on-demand web crawling
proactive suggestions
instant answers
extensive online documentation
real-time analytics console
complete API explorer tool
crawl manager dashboard
ranking dashboard
MindMeld API: Adaptive ranking factors
MindMeld API: Powering a wide range of applications
1. Voice-Driven Intelligent Assistant
- online commerce
- media & entertainment
- mobile apps & devices
- wearables
2. Location-Based Proactive Assistant
- location-based services
- local & travel apps
- smart cars
- mobile workforce
3. Voice and Video Conference Assistant
- customer support & help desk
- call center solutions
- voice & video calling
- telepresence & collaboration
Instant voice-driven search and discovery on your own content
developer.expectlabs.com
Thank you