Developing context-aware applications

Marsal Gavaldà Expect Labs

@MarsalGavalda

Anticipatory computing is transforming the way we find information

Today, we find information

Tomorrow, information finds us

Set reminder

- Launch calendar app
- Create new reminder
- Enter flight details

Check flight status

- Launch web browser
- Go to airline site
- Enter flight number

Check traffic

- Launch web browser
- Go to map / traffic site
- Enter current location
- Enter airport address

Anticipatory computing relies on context awareness

Source: “Entourage” by HBO

Source: Our Mobile Planet by Google

Smartphone adoption (2013)

Mobile devices capture context via many sensors

Source: Samsung

Source: funf.org

- Cameras
- Microphones
- Cellular receiver
- Wi-Fi receiver
- GPS receiver
- Gyroscope
- Thermometer
- Barometer
- …

Backend systems infer user situation, activity, intent, mood

Sources: GigaOM, Robin Labs

Responsive design

Source: Mashable

Contextual design

Recent technology advances

Speech Recognition

- Deep learning (deep / recurrent ANNs)
- Ultra large language models
- Dynamic speaker adaptation
- Massive datasets (hundreds of millions of users)

Computer Vision

- Deep learning
- Massive datasets

Language Understanding

- Deep learning
- Knowledge graphs

Source: Facebook

Source: Stanford University

Knowledge graphs

From disembodied strings to grounded entities

•  Yahoo!: 10 M entities, 30 M properties, 10 M connections
•  Microsoft: 300 M entities, 800 M connections
•  Google: 570 M entities, 18 B properties and connections
•  Wikipedia: 4 M entities
•  Freebase: 40 M topics, 2 B facts
•  Factual: 66 M local businesses and POIs in 50 countries
•  LinkedIn: 225 M people
•  Facebook: 1.15 B people

Cf.

•  Cyc: 239 K concepts, 2 M facts
•  OpenCyc: 6 K concepts, 60 K facts

Source: Yahoo

Dynamic activation of the knowledge graph

TIME

Continuous user context

hayes valley, palo alto, north beach, cow hollow

I really want to see that new movie with Ben Affleck

It is the one about the Iran Hostage Crisis

I am going to meet Raymond at Goat Hill Pizza at noon

You have to see that video of the Today Show doing the Harlem Shake

It is near the Comstock Saloon

The Black Keys were on the Colbert Report last night

I am planning to go whitewater rafting in the Grand Canyon

Rolling Context Window
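
One way to picture the rolling context window is a time-bounded queue of recent utterances. The sketch below is only an illustration of that idea, not the MindMeld implementation: the `Utterance` and `RollingContextWindow` names, the 60-second span, and the timestamps are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Utterance:
    timestamp: float  # seconds since session start (hypothetical unit)
    text: str


class RollingContextWindow:
    """Keeps only the utterances from the last `span_seconds` seconds."""

    def __init__(self, span_seconds: float) -> None:
        self.span_seconds = span_seconds
        self._items = deque()

    def add(self, utterance: Utterance) -> None:
        # Assumes utterances arrive in non-decreasing timestamp order.
        self._items.append(utterance)
        self._evict(utterance.timestamp)

    def _evict(self, now: float) -> None:
        # Drop utterances that have fallen out of the window.
        while self._items and now - self._items[0].timestamp > self.span_seconds:
            self._items.popleft()

    def texts(self) -> list:
        return [u.text for u in self._items]


window = RollingContextWindow(span_seconds=60.0)  # span chosen arbitrarily
window.add(Utterance(0.0, "I really want to see that new movie with Ben Affleck"))
window.add(Utterance(20.0, "It is the one about the Iran Hostage Crisis"))
window.add(Utterance(90.0, "I am going to meet Raymond at Goat Hill Pizza at noon"))
```

After the third utterance arrives, the first two are older than the span and are evicted, so only the most recent context would be used to activate entities in the graph.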

Dynamic entity graph (~10M entities)

things I recently wrote or said

restaurants near North Beach

places in the Bay area

topics related to things I recently read

current events

my friends, colleagues and recent contacts

links that my friends have recently shared

Human Knowledge (~50B entities)

5B people

1B places

1B products

100M interests

100M events

1B media

2008 (1M entities)

2010 (10M entities)

2014 (500M entities)

2016 (10B entities)

5B domain-specific

The knowledge graph enables anchored NLP

“I saw the man on the hill with the telescope”

Source: Deniz Yuret

Voice

“10% of Baidu search queries are done with voice today. In five years, it’ll be 50%.”

Andrew Ng

Types of voice-driven applications

Question & Answer

- "What is the capital of California?"
- "Who directed Citizen Kane?"

Command & Control

- "Call Jenny's work phone."
- "Turn up the heat to 72 degrees."

Content Discovery

- "Is there a good Japanese restaurant near Union Square?"
- "Show me all the James Bond movies with Roger Moore."

Performing Tasks

- "Make a reservation for two at Kama tomorrow at 8pm."
- "Book me on a flight to JFK on Saturday afternoon."

Dictation

- "Send a text to Jenny saying…"
- "Send the following email to Joe…"

Passive Listening

- "…have you seen that video of the Russian meteor…"
- "…I’m thinking of getting a pair of red Kobe 9 sneakers…"

Anatomy of a voice interaction

1. Speech recognition
2. Natural language understanding
3. Search ranking & filtering
4. Real-time visualization of results

“It’d be nice to find an inexpensive Italian restaurant in San Francisco that is good for kids.”

type: restaurant
category: Italian
location: San Francisco
cost: $, $$
filter: good for kids

Candidate 1: Buca di Beppo [confidence: 0.91]
Candidate 2: La Traviata [confidence: 0.82]
Candidate 3: Ragazza [confidence: 0.80]
Candidate 4: Sotto Mare [confidence: 0.76]

…
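
The search ranking & filtering step can be sketched as a filter over the extracted fields followed by a sort on confidence. This is a minimal illustration, not the platform's actual ranker: the `Restaurant` type and `rank` function are hypothetical, the category, cost, and tag values attached to each restaurant are invented for the example, and only the names and confidence scores come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Restaurant:
    name: str
    category: str
    location: str
    cost: str                          # "$", "$$", "$$$"
    tags: set = field(default_factory=set)
    confidence: float = 0.0            # relevance score in [0, 1]


def rank(candidates, category, location, allowed_costs, required_tag):
    """Filter on the extracted fields, then sort by descending confidence."""
    matches = [
        r for r in candidates
        if r.category == category
        and r.location == location
        and r.cost in allowed_costs
        and required_tag in r.tags
    ]
    return sorted(matches, key=lambda r: r.confidence, reverse=True)


# Attribute values below are illustrative assumptions, not real listing data.
kids = {"good for kids"}
candidates = [
    Restaurant("Ragazza", "Italian", "San Francisco", "$$", kids, 0.80),
    Restaurant("Buca di Beppo", "Italian", "San Francisco", "$$", kids, 0.91),
    Restaurant("Hypothetical Bistro", "French", "San Francisco", "$$$", set(), 0.95),
    Restaurant("Sotto Mare", "Italian", "San Francisco", "$", kids, 0.76),
    Restaurant("La Traviata", "Italian", "San Francisco", "$$", kids, 0.82),
]

results = rank(
    candidates,
    category="Italian",
    location="San Francisco",
    allowed_costs={"$", "$$"},
    required_tag="good for kids",
)
names = [r.name for r in results]
```

The hypothetical French bistro has the highest score but is filtered out, so the remaining candidates come back in the same order as on the slide.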

The MindMeld platform

1. passively analyze multiple concurrent data streams for each user in real time: voice, GPS, video, updates, …
2. generate a continuously changing model of user intent based on long-running context
3. proactively find, correlate and rank relevant information, and display it to the user as appropriate

The MindMeld API

CONFIDENTIAL

Step 1: We will automatically index any document collection.

Step 2: Use our API to continuously track contextual signals for your users.

curl -X POST \
  -H "X-MindMeld-Access-Token: mindmeld-access-token" \
  -H "Content-Type: application/json" \
  -d '{ "text": "I was thinking we could go to Muir Woods or Stinson Beach", "type": "speech", "weight": 0.5 }' \
  "https://mindmeld.expectlabs.com/session/:sessionid/textentries"
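
The same request could be assembled from application code. The sketch below uses only Python's standard library and mirrors the curl call field for field; the function name `build_text_entry_request` and the `SESSION_ID` value are placeholders invented here, and the request is built but deliberately not sent, since sending needs a live service and a valid token.

```python
import json
import urllib.request

BASE_URL = "https://mindmeld.expectlabs.com"  # host taken from the curl example


def build_text_entry_request(session_id, access_token, text,
                             entry_type="speech", weight=0.5):
    """Build (but do not send) the POST request for one text entry."""
    body = json.dumps(
        {"text": text, "type": entry_type, "weight": weight}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/session/{session_id}/textentries",
        data=body,
        method="POST",
        headers={
            "X-MindMeld-Access-Token": access_token,
            "Content-Type": "application/json",
        },
    )


request = build_text_entry_request(
    session_id="SESSION_ID",               # stands in for ":sessionid"
    access_token="mindmeld-access-token",  # placeholder token, as in the slide
    text="I was thinking we could go to Muir Woods or Stinson Beach",
)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```

Calling this once per recognized utterance would keep the session's context up to date, which is what Step 2 describes.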

Step 3: Display context-driven search results and recommendations.

curl "https://mindmeld.expectlabs.com/session/:sessionid/documents"

Speech transcription

Syntactic analysis

Keyphrase & entity extraction

Utterance classification & conversation modeling

Correlation & extrapolation

Doc ranking

Utterance propagation

Technology choice: Speech recognition

Google

•  Google’s server-side implementation of HTML5’s webkitSpeechRecognition for Chrome and Android
•  Fast, interim results, free
•  79 languages
•  No SLA, no iOS support

Nuance

•  Nuance NDEV Mobile speech-to-text service
•  Custom vocabularies
•  Embedded engine, cloud-based API, and combination
•  40 languages
•  SLA

AT&T

•  AT&T Speech API
•  iOS, Android SDK clients to cloud-based API
•  19 languages
•  SLA

Build own

•  Start with open-source speech recognition engine such as Sphinx or Kaldi
•  Be ready to invest $$$
•  Full control

Technology choice: Natural language understanding

Stanford NLP

•  PoS tagging, parsing, entity extraction, co-reference resolution
•  Java-based SDK
•  Free (GNU General Public License v2)

NLTK

•  PoS tagging, parsing, entity extraction
•  Python-based SDK
•  Free (Apache License v2)

AlchemyAPI

•  Sentiment analysis, entity / keyword extraction, language detection
•  Cloud-based API

TextRazor

•  Entity recognition, topic tagging, dependency parsing
•  Cloud-based API

OpenCalais

•  Extraction of named entities, facts, and events
•  Cloud-based API

Build own

•  Combine, extend functionality
•  Full control
•  Requires maintenance

Technology choice: Machine learning

Scikit-learn

•  Classification, regression, clustering
•  SVM, logistic regression, random forests, …
•  Python-based, open-source library

Weka

•  Data analysis, predictive modeling
•  Wide array of machine learning classifiers
•  Java-based, free library (GNU GPL)

Google

•  Google Prediction API
•  Pattern matching, classifiers, recommender systems
•  Freemium pricing

GraphLab

•  Topic modeling, graph analytics, clustering, collaborative filtering, computer vision
•  Parallel programming
•  C++ core, Python interface

PredictionIO

•  Predictive modeling, recommender systems
•  Open-source service

Build own

•  Combine, extend functionality
•  Optimize runtime for each model

Technology choice: Development & operations

GitHub

•  Code repos management

Chef

•  Server provisioning

Nginx

•  Web server

Nagios

•  Operations monitoring & alerting

Circle CI

•  Test & build

Pivotal Tracker

•  Project management

Amazon Web Services

•  Cloud-based servers, storage, load balancers

Nitrous.IO

•  Dev box in seconds with browser-based IDE

Challenges

Functionality

Accuracy

Scalability

•  Users
•  Applications
•  Domains

Latency

•  ASR
•  NLU
•  IR
•  Visualization

ASR

•  Word error rate
•  Accented speech
•  Noisy environment
•  Distant speaker

NLU / IR

•  Precision & recall
•  Word sense disambiguation
•  Anaphora resolution
•  Conversation modeling
•  Interruptability
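
Word error rate, listed among the ASR challenges above, is conventionally defined as the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below implements that standard definition; the function name and the example sentences are chosen here for illustration, and it makes no claim about how any particular engine was evaluated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("call jennys work phone", "call jenny work phone")
```

Here `wer` is 0.25. Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error ratio rather than a percentage bounded at 100.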

MindMeld API: Powerful yet easy to use

developer.expectlabs.com

real-time location on any device

speech recognition on any device

Android SDK

iOS SDK

JavaScript SDK

sample code

turnkey HTML5 widgets

push events

open graph support

entity extraction

customizable ranking

keyphrase detection

topic detection

natural language processing

on-demand web crawling

proactive suggestions

instant answers

extensive online documentation

real-time analytics console

complete API explorer tool

crawl manager dashboard

ranking dashboard

MindMeld API: Adaptive ranking factors

MindMeld API: Powering a wide range of applications

1. Voice-Driven Intelligent Assistant

- online commerce
- media & entertainment
- mobile apps & devices
- wearables

2. Location-Based Proactive Assistant

- location-based services
- local & travel apps
- smart cars
- mobile workforce

3. Voice and Video Conference Assistant

- customer support & help desk
- call center solutions
- voice & video calling
- telepresence & collaboration

Instant voice-driven search and discovery on your own content

developer.expectlabs.com

Thank you
