
Post on 11-Apr-2017


Page 1: Language Empowering Intelligent Assistants (CHT)

Language Empowering Intelligent Assistants (智慧型對話助理)

Yun-Nung (Vivian) Chen (陳縕儂)

HTTP://VIVIANCHEN.IDV.TW

Jan. 5th, 2017 @ Chunghwa Telecom (中華電信)

Page 2: Language Empowering Intelligent Assistants (CHT)

Outline

• Introduction
– Intelligent Assistants (智慧助理)
– Mobile Service (行動客服)
– Dialogue System/Bot (對話系統/機器人)

• Framework
– Language Understanding (語意理解)
– Dialogue Management (對話管理)
– Output Generation (輸出生成)

• Recent Trend
– Industrial Trend and Challenge
– Deep Learning Basics
– Deep Learning for Dialogues

• Conclusion


Page 4: Language Empowering Intelligent Assistants (CHT)

Intelligent Assistants (智慧助理)

• Apple Siri (2011)
• Google Now (2012)
• Microsoft Cortana (2014)
• Amazon Alexa/Echo (2014)
• Facebook M & Bot (2015)
• Google Home (2016)

Page 5: Language Empowering Intelligent Assistants (CHT)

Why do we need them?

– Get things done
• E.g. set up alarm/reminder, take note

– Easy access to structured data, services and apps
• E.g. find docs/photos/notes

– Assist your routine schedule
• E.g. check the account balance

– Be more productive in managing your work and personal life

Page 6: Language Empowering Intelligent Assistants (CHT)

Mobile Service (行動客服)

• Allows customers to conduct a range of financial transactions remotely using a mobile device, usually through an app (e.g. Richart)

• Reduces the need to visit a branch → cost reduction

Page 7: Language Empowering Intelligent Assistants (CHT)

Mobile Service (行動客服)

• Users can finish specific tasks that are predefined by the app

• Limitations
– App usage design may not be user-friendly
– Good designs may differ across people
– Learning how to use an app takes time

Page 8: Language Empowering Intelligent Assistants (CHT)

Why Natural Language?

• Global Digital Statistics (January 2015)
– Global Population: 7.21B
– Active Internet Users: 3.01B
– Active Social Media Accounts: 2.08B
– Active Unique Mobile Users: 3.65B

Device input is evolving towards speech, which is more natural and convenient.

Page 9: Language Empowering Intelligent Assistants (CHT)

Intelligent Assistant Architecture

• Reactive Assistance (反應式協助)
• Proactive Assistance (主動式協助)
• Data: Databases and Client Signals
• Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
• User Experience: “restaurant suggestions”, “call taxi”

Page 10: Language Empowering Intelligent Assistants (CHT)

Dialogue System (對話系統)

• Dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions.

• Dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).

Good dialogue systems help users access information conveniently and finish tasks efficiently.

JARVIS – Iron Man’s Personal Assistant
Baymax – Personal Healthcare Companion

Page 11: Language Empowering Intelligent Assistants (CHT)

App Bot

• A bot is responsible for a “single” domain, similar to an app

Seamless and automatic information transfer across domains reduces duplicate information and interaction


Page 13: Language Empowering Intelligent Assistants (CHT)

System Framework

• Speech Signal → Speech Recognition → Recognized Text: “I want to apply for an international data plan for the week after next” (我要申辦下下周期間的國外上網方案)
– The same sentence can also arrive directly as Text Input

• Language Understanding (LU): Domain Identification, User Intent Detection, Slot Filling
– Semantic Frame: apply_international_dataplan, period=the week after next (下下周)

• Dialogue Management (DM): Dialogue State Tracking, System Action/Policy Decision
– System Action/Policy: request_country

• Output Generation
– Text response: “Which country are you going to?” (你要去哪一國)
– Screen Display: country?

Page 14: Language Empowering Intelligent Assistants (CHT)

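Interaction Example

User: “I want to pay last month's mobile phone bill” (我要繳交上個月的手機費帳單)

Intelligent Agent: “Your phone bill for last month is NT$800. Would you like to pay with your default account?” (你上個月的電話費帳單金額為 800元,請問你要用你預設的帳號繳款嗎?)

Q: How does a dialogue system process this request?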


Page 16: Language Empowering Intelligent Assistants (CHT)

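1. Domain Identification
Requires Predefined Domain Ontology

• User: “I want to pay last month's mobile phone bill” (我要繳交上個月的手機費帳單)

• Intelligent Agent: Machine Learning for Classification

• Organized Domain Knowledge (Database)
– Landline DB (市話 DB)
– Personal Information DB (個人資料 DB)
– Mobile DB (手機 DB)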

Page 17: Language Empowering Intelligent Assistants (CHT)

2. Intent Detection
Requires Predefined Schema

• User: “I want to pay last month's mobile phone bill” (我要繳交上個月的手機費帳單)

• Intelligent Agent: Machine Learning for Classification

• Intents in the Mobile DB (手機 DB)
– FEE_PAYMENT
– CHECK_REMAINING_DATA
– ...
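Intent detection as classification can be illustrated with a very small bag-of-words Naive Bayes model. This is only a sketch under stated assumptions: the training utterances are invented for illustration (and written in English), the two intent labels are the ones named on the slide, and the slides do not say which classifier is actually used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set; a real system needs many annotated utterances.
TRAIN = [
    ("i want to pay my phone bill", "FEE_PAYMENT"),
    ("pay last month's mobile bill", "FEE_PAYMENT"),
    ("how much data do i have left", "CHECK_REMAINING_DATA"),
    ("check my remaining data", "CHECK_REMAINING_DATA"),
]

def train(examples):
    """Count words per intent for a multinomial Naive Bayes classifier."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for text, intent in examples:
        words = text.split()
        word_counts[intent].update(words)
        intent_counts[intent] += 1
        vocab.update(words)
    return word_counts, intent_counts, vocab

def predict(text, model):
    """Return the intent with the highest log-probability (add-one smoothing)."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent, n in intent_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[intent].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[intent][w] + 1) / denom)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

model = train(TRAIN)
print(predict("i want to pay last month's bill", model))
```

The same shape of model applies to domain identification: only the label set changes (domains instead of intents).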

Page 18: Language Empowering Intelligent Assistants (CHT)

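3. Slot Filling
Requires Predefined Schema

• User: “I want to pay last month's mobile phone bill” (我要繳交上個月的手機費帳單)

• Intelligent Agent: Machine Learning for Information Extraction

• Mobile DB (手機 DB)

Number    Period           Amount
0933xxx   December (12月)  800
0928xxx   November (11月)  560
:         :                :

• Semantic Frame
– Extracted from the utterance: FEE_PAYMENT, period=“last month” (上個月)
– After the database lookup: FEE_PAYMENT, period=“December” (12月), amount=“800”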
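The step from period=“last month” to period=“December”, amount=“800” can be sketched as follows. Everything here is an assumption made for illustration: the `BILLS` dictionary mirrors the table on the slide, the reference date is the date of the talk, and simple string matching stands in for the learned information-extraction model that the slide calls for.

```python
from datetime import date

# Hypothetical billing records mirroring the table on the slide:
# (phone number, billing month) -> amount.
BILLS = {
    ("0933xxx", 12): 800,
    ("0928xxx", 11): 560,
}

def previous_month(today):
    """Month number of the month before `today` (January wraps to December)."""
    return 12 if today.month == 1 else today.month - 1

def fill_slots(utterance, today):
    """Very rough slot filling: string matching instead of a trained tagger."""
    frame = {"intent": "FEE_PAYMENT"}
    if "上個月" in utterance or "last month" in utterance:
        frame["period"] = previous_month(today)
    return frame

def ground_frame(frame, number):
    """Add the amount from the (hypothetical) database when a record exists."""
    grounded = dict(frame)
    amount = BILLS.get((number, frame.get("period")))
    if amount is not None:
        grounded["amount"] = amount
    return grounded

frame = fill_slots("我要繳交上個月的手機費帳單", date(2017, 1, 5))
print(ground_frame(frame, "0933xxx"))
```

Keeping extraction (`fill_slots`) separate from database grounding (`ground_frame`) matches the two semantic frames shown on the slide: one straight from the utterance, one after the lookup.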


Page 20: Language Empowering Intelligent Assistants (CHT)

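State Tracking
Requires Hand-Crafted States

• Hand-crafted states (which slots are filled)
– NULL
– amount; period; number
– amount, period; period, number; amount, number
– all

• User: “I want to pay last month's mobile phone bill” (我要繳交上個月的手機費帳單)

• User: “The 0933 number” (要 0933那個號碼)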

Page 21: Language Empowering Intelligent Assistants (CHT)

State Tracking
Requires Hand-Crafted States

• User: “I want to pay last month's mobile phone bill” (我要繳交上個月的手機費帳單)

• User: “The 0933 number” (要 0933那個號碼)

• States visited: NULL → period → period, number

Page 22: Language Empowering Intelligent Assistants (CHT)

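State Tracking
Requires Hand-Crafted States

• Hand-crafted states (which slots are filled)
– NULL
– amount; period; number
– amount, period; period, number; amount, number
– all

• User: “I want to pay the mobile phone bill for month x” (我要繳交 x個月的手機費帳單), where the word x is uncertain

• Competing hypotheses
– FEE_PAYMENT, period=“this month” (這個月)
– FEE_PAYMENT, period=“last month” (上個月)
– FEE_PAYMENT, ?

• Next state: ?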
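A hand-crafted state tracker of this kind can be sketched by treating the state as the set of slots filled so far. The function names and the example frames below are hypothetical; the state names follow the slides (NULL, single slots, pairs, all). The sketch ignores uncertainty, so it does not address the competing-hypotheses problem raised above.

```python
SLOTS = ("amount", "period", "number")

def state_name(filled):
    """Name of the hand-crafted state for a dict of filled slots."""
    if not filled:
        return "NULL"
    if set(filled) == set(SLOTS):
        return "all"
    return ", ".join(s for s in SLOTS if s in filled)

def update(state, frame):
    """Merge the slots of a new semantic frame into the dialogue state."""
    new_state = dict(state)
    for slot in SLOTS:
        if frame.get(slot) is not None:
            new_state[slot] = frame[slot]
    return new_state

# Two user turns: the first gives the period, the second gives the number.
state = {}
trace = [state_name(state)]
for frame in ({"period": "last month"}, {"number": "0933xxx"}):
    state = update(state, frame)
    trace.append(state_name(state))

print(" -> ".join(trace))  # NULL -> period -> period, number
```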

Page 23: Language Empowering Intelligent Assistants (CHT)

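Policy for Agent Action

• Inform
– “Your bill amount is NT$800” (你的帳單金額為 800元)

• Request
– “Which number would you like to pay the bill for?” (請問是要繳交哪一支號碼的呢?)

• Confirm
– “Do you want to pay the December bill?” (你要繳交 12月的帳單嗎?)

• Database Search

• Task Completion / Information Display
– Payment / Data Checking

0933xxx, 0928xxx, ...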
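A rule-based policy over these action types might look like the sketch below. The action names come from the slide, but the order of the rules and the choice of required slots are assumptions made only for illustration.

```python
def choose_action(state, confirmed=False):
    """Pick the next system action from the tracked dialogue state.

    `state` maps filled slot names to values. The rule order is hypothetical.
    """
    # Ask for any slot the (assumed) FEE_PAYMENT task still needs.
    for slot in ("number", "period"):
        if slot not in state:
            return ("request", slot)
    # All user-provided slots are known: look up the bill.
    if "amount" not in state:
        return ("database_search", None)
    # Double-check the period before telling the user the amount.
    if not confirmed:
        return ("confirm", "period")
    return ("inform", "amount")

print(choose_action({"period": 12}))                        # ('request', 'number')
print(choose_action({"period": 12, "number": "0933xxx"}))   # ('database_search', None)
```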


Page 25: Language Empowering Intelligent Assistants (CHT)

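Output / NL Generation

• Inform
– “Your bill is 800” (你的帳單為 800) vs. displaying $800

• Request
– “Which number's bill do you want to pay?” (你要繳交哪一支號碼的帳單?) vs. a screen display

• Confirm
– “Do you want to pay the December bill?” (你要繳交 12月的帳單嗎?)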
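Template-based generation is one simple way to turn a system action into a text response. The templates below are hypothetical English renderings of the example sentences above; the slides do not specify how generation is implemented.

```python
# Hypothetical templates keyed by (dialogue act, slot).
TEMPLATES = {
    ("inform", "amount"): "Your bill amount is NT${amount}.",
    ("request", "number"): "Which number would you like to pay the bill for?",
    ("confirm", "period"): "Do you want to pay the bill for month {period}?",
}

def generate(act, slot, **values):
    """Fill the template for a system action with the given slot values."""
    return TEMPLATES[(act, slot)].format(**values)

print(generate("inform", "amount", amount=800))  # Your bill amount is NT$800.
print(generate("request", "number"))
```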


Page 27: Language Empowering Intelligent Assistants (CHT)


AI Startups

Page 28: Language Empowering Intelligent Assistants (CHT)


ChatBot Startups

Page 30: Language Empowering Intelligent Assistants (CHT)

Challenge

• Predefined semantic schema
– Chen et al., “Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding,” in ACL-IJCNLP, 2015.

• Data without annotations
– Chen et al., “Zero-Shot Learning of Intent Embeddings for Expansion by Convolutional Deep Structured Semantic Models,” in ICASSP, 2016.

• Semantic concept interpretation (e.g. FIND_RESTAURANT with rating=“good”: rating=5? 4?)
– Chen et al., “Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding,” in SLT, 2014.

• Predefined dialogue states
– Chen et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.

• Error propagation
– Hakkani-Tur et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.

• Cross-domain intention/bot hierarchy (figure labels: Hotel, Rest, Flight, Travel, Trip Planning)
– Sun et al., “An Intelligent Assistant for High-Level Task Understanding,” in IUI, 2016.
– Sun et al., “AppDialogue: Multi-App Dialogues for Intelligent Assistants,” in LREC, 2016.
– Chen et al., “Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding,” in ICMI, 2016.

• Cross-domain information transferring
– Kim et al., “New Transfer Learning Techniques For Disparate Label Sets,” in ACL-IJCNLP, 2015.

Page 31: Language Empowering Intelligent Assistants (CHT)

Deep Learning Basics

Page 32: Language Empowering Intelligent Assistants (CHT)

Learning ≈ Looking for a Function

• Speech Recognition: f(speech signal) = “你好” (“hello”)

• Handwritten Recognition: f(handwritten image) = “2”

• Weather forecast: f(Thursday) = “Saturday”

• Play video games: f(game screen) = “move left”

Page 33: Language Empowering Intelligent Assistants (CHT)

Machine Learning Framework

• Model: hypothesis function set $\{f_1, f_2, \ldots\}$

• Training data: $\{(x_1, \hat{y}_1), (x_2, \hat{y}_2), \ldots\}$

• Training: pick the “best” function $f^*$

• Testing data: $\{(x, ?), \ldots\}$

• Testing: $y = f^*(x)$

• $x$: function input, e.g. “It claims too much.”

• $y$: function output, e.g. - (negative)

Training is to pick the best function given the observed data.
Testing is to predict the label using the learned function.

Page 34: Language Empowering Intelligent Assistants (CHT)

Target Function

• Classification Task: $f(x) = y$, with $f: \mathbb{R}^N \to \mathbb{R}^M$

– x: input object to be classified → an N-dim vector

– y: class/label → an M-dim vector

Assume both x and y can be represented as fixed-size vectors

Page 35: Language Empowering Intelligent Assistants (CHT)

Vector Representation Ex

• Handwriting Digit Classification: $f: \mathbb{R}^N \to \mathbb{R}^M$

• x: image
– Each pixel corresponds to an element in the vector (1: for ink, 0: otherwise)
– 16 x 16 = 256 dimensions

• y: class/label
– 10 dimensions for digit recognition: “1” or not, “2” or not, “3” or not, ...
– “1” → (1, 0, 0, ...); “2” → (0, 1, 0, ...)
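The two encodings can be written out directly. In this sketch the class order (“1”, “2”, ..., “9”, “0”) and the single-ink-pixel image are assumptions chosen only to make the example concrete.

```python
# Assumed class order; the slide only shows that "1", "2", "3" come first.
DIGIT_CLASSES = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]

def one_hot(index, size):
    """Vector of `size` zeros with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def image_to_vector(image):
    """Flatten a 2-D grid of pixels into a 0/1 vector (1: ink, 0: otherwise)."""
    return [1 if pixel else 0 for row in image for pixel in row]

# A 16 x 16 image with one ink pixel at row 3, column 5 (illustrative only).
image = [[0] * 16 for _ in range(16)]
image[3][5] = 1

x = image_to_vector(image)                    # 256-dimensional input
y = one_hot(DIGIT_CLASSES.index("2"), 10)     # 10-dimensional label

print(len(x), y)  # 256 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```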

Page 36: Language Empowering Intelligent Assistants (CHT)

Vector Representation Ex

• Sentiment Analysis: $f: \mathbb{R}^N \to \mathbb{R}^M$

• x: word (e.g. “love”)
– Each element in the vector corresponds to a word in the vocabulary (1: indicates the word, 0: otherwise)
– dimensions = size of vocab

• y: class/label
– 3 dimensions (positive, negative, neutral): “+” or not, “-” or not, “?” or not
– “+” → (1, 0, 0); “-” → (0, 1, 0)

Page 37: Language Empowering Intelligent Assistants (CHT)

A Single Neuron

$z = w_1 x_1 + w_2 x_2 + \cdots + w_N x_N + b$, where $b$ is the bias

$y = \sigma(z)$

Activation function (sigmoid function): $\sigma(z) = \frac{1}{1 + e^{-z}}$

Each neuron is a very simple function
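A single neuron with a sigmoid activation takes only a few lines of code. The weights, bias and inputs below are arbitrary values chosen for illustration.

```python
import math

def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Arbitrary example: z = 2.0 * 1.0 + (-1.0) * 0.0 - 2.0 = 0, so y = 0.5.
y = neuron([1.0, 0.0], [2.0, -1.0], -2.0)
print(y)  # 0.5
```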

Page 38: Language Empowering Intelligent Assistants (CHT)

A Single Neuron

$y = \sigma(w_1 x_1 + w_2 x_2 + \cdots + w_N x_N + b)$, with $\sigma(z) = \frac{1}{1 + e^{-z}}$

w, b are the parameters of this neuron

Page 39: Language Empowering Intelligent Assistants (CHT)

A Single Neuron

$y = \sigma(w_1 x_1 + w_2 x_2 + \cdots + w_N x_N + b)$

• $y \geq 0.5$: is “2”
• $y < 0.5$: not “2”

A single neuron can only handle binary classification

Page 40: Language Empowering Intelligent Assistants (CHT)

A Layer of Neurons

• Handwriting digit classification: $f: \mathbb{R}^N \to \mathbb{R}^M$

• Inputs: $x_1, x_2, \ldots, x_N$

• Outputs (10 neurons/10 classes)
– $y_1$: “1” or not
– $y_2$: “2” or not
– $y_3$: “3” or not
– ...

• Which one is the max?

A layer of neurons can handle multiple possible outputs, and the result depends on the max one
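Picking the class whose neuron has the largest output can be sketched as follows. The three-class layer and its weights are made-up values; the slide's digit example would use 10 neurons.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, weights, biases):
    """One output per neuron: each row of `weights` belongs to one neuron."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]

def predict(x, weights, biases, labels):
    """Return the label of the neuron with the maximum output."""
    outputs = layer(x, weights, biases)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best]

# Made-up parameters: 3 neurons over a 2-dimensional input.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
print(predict([0.2, 0.9], W, b, ["1", "2", "3"]))  # 2
```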

Page 41: Language Empowering Intelligent Assistants (CHT)

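Neural Networks – Multi-Layer Perceptron (MLP)

[Figure: inputs $x_1, x_2$ feed the hidden units, with pre-activations $z_1, z_2$ and activations $a_1, a_2$, which in turn feed the output $y$]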

Page 42: Language Empowering Intelligent Assistants (CHT)

Expression of MLP

• Continuous function w/ 2 layers
– Combine two opposite-facing threshold functions to make a ridge

• Continuous function w/ 3 layers
– Combine two perpendicular ridges to make a bump
– Add bumps of various sizes and locations to fit any surface

http://aima.eecs.berkeley.edu/slides-pdf/chapter20b.pdf

Multiple layers enrich the model expression, so that the model can approximate more complex functions

Page 43: Language Empowering Intelligent Assistants (CHT)

Deep Neural Networks (DNN)

• Fully connected feedforward network: $f: \mathbb{R}^N \to \mathbb{R}^M$

• Input: vector x $= (x_1, x_2, \ldots, x_N)$

• Layer 1 → Layer 2 → ... → Layer L

• Output: vector y $= (y_1, y_2, \ldots, y_M)$

Deep NN: multiple hidden layers
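The forward pass of a fully connected feedforward network simply chains layers. The layer sizes (3 → 4 → 2) and the constant weights below are arbitrary, picked so the result is easy to check by hand; a trained network would have learned parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, W, b):
    """One fully connected layer with sigmoid activation."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(W, b)
    ]

def forward(x, layers):
    """Feed x through every (W, b) pair in order and return the final output."""
    a = x
    for W, b in layers:
        a = dense(a, W, b)
    return a

# Arbitrary parameters: N = 3 inputs, one hidden layer of 4 units, M = 2 outputs.
layers = [
    ([[0.1, 0.1, 0.1]] * 4, [0.0] * 4),
    ([[0.5] * 4, [-0.5] * 4], [0.0, 0.0]),
]
y = forward([1.0, 0.0, 1.0], layers)
print(y)
```

Because the two output neurons use opposite weights here, their outputs sum to 1; that is a property of this particular toy setup, not of DNNs in general.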

Page 44: Language Empowering Intelligent Assistants (CHT)

Deep Learning for Dialogues

Page 45: Language Empowering Intelligent Assistants (CHT)

RNN for SLU

• IOB Sequence Labeling for Slot Filling
– (a) LSTM
– (b) LSTM-LA
– (c) bLSTM-LA

• Intent Classification
– (d) Intent LSTM

[Figure: in each model, words $w_0, w_1, w_2, \ldots, w_n$ feed hidden states $h_0, h_1, h_2, \ldots, h_n$ (forward states $h^f$ and backward states $h^b$ in the bidirectional model), which emit slot tags $y_0, y_1, y_2, \ldots, y_n$; the Intent LSTM emits a single intent]

Page 46: Language Empowering Intelligent Assistants (CHT)

RNN for SLU

• Joint Multi-Domain Intent Prediction and Slot Filling
– Information can be mutually enhanced

[Figure: an RNN with input weights U, recurrent weights W and output weights V reads “taiwanese food please EOS” through hidden states $h_{t-1}, h_t, h_{t+1}, h_{T+1}$ and emits the semantic frame sequence “B-type O O FIND_REST”: slot tagging on the words, intent prediction at EOS]

Hakkani-Tur, et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.

Page 47: Language Empowering Intelligent Assistants (CHT)

Contextual SLU (Chen et al., 2016)

• Domain Identification → Intent Prediction → Slot Filling

• Single Turn
– U: just sent email to bob about fishing this weekend
– S: O O O O B-contact_name O B-subject I-subject I-subject
– D: communication
– I: send_email
– send_email(contact_name=“bob”, subject=“fishing this weekend”)

• Multi-Turn
– U1: send email to bob
– S1: B-contact_name (on “bob”) → send_email(contact_name=“bob”)
– U2: are we going to fish this weekend
– S2: B-message I-message I-message I-message I-message I-message I-message → send_email(message=“are we going to fish this weekend”)
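Turning an IOB tag sequence into the slots of a semantic frame is a small decoding step. The function name below is hypothetical, and the sketch keeps only the last span if a slot name occurs twice.

```python
def iob_to_slots(words, tags):
    """Collect B-/I- tagged word spans into a {slot_name: text} dictionary."""
    slots = {}
    current = None  # name of the slot whose span is currently open
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = [word]
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current].append(word)
        else:
            current = None  # "O" or an inconsistent I- tag closes the span
    return {name: " ".join(span) for name, span in slots.items()}

words = "just sent email to bob about fishing this weekend".split()
tags = "O O O O B-contact_name O B-subject I-subject I-subject".split()
print(iob_to_slots(words, tags))
# {'contact_name': 'bob', 'subject': 'fishing this weekend'}
```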

Page 48: Language Empowering Intelligent Assistants (CHT)

Contextual SLU (Chen et al., 2016)

Idea: additionally incorporating contextual knowledge during slot tagging → track dialogue states in a latent way

1. Sentence Encoding
– History utterances $\{x_i\}$ → Contextual Sentence Encoder ($RNN_{mem}$) → memory representation $m_i$
– Current utterance $c$ → Sentence Encoder ($RNN_{in}$) → $u$

2. Knowledge Attention
– Inner product of $u$ and $m_i$ → knowledge attention distribution $p_i$

3. Knowledge Encoding
– Weighted sum $h$ of the memory, then $\sum$ and $W_{kg}$ → knowledge encoding representation $o$

• RNN Tagger: words $w_{t-1}, w_t$ (weights U), hidden states $h_{t-1}, h_t$ (weights W), outputs $y_{t-1}, y_t$ (weights V), with $o$ fed in through M → slot tagging sequence y

Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.
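The three steps can be sketched numerically once the encoders are taken as given. In this sketch the vectors `u` and `memory` are made-up stand-ins for the RNN encoder outputs, and the final step assumes the form $o = W_{kg}(h + u)$, which is one reading of the ∑ and $W_{kg}$ boxes in the figure rather than something the slide states explicitly.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def knowledge_attention(u, memory):
    """Attention p_i from inner products, and the weighted sum h of the memory."""
    p = softmax([dot(u, m) for m in memory])
    h = [sum(p_i * m[d] for p_i, m in zip(p, memory)) for d in range(len(u))]
    return p, h

def knowledge_encoding(h, u, W_kg):
    """Assumed form o = W_kg (h + u)."""
    s = [hi + ui for hi, ui in zip(h, u)]
    return [dot(row, s) for row in W_kg]

# Made-up encodings: the current utterance matches the first history utterance.
u = [1.0, 0.0]
memory = [[1.0, 0.0], [0.0, 1.0]]
W_kg = [[1.0, 0.0], [0.0, 1.0]]  # identity, for readability only

p, h = knowledge_attention(u, memory)
o = knowledge_encoding(h, u, W_kg)
print(p, o)
```

The history utterance most similar to the current one gets the largest attention weight, so its content dominates what is carried over to the tagger.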

Page 49: Language Empowering Intelligent Assistants (CHT)

End-to-End Supervised Dialogue System

Wen, et al., “A Network-based End-to-End Trainable Task-Oriented Dialogue System,” arXiv:1604.04562v2.

Page 50: Language Empowering Intelligent Assistants (CHT)

InfoBot: End-to-End Dialogue System with Supervised & Reinforcement Learning

User: Find me the Bill Murray’s movie.
KB-InfoBot: When was it released?
User: I think it came out in 1993.
KB-InfoBot: Groundhog Day is a Bill Murray movie which came out in 1993.

Movie=?; Actor=Bill Murray; Release Year=1993

Knowledge Base (head, relation, tail)
• (Groundhog Day, actor, Bill Murray)
• (Groundhog Day, release year, 1993)
• (Australia, actor, Nicole Kidman)
• (Mad Max: Fury Road, release year, 2015)

Dhingra, et al., “End-to-End Reinforcement Learning of Dialogue Agents for Information Access,” arXiv:1609.00777v2.
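How the slot values narrow down the knowledge base can be illustrated with a hard, symbolic lookup over the triples shown above. This is a deliberately simplified sketch with a hypothetical function name; it is not claimed to be the lookup mechanism of Dhingra et al.

```python
# The (head, relation, tail) triples from the slide.
KB = [
    ("Groundhog Day", "actor", "Bill Murray"),
    ("Groundhog Day", "release year", "1993"),
    ("Australia", "actor", "Nicole Kidman"),
    ("Mad Max: Fury Road", "release year", "2015"),
]

def matching_heads(kb, constraints):
    """Heads that satisfy every (relation -> tail) constraint, sorted by name."""
    heads = {h for h, _, _ in kb}
    for relation, tail in constraints.items():
        heads &= {h for h, r, t in kb if r == relation and t == tail}
    return sorted(heads)

# Movie=?; Actor=Bill Murray; Release Year=1993
print(matching_heads(KB, {"actor": "Bill Murray", "release year": "1993"}))
# ['Groundhog Day']
```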

Page 51: Language Empowering Intelligent Assistants (CHT)

E2E RL Task-Completion Bot

[Figure: a User Simulator (Li et al., 2016), driven by a User Goal and User Agenda Modeling, interacts with an End-to-End Neural Dialogue System]

• Text Input: Are there any action movies to see this weekend?

• Language Understanding (LU), at times t-2, t-1 and t: words $w_i, w_{i+1}, w_{i+2}$, EOS are tagged with slot tags (e.g. B-type, O, O) and an <intent>
– Semantic Frame: request_movie, genre=action, date=this weekend

• Dialogue Management (DM)
– System Action/Policy: request_location

• User Dialogue Action: Inform(location=Bellevue)

• Natural Language Generation (NLG): $w_0, w_1, w_2$, EOS

Page 52: Language Empowering Intelligent Assistants (CHT)

E2E RL Task-Completion Bot

• Simulated User
– Generate interactions based on a predefined fake goal
– Automatically learn strategy by training on the simulated data

User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.

RULE BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!

REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!


Page 54: Language Empowering Intelligent Assistants (CHT)

Conclusion

• Conversational bots can help users manage information access and finish tasks via spoken interactions
– More natural
– More convenient
– More efficient
– User-centered

• Future Vision
– Not only single-turn requests but also multi-turn conversations
– Not only simple transactions but also complicated ones
• Dialogues can span multiple domains (e.g. check remaining data and then apply for more data)

• NN-Based Dialogue System
– Pipeline outputs are represented as vectors → distributional
– The execution is constrained by backend services → symbolic

Page 55: Language Empowering Intelligent Assistants (CHT)

Q & A

Thanks for your attention!