
Partially observable Markov decision processes for spoken dialog systems

Jason D. Williams (AT&T Labs), Steve Young (Cambridge University)

2007, Computer Speech and Language, 21(2):393–422

Outline

Introduction

Partially observable Markov decision processes

Spoken Dialog System

SDS-POMDP

Comparing

Empirical support

POMDP (1)

Partially observable Markov decision processes

POMDP = {S, A, T, R, O, Z, γ, b0}

S – set of states describing the agent's world

A – set of actions the agent may take

T – transition probability P(s'|s, a)

R – reward r(s, a)

O – set of observations about the world

Z – observation probability P(o'|s', a)

POMDP (2)

POMDP = {S, A, T, R, O, Z, γ, b0}

γ – geometric discount factor

b0 – initial belief state b0(s)

POMDP (3)

circle – random variable

square – decision node

diamond – utility node

shaded – unobserved

arrow – causal effect

dashed arrow – distribution is used

RL – reinforcement learning

POMDP (Example)

Dialog system: saving/deleting messages
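
As an illustration, the save/delete example can be written down as a concrete POMDP tuple. This is a minimal sketch: the state names, error rate, and reward values below are assumptions chosen for the example, not taken from the paper.

    S = ["want-save", "want-delete"]           # hidden user goal
    A = ["ask", "do-save", "do-delete"]        # machine actions
    O = ["hear-save", "hear-delete"]           # noisy recognition results
    # T[a][s][s']: the user's goal is assumed fixed during the dialog
    T = {a: {s: {s2: 1.0 if s2 == s else 0.0 for s2 in S}
             for s in S} for a in A}
    # Z[a][s'][o']: recognition is assumed correct 80% of the time
    Z = {a: {"want-save":   {"hear-save": 0.8, "hear-delete": 0.2},
             "want-delete": {"hear-save": 0.2, "hear-delete": 0.8}}
         for a in A}
    # R[s][a]: assumed rewards – small cost to ask, penalty for acting wrongly
    R = {"want-save":   {"ask": -1, "do-save": 5,   "do-delete": -10},
         "want-delete": {"ask": -1, "do-save": -10, "do-delete": 5}}
    gamma = 0.95                                 # geometric discount factor
    b0 = {"want-save": 0.5, "want-delete": 0.5}  # uniform initial belief

Because the goal is hidden and observations are noisy, the machine must act on a belief over S rather than on a single known state.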

Spoken Dialog System

Su – internal user state

Sd – dialog state (user's view)

Au – user action (intention)

Spoken Dialog System

Yu – user audio signal

Ãu – user action as recognized by the machine

C – confidence score

Sm – dialog state (machine view)

Spoken Dialog System

Am – machine action

Ym – machine audio signal

Ãm – machine action as recognized by the user

Mapping SDS to POMDP

POMDP = {S, A, T, R, O, Z, γ, b0}

SDS = {Su, Sd, Sm, C, Au, Ãu, Am}

SDS-POMDP

s = (su, au, sd)

sm = b(s) = b(su, au, sd)
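
In the paper, the SDS-POMDP transition and observation functions factor into a user goal model, a user action model, a dialog history model, and an observation model (reconstructed here from the component definitions above; primes denote the next time step):

P(s'|s, am) = P(s'u|su, am) · P(a'u|s'u, am) · P(s'd|a'u, sd, am)

P(o'|s', am) = P(ã'u, c'|a'u)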

Math behind

Formula for the new belief state: b'(s') = k · P(o'|s', a) · Σ_s P(s'|s, a) · b(s), where k is a normalizing constant

Exact solution algorithms rarely scale beyond roughly 10 states, actions, and observations.

Effective approximate solutions exist.
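
A minimal dictionary-based sketch of the belief update above (it plugs into the illustrative tuple sketched earlier; all names are assumptions):

    def belief_update(b, a, o, S, T, Z):
        # b'(s') = k * P(o|s', a) * sum_s P(s'|s, a) * b(s)
        b_new = {s2: Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in S)
                 for s2 in S}
        k = sum(b_new.values())                # normalizing constant
        return {s2: v / k for s2, v in b_new.items()}
    # With the uniform b0 above, hearing "hear-save" after "ask" yields
    # {"want-save": 0.8, "want-delete": 0.2}: belief shifts toward saving.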

Comparing SDS-POMDP

Better than current approaches

Current approaches are simplifications or special cases of the SDS-POMDP

Approaches:

Parallel state hypotheses

Local use of confidence score

Automated action planning

Parallel state hypotheses

Traditional systems track a single state

Uncertainty → multiple state hypotheses

Two techniques:

Greedy decision-theoretic approaches

M-Best list

Greedy decisions

Maximizes immediate reward

Does not plan ahead

Handcrafting + ad hoc tuning

M-Best list

Considers only the top hypotheses

= POMDP with handcrafted action selection

Subspace of belief space

Local use of confidence score

Handcrafted update rules

Ac = {expl-confirm, imp-confirm, reject}

Useful, but hard to extend to long-term goals

Automated action selection

Handcrafted planning: breaks down in unforeseen dialog situations

POMDP with single state

Two main techniques:

Supervised learning

Markov decision processes

Supervised learning

Training data:

Human-human dialogs are much richer

Human-machine dialogs contain machine errors

Single state

Markov decision process

A fully observable MDP is a simplification of the POMDP

Assumes that the world state is known exactly

Single state
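
One way to state the simplification, a standard fact rather than the paper's wording: an MDP is the special case of a POMDP in which the observation reveals the state exactly, P(o'|s', a) = 1 if o' = s' and 0 otherwise, so the belief collapses to a point mass on the true state and belief monitoring becomes unnecessary.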

Empirical support

Based on simulations

Benefits of the POMDP with respect to:

Parallel state hypotheses

Confidence score

Automated planning

Real data

Parallel state hypotheses (1)

Parallel state hypotheses (2)

Parallel state hypotheses (3)

Confidence score (1)

Confidence score buckets: reject (< 0.4), low (0.4–0.8), high (> 0.8)
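
A sketch of this handcrafted bucketing, with the thresholds read off the slide (the function name is illustrative):

    def confidence_bucket(c):
        # Map a recognition confidence score in [0, 1] to a handcrafted bucket
        if c < 0.4:
            return "reject"    # discard the recognized action
        elif c < 0.8:
            return "low"       # e.g. confirm explicitly
        else:
            return "high"      # e.g. accept or confirm implicitly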

Confidence score (2)

Confidence score (3)

Confidence score (4)

Automated planning (1)

HC1, HC2, HC3 – handcrafted baseline controllers

Automated planning (2)

Automated planning (3)

Real data (1)

SACTI-1 corpus: 144 human-human dialogs in the travel domain

Real data (2)

Conclusion

Significant improvement in robustness

Current approaches are simplifications or special cases of the SDS-POMDP

Scales poorly in exact form; approximate solutions are needed

Unique in unifying these approaches

Future work

Other approaches:

Information State Update

Hidden Information State

Evaluating on real users

Questions?

Thank you!

