a decision-theoretic model of assistance - evaluation, extension and open problems sriraam...

A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems

Sriraam Natarajan, Kshitij Judah, Prasad Tadepalli and Alan Fern

School of EECS, Oregon State University

Outline

Introduction
Decision-Theoretic Model
Experiment with folder predictor
Incorporating Relational Hierarchies
Open Problems
Conclusion

Motivation

Several assistant systems proposed to
    Assist users in daily tasks
    Reduce their cognitive load

Examples: CALO (CALO 2003), COACH (Boger et al. 2005), etc.

Problems with previous work
    Fine-tuned to particular application domains
    Utilize specialized technologies
    Lack an overarching framework

Interaction Model

[Figure: animated sequence showing the user (action set U) and the assistant (action set A) acting in turn. From the initial state W1, a user action leads to W2; assistant actions then lead through W3, W4 and W5; the episode continues through W6, W7, W8 and W9, where the goal is achieved. The assistant's goal: minimize the user's actions.]

Markov Decision Process

MDP – $(S, A, T, R, I)$
Policy ($\pi$) – Mapping from $S$ to $A$
$V(\pi) = E\left(\sum_{t=1}^{T} r_t\right)$, $T$ = length of episode
Optimal Policy ($\pi^*$) = $\arg\max_{\pi} V(\pi)$

A Partially Observable Markov Decision Process (POMDP):
    $O$ is the set of observations
    $\mu(o \mid s)$ is a distribution over observations $o \in O$ given current state $s$
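As a rough illustration of the definitions on this slide, the sketch below evaluates $V(\pi)$ exactly on a toy two-state MDP and picks the best deterministic mapping from states to actions. The states, actions, transition table `T`, reward table `R`, and the function names are all illustrative assumptions, not part of the original model.

```python
from itertools import product

# Toy two-state MDP; every name and number below is an illustrative assumption.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]

# T[s][a] lists (next_state, probability) pairs.
T = {
    "s0": {"stay": [("s0", 1.0)], "move": [("s1", 0.9), ("s0", 0.1)]},
    "s1": {"stay": [("s1", 1.0)], "move": [("s0", 1.0)]},
}
# R[s][a] is the immediate reward for taking action a in state s.
R = {
    "s0": {"stay": 0.0, "move": 0.0},
    "s1": {"stay": 1.0, "move": 0.0},
}


def policy_value(policy, start, horizon):
    """V(pi) = E[sum_{t=1..T} r_t], computed by backward induction."""
    v = {s: 0.0 for s in STATES}  # value with zero steps remaining
    for _ in range(horizon):
        v = {
            s: R[s][policy[s]] + sum(p * v[s2] for s2, p in T[s][policy[s]])
            for s in STATES
        }
    return v[start]


def best_policy(start, horizon):
    """argmax of V(pi) over all deterministic mappings from S to A."""
    candidates = [
        dict(zip(STATES, choice))
        for choice in product(ACTIONS, repeat=len(STATES))
    ]
    return max(candidates, key=lambda pi: policy_value(pi, start, horizon))
```

Enumerating every policy is only feasible for tiny problems; it is used here because it mirrors the argmax on the slide directly.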

Decision-Theoretic Model (Fern et al. 07)

A: History-dependent stochastic policy $\pi'(a \mid w, O)$
Observables: World states, Agent's actions
Hidden: Agent's goals
Episode begins at state $w$ with goal $g$
$C(w, g, \pi, \pi')$: Cost of episode
Objective: compute $\pi'$ that minimizes $E[C(I, G_0, \pi, \pi')]$

Assistant POMDP

Given MDP $\langle W, A, A', T, C, I \rangle$, $G_0$ and $\pi$, the assistant POMDP is defined as:
    State space is $W \times G$
    Action set is $A'$
    Transition function $T'$ is

$$T'((w,g), a', (w',g')) = \begin{cases} 0 & \text{if } g \neq g' \\ T(w, a', w') & \text{if } a' \neq \text{noop} \\ P(T(w, \pi(w,g)) = w') & \text{if } a' = \text{noop} \end{cases}$$

    Cost model $C'$ is

$$C'((w,g), a') = \begin{cases} C(w, a') & \text{if } a' \neq \text{noop} \\ E[C(w, a)] \text{ where } a \text{ is distributed according to } \pi & \text{if } a' = \text{noop} \end{cases}$$
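The construction of $T'$ and $C'$ can be sketched in a few lines. This is a minimal sketch under assumptions: the world dynamics `T(w, a)` and the user policy `user_policy(w, g)` are taken to return dictionaries of probabilities, and the toy corridor world, action names and costs are hypothetical.

```python
NOOP = "noop"


def make_assistant_transition(T, user_policy):
    """Build T' over states (w, g) from world dynamics T and the user policy.

    Assumed interfaces: T(w, a) -> {next_w: prob},
    user_policy(w, g) -> {user_action: prob}.
    """
    def T_prime(state, a_prime, next_state):
        (w, g), (w2, g2) = state, next_state
        if g != g2:              # the goal stays fixed within an episode
            return 0.0
        if a_prime != NOOP:      # assistant acts: ordinary world dynamics
            return T(w, a_prime).get(w2, 0.0)
        # Assistant passes: the user acts according to their policy.
        return sum(p_a * T(w, a).get(w2, 0.0)
                   for a, p_a in user_policy(w, g).items())
    return T_prime


def make_assistant_cost(C, user_policy):
    """Build C': C(w, a') for real actions, expected user cost on noop."""
    def C_prime(state, a_prime):
        w, g = state
        if a_prime != NOOP:
            return C(w, a_prime)
        return sum(p_a * C(w, a) for a, p_a in user_policy(w, g).items())
    return C_prime


# Hypothetical toy world: positions 0..2 on a line, goal = target position.
def world_T(w, a):
    step = {"left": -1, "right": 1}[a]
    return {min(2, max(0, w + step)): 1.0}


def toy_user_policy(w, g):
    toward = "right" if g > w else "left"
    away = "left" if toward == "right" else "right"
    return {toward: 0.8, away: 0.2}


def toy_cost(w, a):
    return 2.0 if a == "left" else 1.0


T_prime = make_assistant_transition(world_T, toy_user_policy)
C_prime = make_assistant_cost(toy_cost, toy_user_policy)
```

For example, `T_prime((0, 2), NOOP, (1, 2))` is the probability that the user, who wants to reach position 2, moves right from position 0 when the assistant does nothing.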

Assistant POMDP

[Figure: graphical model of the assistant POMDP over two time steps, with nodes $G$, $W_t$, $W_{t+1}$, $S_t$, $S_{t+1}$, $A_t$, $A_{t+1}$, $A'_t$ and $A'_{t+1}$.]

Approximate Solution Approach

[Figure: block diagram. The assistant contains a goal recognizer, which passes $P(G)$ to an action selection component. Other labels: Environment, User, $U_t$, $A_t$, $O_t$, $W_t$.]

Online action selection cycle
    1) Estimate posterior goal distribution given observation
    2) Action selection via myopic heuristics

Goal Estimation

[Figure: in the current state $W_t$, the goal posterior given observations up to time $t$ is $P(G \mid O_t)$. A new observation (user action $U_t$ and next state $W_{t+1}$) yields the updated goal posterior $P(G \mid O_{t+1})$.]

Given
    $P(G \mid O_t)$: Goal posterior at time $t$
    $P(U_t \mid G, W_t)$: User policy (must learn user policy)
    $O_{t+1}$: New observation of user action and world state
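One way to read the update on this slide is as a single Bayes step, $P(G \mid O_{t+1}) \propto P(U_t \mid G, W_t)\, P(G \mid O_t)$. The sketch below assumes that form and assumes the user policy is already available as a function; the slide notes that in practice it must be learned. The function names and the toy two-goal policy are hypothetical.

```python
def update_goal_posterior(posterior, user_policy, w, u):
    """One Bayes step: P(G | O_{t+1}) is proportional to P(u | G, w) * P(G | O_t).

    posterior: {goal: probability}; user_policy(u, g, w) returns P(u | g, w).
    """
    unnormalized = {g: user_policy(u, g, w) * p for g, p in posterior.items()}
    z = sum(unnormalized.values())
    if z == 0.0:
        # The observed action is impossible under every goal: keep the old belief.
        return dict(posterior)
    return {g: v / z for g, v in unnormalized.items()}


# Hypothetical user policy: the user moves toward the goal 80% of the time.
def toy_user_policy(u, g, w):
    return 0.8 if u == g else 0.2


belief = {"left": 0.5, "right": 0.5}
belief = update_goal_posterior(belief, toy_user_policy, None, "right")
```

Repeating the call after each observed user action keeps the posterior current, which is what allows predictions to be revised during an episode.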

Action Selection: Assistant POMDP

[Figure: the assistant POMDP (world states $W_t$, $W_{t+1}$, $W_{t+2}$, user action $U$, goal $G$, assistant action $A'_t$) is reduced to an assistant MDP in which the assistant action $A'_t$ leads from $W_t$ to $W_{t+2}$.]

Assume we know the user goal $G$ and policy
    Can create a corresponding assistant MDP over assistant actions
    Can compute $Q(A, W, G)$, giving the value of taking assistive action $A$ when the user's goal is $G$

Select action that maximizes expected (myopic) value:

$$Q(A, W) = \sum_{G} P(G \mid O_t)\, Q(A, W, G)$$
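The myopic rule above amounts to averaging the per-goal Q-values under the current goal posterior and taking the best action. A minimal sketch, assuming the per-goal values are available through a function `q(a, w, g)`; the door-opening actions and the numbers in `Q_TABLE` are made up for illustration.

```python
def select_action(actions, posterior, q, w):
    """Return argmax_A of sum_G P(G | O_t) * Q(A, W, G) for the current W."""
    def expected_q(a):
        return sum(p * q(a, w, g) for g, p in posterior.items())
    return max(actions, key=expected_q)


# Hypothetical per-goal values: helping with the wrong door is costly.
Q_TABLE = {
    ("open_door_1", "g1"): 1.0, ("open_door_1", "g2"): -1.5,
    ("open_door_2", "g1"): -1.5, ("open_door_2", "g2"): 1.0,
    ("noop", "g1"): 0.0, ("noop", "g2"): 0.0,
}


def toy_q(a, w, g):
    return Q_TABLE[(a, g)]  # the toy values ignore the world state


ACTIONS = ["open_door_1", "open_door_2", "noop"]
uncertain = select_action(ACTIONS, {"g1": 0.5, "g2": 0.5}, toy_q, None)
confident = select_action(ACTIONS, {"g1": 0.9, "g2": 0.1}, toy_q, None)
```

With these numbers the assistant waits while the goal is ambiguous and acts once the posterior favours one goal.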

Folder Predictor

Previous work (Bao et al. 2006):
    No repredictions
    Does not consider new folders

Decision-Theoretic Model
    Naturally handles repredictions
    Considers mixture density to obtain the distribution

Data set – set of requests of Open and saveAs
Folder hierarchy – 226 folders
Prior distribution initialized according to the model of Bao et al.

$$P(f) = \mu_0 P_0(f) + (1 - \mu_0) P_l(f)$$
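The mixture itself is a simple convex combination. The sketch below computes it for a handful of made-up folders. The slide does not say what $P_0$, $P_l$ or $\mu_0$ are in detail, so the component distributions and the value of `mu0` here are placeholders chosen only to show the arithmetic.

```python
def mixture_prior(p0, pl, mu0):
    """P(f) = mu0 * P0(f) + (1 - mu0) * Pl(f) over the union of folders."""
    folders = set(p0) | set(pl)
    return {
        f: mu0 * p0.get(f, 0.0) + (1.0 - mu0) * pl.get(f, 0.0)
        for f in folders
    }


# Placeholder components: p0 concentrates on two folders, pl spreads mass evenly.
p0 = {"papers": 0.7, "talks": 0.3}
pl = {"papers": 0.25, "talks": 0.25, "code": 0.25, "misc": 0.25}

prior = mixture_prior(p0, pl, mu0=0.8)
```

Because both components sum to one, the mixture does too, and any folder covered by either component receives non-zero prior mass.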

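[Table: average number of clicks per open/saveAs. Conditions: restricted folder set vs. all folders considered, and no reprediction vs. with repredictions. Values in the order they appear on the slide: 1.3724, 1.319, 1.34, 1.2344. Two of the entries are labelled "Current Tasktracer" and "Full Assistant Framework".]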

Incorporating Relational Hierarchies

Tasks are hierarchical
    Writing a paper
Tasks have a natural class – subclass hierarchy
    Papers to ICML or IJCAI involve similar subtasks
Tasks are chosen based on some attribute of the world
    Grad students work on a paper closer to the deadline
Goal: Combine these ideas to
    Specify prior knowledge easily
    Accelerate learning of the parameters

Doorman Domain

[Figure: relational task hierarchy for the doorman domain. Task nodes: Gather(R), Attack(E), Collect(R), Deposit(R,S), DestroyCamp(E), KillDragon(D), Goto(L), Pickup(R), DropOff(R,S), Kill(D), Destroy(E), Move(X), Open(D). Constraints shown: L = R.Loc, L = S.Loc, L = D.Loc, L = E.Loc, R.Type = S.Type, E.Type = D.Type.]

Performance of different models

[Figure: savings (y-axis, 0 to 0.9) against number of episodes x 10 (x-axis, 1 to 25) for four models: Relational Hierarchies, Hierarchical Model, Flat Model, Relational Model.]

Open Problems

Partial Observability of the user
    Currently user completely observes the environment
    Not the case in real-world – User need not know what is in the refrigerator
    Assistant can completely observe the world
    Current system does not consider user's exploratory actions
    Setting is similar to interactive POMDPs (Doshi et al.)
        Environment – POMDP
        Belief states of the POMDP are belief states of the user
    State space needs to be extended to capture user's beliefs

Open Problems

Large State space
    Solving POMDP is impractical
    Kitchen Domain (Fern et al.) – 140000 states
    Prune certain regions of the search space (Electric Elves)
    Can use user trajectories as training examples

Parallel subgoals/actions
    Assistant and user execute actions in parallel
    Useful to execute parallel subgoals - User writes paper, assistant runs experiments
    Identification of the possible parallel actions
    The assistant can change the goal stack of the user
    Goal estimation has to include the user's response

Open Problems

Changing goals
    User can change goal midway - Work on a different project
    Currently, the system would converge to the goal slowly
    Explicitly model this possibility
    Borrow ideas from user modeling to predict changing goals

Expanding set of goals
    A large number of dishes can be cooked

Forgetting subgoals
    Forgetting to attach a document to the email
    Explicitly model this possibility – borrow ideas from cognitive science literature
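To make the "changing goals" problem concrete, here is one possible way to model a goal switch explicitly. It is a sketch, not the approach used in the talk: before each Bayes update, a small hypothetical probability `switch_prob` of the user having re-drawn the goal from the prior is mixed into the belief. The toy user policy and all numbers are assumptions.

```python
def update_with_goal_switch(posterior, prior, user_policy, w, u, switch_prob):
    """Goal update that allows the goal to change between steps.

    Prediction: with probability switch_prob the goal is re-drawn from prior.
    Correction: usual Bayes step with the user policy P(u | g, w).
    """
    predicted = {
        g: (1.0 - switch_prob) * posterior[g] + switch_prob * prior[g]
        for g in posterior
    }
    unnormalized = {g: user_policy(u, g, w) * p for g, p in predicted.items()}
    z = sum(unnormalized.values())
    if z == 0.0:
        return predicted
    return {g: v / z for g, v in unnormalized.items()}


# Hypothetical user policy: the user moves toward the goal 80% of the time.
def toy_user_policy(u, g, w):
    return 0.8 if u == g else 0.2


def run_demo(switch_prob):
    """Five actions toward 'left', then the user switches and acts 'right' twice."""
    prior = {"left": 0.5, "right": 0.5}
    belief = dict(prior)
    for u in ["left"] * 5 + ["right"] * 2:
        belief = update_with_goal_switch(
            belief, prior, toy_user_policy, None, u, switch_prob
        )
    return belief
```

With `switch_prob=0.0` this reduces to the static-goal update and the belief in "right" stays low after the switch; with a small positive value the belief recovers within a couple of observations.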

Conclusion

Propose a general framework based on decision theory
Experiments in a real-world domain
Repredictions are useful
Currently working on a relational hierarchical model
Outlined several open problems
Motivated the necessity of using sophisticated user models

Thank you!!!
