
Stochastic Search and Surveillance Strategies for Mixed Human-Robot Teams

Vaibhav Srivastava

Department of Mechanical Engineering

University of California Santa Barbara

October 31, 2012

PhD Dissertation Defense

Vaibhav Srivastava (UCSB) Mixed Team Surveillance October 31, 2012 1 / 38

Big Picture: Human-robot decision dynamics

Uncertain environment surveyed by human-UAV team

(Courtesy: Prof. Kristi Morgansen)

UCSB Camera Network

UAV surveillance (Courtesy: http://www.modsim.org/)

A surveillance operator (Courtesy: http://www.modsim.org/)


Information Overload


January 16, 2011

In New Military, Data Overload Can Be Deadly

By THOM SHANKER and MATT RICHTEL

When military investigators looked into an attack by American helicopters last February that left 23 Afghan civilians dead, they found that the operator of a Predator drone had failed to pass along crucial information about the makeup of a gathering crowd of villagers.

But Air Force and Army officials now say there was also an underlying cause for that mistake: information overload.

At an Air Force base in Nevada, the drone operator and his team struggled to work out what was happening in the village, where a convoy was forming. They had to monitor the drone’s video feeds while participating in dozens of instant-message and radio exchanges with intelligence analysts and troops on the ground.

There were solid reports that the group included children, but the team did not adequately focus on them amid the swirl of data - much like a cubicle worker who loses track of an important e-mail under the mounting pile. The team was under intense pressure to protect American forces nearby, and in the end it determined, incorrectly, that the villagers’ convoy posed an imminent threat, resulting in one of the worst losses of civilian lives in the war in Afghanistan.

“Information overload - an accurate description,” said one senior military officer, who was briefed on the inquiry and spoke on the condition of anonymity because the case might yet result in a court martial. The deaths would have been prevented, he said, “if we had just slowed things down and thought deliberately.”

Data is among the most potent weapons of the 21st century. Unprecedented amounts of raw information help the military determine what targets to hit and what to avoid. And drone-based sensors have given rise to a new class of wired warriors who must filter the information sea. But sometimes they are drowning.

Military Struggles to Harness a Flood of Data - NYTimes.com http://www.nytimes.com/2011/01/17/technology/17brain.html?...



Publications

Search and Surveillance

V. Srivastava, K. Plarre, and F. Bullo. Randomized sensor selection in sequential hypothesis testing. IEEE Trans Signal Processing, 59(5):2342–2354, 2011

V. Srivastava, F. Pasqualetti, and F. Bullo. Stochastic surveillance strategies for spatial quickest detection. Int J Robotic Research, 2013. to appear

Attention Allocation

V. Srivastava, R. Carli, C. Langbort, and F. Bullo. Attention allocation for decision making queues. Automatica, February 2012. conditionally accepted

V. Srivastava and F. Bullo. Knapsack problems with sigmoid utility: Approximation algorithms via hybrid optimization. European Journal of Operational Research, October 2012. Submitted

Other Topics

V. Srivastava, J. Moehlis, and F. Bullo. On bifurcations in nonlinear consensus networks. Journal of Nonlinear Science, 21(6):875–895, 2011

L. Carlone, V. Srivastava, F. Bullo, and G. C. Calafiore. Distributed random convex programming via constraints consensus. SIAM J Ctrl Optm, July 2012. Submitted


Mixed Team Setup

[Figure: block diagram of the mixed team, split into an Autonomy part and a Cognition part. Blocks: Cognition & Autonomy Management System, Vehicle Routing Algorithm, Decision Support System, Anomaly Detection Algorithm, human operator performance. Signals: incoming tasks (rate λ), queue length n, distribution of tasks, optimal allocations, decision on tasks, outgoing tasks, region selection policy. Tasks and exogenous factors acting on the operator: situational awareness, fatigue and sleep cycle, forgetting, boredom.]

- Information aggregation: sensor selection policy/ vehicle routing policy

- Information processing: human attention allocation policy

- Mission goal: efficient search / surveillance







Incomplete Literature Review

Vehicle Routing for Information Gathering

D. J. Klein, J. Schweikl, J. T. Isaacs, and J. P. Hespanha. On UAV routing protocols for sparse sensor data exfiltration. In Proc ACC, pages 6494–6500, Baltimore, MD, USA, June 2010

V. Gupta, T. H. Chung, B. Hassibi, and R. M. Murray. On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage. Automatica, 42(2):251–260, 2006

G. A. Hollinger, U. Mitra, and G. S. Sukhatme. Autonomous data collection from underwater sensor networks using acoustic communication. In Proc IROS, pages 3564–3570, San Francisco, CA, USA, September 2011

Stochastic Surveillance and Pursuit Evasion

J. P. Hespanha, H. J. Kim, and S. S. Sastry. Multiple-agent probabilistic pursuit-evasion games. In Proc CDC, pages 2432–2437, Phoenix, AZ, USA, December 1999

J. Grace and J. Baillieul. Stochastic strategies for autonomous robotic surveillance. In Proc CDC-ECC, pages 2200–2205, Seville, Spain, December 2005

K. Srivastava, D. M. Stipanovic, and M. W. Spong. On a stochastic robotic surveillance problem. In Proc CDC, pages 8567–8574, Shanghai, China, December 2009


Outline

1 Introduction

2 Stochastic Surveillance Strategies
  Single Vehicle Policies
  Multiple Vehicle Policies

3 Attention Allocation for human operator
  Time Constrained Static Queue
  Dynamic Queue with Latency Penalty

4 Mixed Team Surveillance

5 Conclusions


Stochastic Surveillance: Problem Setup

a UAV surveys n regions

Objective: quickly detect anomalies

processing time at region k: $T_k$

distance between regions i and j: $d_{ij}$

observations at each region: i.i.d.

pdf of nominal and anomalous observations at region k: $f_k^0$ and $f_k^1$

[Figure: surveillance setup. Observations collected by UAVs reach the control center, which runs the anomaly detection algorithm (outputs: decision, anomaly likelihood) and the vehicle routing algorithm (output: vehicle routing policy).]


Cumulative Sum Algorithm

nominal observations: sampled from distribution $f^0$

anomalous observations: sampled from distribution $f^1$

Given a bound on the false alarm rate, the CUSUM algorithm detects the change in minimum expected time

CUSUM Procedure

1 set statistic $\Lambda = 0$

2 collect an observation y

3 update statistic

$$\Lambda = \max\left\{0,\ \Lambda + \log\frac{f^1(y)}{f^0(y)}\right\}$$

4 if $\Lambda > \eta$: declare anomaly detected

5 else go to step 2.

E. S. Page. Continuous inspection schemes. Biometrika, 41(1/2):100–115, 1954
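The CUSUM procedure can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: the Gaussian densities (nominal mean 0, anomalous mean 1, unit variance), the threshold value, and the names `gaussian_logpdf` and `cusum_detect` are assumptions chosen for the example.

```python
import math

def gaussian_logpdf(y, mean, std):
    """Log-density of a normal distribution (assumed observation model)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (y - mean) ** 2 / (2 * std ** 2)

def cusum_detect(observations, log_f0, log_f1, eta):
    """Run CUSUM; return the 1-based index of the first alarm, or None."""
    stat = 0.0                                            # step 1: Lambda = 0
    for index, y in enumerate(observations, start=1):     # step 2: collect y
        stat = max(0.0, stat + log_f1(y) - log_f0(y))     # step 3: update
        if stat > eta:                                    # step 4: threshold test
            return index
    return None                                           # no alarm raised

# Assumed densities: nominal N(0, 1), anomalous N(1, 1).
log_f0 = lambda y: gaussian_logpdf(y, 0.0, 1.0)
log_f1 = lambda y: gaussian_logpdf(y, 1.0, 1.0)

# Three nominal-looking samples followed by anomalous-looking ones.
alarm = cusum_detect([0, 0, 0, 2, 2, 2, 2], log_f0, log_f1, eta=4.0)
print(alarm)
```

For these densities the log-likelihood ratio is y - 0.5, so each sample at 0 keeps the statistic clamped at zero and each sample at 2 adds 1.5.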

Proposed Policy: Randomized ensemble CUSUM algorithm

1 Anomaly detection algorithm:

n parallel CUSUM algorithms (one for each region)

2 Vehicle routing policy:

at each iteration sample region to visit from a probability distribution



Randomized Ensemble CUSUM Algorithm

1 n parallel CUSUM algorithms (one for each region)

2 region k visited with stationary probability $q_k$

3 KL divergence at region k: $D_k = \mathbb{E}_{f_k^1}\left[\log\left(f_k^1(Y)/f_k^0(Y)\right)\right]$

Expected detection delay at region k

$$\mathbb{E}[\delta_k(q)] = \frac{e^{-\eta}+\eta-1}{q_k D_k}\sum_{i=1}^{n}\sum_{j=1}^{n} q_i q_j (T_i + d_{ij})$$

[Figure: expected detection delay versus threshold]
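The expected detection delay expression for the randomized ensemble CUSUM algorithm is easy to evaluate numerically. The sketch below is only an illustration: the function name and the two-region numbers are assumptions, not data from the talk.

```python
import math

def expected_detection_delay(k, q, T, d, D, eta):
    """Expected detection delay at region k under stationary policy q.

    q: visit probabilities, T: processing times, d: distance matrix,
    D: KL divergences, eta: CUSUM threshold.
    """
    n = len(q)
    # Expected time per iteration: sum_i sum_j q_i q_j (T_i + d_ij).
    time_per_iteration = sum(
        q[i] * q[j] * (T[i] + d[i][j]) for i in range(n) for j in range(n)
    )
    # Expected number of iterations until the CUSUM at region k crosses eta.
    iterations = (math.exp(-eta) + eta - 1.0) / (q[k] * D[k])
    return iterations * time_per_iteration

# Hypothetical two-region example.
delay = expected_detection_delay(
    k=0, q=[0.5, 0.5], T=[1.0, 1.0], d=[[0.0, 2.0], [2.0, 0.0]],
    D=[1.0, 1.0], eta=5.0,
)
print(round(delay, 4))
```

Visiting region k less often (smaller q_k) or having a smaller divergence D_k both increase the delay, which is the trade-off the routing policy has to balance.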


Optimal Stationary Policy

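$\mathbb{E}[\delta_k(q)] = \frac{\eta}{q_k D_k}\,(q \cdot T + q \cdot Dq)$ (taking $e^{-\eta}+\eta-1 \approx \eta$ for large $\eta$; in $q \cdot Dq$, D is the matrix of distances $d_{ij}$) and $\pi_k$: prior for anomaly at region k

Optimal stationary policy

$$q^* = \operatorname*{argmin}_{q \in \Delta_n} \sum_{k=1}^{n} \pi_k\, \mathbb{E}[\delta_k(q)]$$

Chernoff bound based guarantees that only one minimum exists

[Figures: UCSB campus; optimal stationary policy plotted over axes $q_1$ and $q_2$]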




Efficient Stationary Policy

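Upper bound on performance:

$$\sum_{k=1}^{n} \pi_k\, \mathbb{E}[\delta_k(q)] \le \sum_{k=1}^{n} \frac{\pi_k\, \eta}{q_k D_k}\,(T_{\max} + d_{\max}).$$

$T_{\max} = \max_k T_k$, and $d_{\max} = \max_i \max_j d_{ij}$

Efficient Stationary policy

Minimizer of upper bound

$$q_k^{\dagger} = \frac{\sqrt{\pi_k/D_k}}{\sum_{j=1}^{n} \sqrt{\pi_j/D_j}}$$

$T_{\min} = \min_k T_k$, $D_{\min} = \min_k D_k$, and $D_{\max} = \max_k D_k$

Factor of optimality

$\frac{T_{\max} + d_{\max}}{T_{\min}}$, w.r.t. stationary policy

$n\,\frac{T_{\max} + d_{\max}}{T_{\min}}\,\frac{D_{\max}}{D_{\min}}$, w.r.t. any policy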
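The efficient stationary policy has a closed form, so it can be computed directly. The sketch below assumes hypothetical priors and divergences for two regions; the function names are illustrative only.

```python
import math

def efficient_policy(prior, divergence):
    """q_k proportional to sqrt(pi_k / D_k), normalized to sum to one."""
    weights = [math.sqrt(p / dk) for p, dk in zip(prior, divergence)]
    total = sum(weights)
    return [w / total for w in weights]

def delay_upper_bound(q, prior, divergence, eta, t_max, d_max):
    """Upper bound sum_k pi_k * eta / (q_k D_k) * (T_max + d_max)."""
    weighted = sum(p * eta / (qk * dk) for p, qk, dk in zip(prior, q, divergence))
    return weighted * (t_max + d_max)

# Hypothetical example: equal priors, region 2 is four times easier to detect.
prior, divergence = [0.5, 0.5], [1.0, 4.0]
q_dagger = efficient_policy(prior, divergence)
bound_efficient = delay_upper_bound(q_dagger, prior, divergence, 5.0, 1.0, 2.0)
bound_uniform = delay_upper_bound([0.5, 0.5], prior, divergence, 5.0, 1.0, 2.0)
print(q_dagger, bound_efficient, bound_uniform)
```

In this example the harder region (smaller divergence) is visited twice as often as the easier one, and the bound at the efficient policy is lower than at the uniform policy.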






Adaptive Ensemble CUSUM

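Adaptive policy

1 at each iteration: update prior $\pi_k \propto \Lambda_k$

2 adapt the efficient stationary policy: $q_k^{\dagger} = \frac{\sqrt{\pi_k/D_k}}{\sum_{j=1}^{n} \sqrt{\pi_j/D_j}}$

3 visit a region, and update CUSUM statistic

Performance of adaptive policy

$$\mathbb{E}[\delta_k(a)] \le \left(\frac{\eta}{D_k} + \frac{2(n-1)\,e^{\eta/2}\sqrt{D_k}\,(1-e^{-\eta/2})}{\sqrt{D_{\min}}\,(1-e^{-D_k/2})} + \frac{(n-1)^2\, e^{\eta} D_k\,(1-e^{-\eta})}{D_{\min}\,(1-e^{-D_k})}\right)(T_{\max}+d_{\max}).$$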

[Figures: comparison with stationary policy. Panels: expected detection delay versus CUSUM threshold; expected detection delay versus (K-L divergence)$^{-1}$]



Performance of Adaptive Policy

[Figure: adaptive policy with no anomaly. Panels: routing policy versus time; CUSUM statistic versus time]

[Figure: adaptive policies with anomalies. Panels: routing policy versus time; CUSUM statistic versus time]

- frequent false alarms at low thresholds

- adaptive policy visits anomalous regions with higher probability

- adaptive policy very effective for high thresholds




Extension to Multiple Vehicles

m identical vehicles simultaneously surveying the regions

Partitioning Policy

1 partition the regions into m parts, each having at most $\lceil n/m \rceil$ regions

2 allocate one vehicle to each partition

3 implement single vehicle policy in each partition

Stationary policy with partitioning

$4\,\frac{\pi_{\max}}{\pi_{\min}}\,\frac{T_{\max} + d_{\max}}{T_{\min}}\,\frac{D_{\max}}{D_{\min}}$, w.r.t. stat. policy

$m^2 \left\lceil \frac{n}{m} \right\rceil \frac{T_{\max} + d_{\max}}{T_{\min}}\,\frac{D_{\max}}{D_{\min}}$, w.r.t. any policy


[Figure: average detection delay versus threshold]


Further Relaxations

Not all-to-all topology

1 Construct a Markov chain with desired stationary distribution

Dependent Observations

1 Use CUSUM like algorithm for HMMs (Chen and Willett ’00)

Dependence across Regions

1 More information available, can be used to improve performance

More than one kind of anomaly

1 Use Generalized likelihood ratio

2 Side product: type of anomaly
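For the first relaxation (not all-to-all topology), one standard way to build a Markov chain with a prescribed stationary distribution on a graph is the Metropolis-Hastings construction. The slides do not say which construction is used, so the sketch below is an assumption; the path graph and the target distribution are hypothetical.

```python
def metropolis_hastings_chain(neighbors, q):
    """Transition matrix on a graph whose stationary distribution is q.

    neighbors[i] lists the regions reachable from region i (no self-loops);
    q is the desired stationary visit distribution, with all entries positive.
    """
    n = len(q)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree_i = len(neighbors[i])
        for j in neighbors[i]:
            # Propose a uniformly random neighbor, accept with the MH ratio.
            ratio = (q[j] * degree_i) / (q[i] * len(neighbors[j]))
            P[i][j] = min(1.0, ratio) / degree_i
        P[i][i] = 1.0 - sum(P[i])  # stay put when the proposal is rejected
    return P

# Hypothetical path graph 0 - 1 - 2 with a desired visit distribution.
neighbors = [[1], [0, 2], [1]]
q = [0.2, 0.5, 0.3]
P = metropolis_hastings_chain(neighbors, q)
stationary_check = [sum(q[i] * P[i][j] for i in range(3)) for j in range(3)]
print(stationary_check)
```

The chain satisfies detailed balance, since q_i P_ij equals min(q_i / deg_i, q_j / deg_j), which is symmetric in i and j. A vehicle that moves according to P therefore visits region k with long-run frequency q_k while only traveling along available edges.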

[Figures: routing policy versus time; GLR statistic versus time; posterior probability (normalized likelihood) of each hypothesis]






Incomplete Literature Review

Human Decision Making

R. Bogacz, E. Brown, J. Moehlis, P. Holmes, and J. D. Cohen. The physics of optimal decision making: A formal analysis of performance in two-alternative forced choice tasks. Psychological Review, 113(4):700–765, 2006

R. W. Pew. The speed-accuracy operating characteristic. Acta Psychologica, 30:16–26, 1969

Control of Queues

O. Hernandez-Lerma and S. I. Marcus. Adaptive control of service in queueing systems. IFAC Syst & Control L, 3(5):283–289, 1983

S. Agrali and J. Geunes. Solving knapsack problems with S-curve return functions. European Journal of Operational Research, 193(2):605–615, 2009

Human-in-the-loop Control

K. Savla and E. Frazzoli. A dynamical queue approach to intelligent task management for human operators. IEEE Proceedings, 100(3):672–686, 2012

L. F. Bertuccelli, N. Pellegrino, and M. L. Cummings. Choice modeling of relook tasks for UAV search missions. In Proc ACC, pages 2410–2415, Baltimore, MD, USA, June 2010

N. D. Powel and K. A. Morgansen. Multiserver queueing for supervisory control of autonomous vehicles. In Proc ACC, pages 3179–3185, Montreal, Canada, June 2012


Physics of Human Decision Making

[Figures: human decision making. Panels: evolution of evidence for decision versus time; probability of correct decision versus time, with $t_{\inf}$, $t_{\min}$, $t_{\max}$ marked on the time axis]

1 Evidence for decision making evolves as a drift-diffusion process

2 Probability of correct decision evolves as a sigmoid function

Sigmoid performance also occurs in

1. Human-machine communication
2. Advertising response
3. Bidding in simultaneous auctions
4. Human assisted multiple target search




Attention Allocation for Human Operator

Problem: How to optimally allocate operator attention to a batch of tasks or to an incoming stream of tasks

– The performance of operator evolves as sigmoid function

– Static queue: serve N tasks in time T

– Dynamic queue: tasks arrive continuously at some known rate

– Optimal design of queue: What is an optimal arrival rate?



Knapsack Problem with Sigmoid Utility

Human operator to perform N surveillance tasks in time T

Expected reward for allocation $t_\ell$ to task $\ell$ is $f_\ell(t_\ell)$

Find allocation that maximizes total expected reward

$$\begin{aligned}
\text{maximize}\quad & f_1(t_1) + \dots + f_N(t_N)\\
\text{subject to}\quad & t_1 + \dots + t_N = T\\
& t_\ell \ge 0,\ \ell \in \{1, \dots, N\}
\end{aligned}$$

(Image courtesy: Wikipedia)

– knapsack problem: $f_\ell$ is step function

– If $f_\ell$ are sigmoid functions: decision variables are hybrid

– knapsack problem with sigmoid utility is NP hard


Standard Approach for Sigmoid Functions

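Construct a concave envelope

[Figure: probability of correct decision versus time, with $t_{\inf}$, $t_{\min}$, $t_{\max}$ marked on the time axis]

Concave envelope may yield a very bad policy

Example: Identical Sigmoid Functions

$$\begin{aligned}
\underset{t_\ell \ge 0}{\text{maximize}}\quad & \sum_{\ell=1}^{10} \frac{1}{1 + \exp(-t_\ell + 5)}\\
\text{subject to}\quad & t_1 + \dots + t_{10} = 8.
\end{aligned}$$

Optimal policy: $t_1^* = 8$, $t_2^* = \dots = t_{10}^* = 0$. Reward: 0.9526

Concave envelope policy: $t_1 = t_2 = \dots = t_{10} = 0.8$. Reward: 0.1477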
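The two rewards in the identical-sigmoid example can be checked with a few lines of arithmetic. The sketch below only evaluates the stated objective; the variable names are illustrative.

```python
import math

def f(t):
    """Sigmoid from the example: 1 / (1 + exp(-t + 5))."""
    return 1.0 / (1.0 + math.exp(-t + 5.0))

# Policy that serves a single task with the whole budget T = 8.
served_task_reward = f(8.0)
all_in_one_total = f(8.0) + 9 * f(0.0)  # the nine unserved tasks still score f(0)

# Concave-envelope policy: split the budget equally over the ten tasks.
equal_split_total = 10 * f(0.8)

print(round(served_task_reward, 4), round(all_in_one_total, 4), round(equal_split_total, 4))
```

The figure 0.9526 is the reward f(8) of the single served task. Adding the small f(0) contributions of the nine unserved tasks gives about 1.0128 for the full objective, roughly seven times the equal-split value of 0.1477.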




Sigmoid Function and Linear Penalty

Sigmoid function and linear penalty

$$\underset{t \ge 0}{\text{maximize}}\quad f(t) - \psi t$$

[Figures: derivative of a sigmoid function versus time, with $t_{\inf}$, $t_{\min}$, $t_{\max}$ marked on the time axis; optimal allocation versus penalty rate $\psi$]

– The optimal allocation jumps down to zero at a critical penalty rate

– Jump creates combinatorial effects
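The jump in the optimal allocation can be reproduced for a concrete sigmoid. The sketch below assumes a unit-slope logistic function with inflection point 5, which is a choice made for illustration and not a model taken from the slides.

```python
import math

def sigmoid(t, inflection=5.0):
    """Assumed performance function: unit-slope logistic."""
    return 1.0 / (1.0 + math.exp(-(t - inflection)))

def optimal_allocation(psi, inflection=5.0):
    """Maximize sigmoid(t) - psi * t over t >= 0 for penalty rate psi > 0."""
    if psi <= 0.0:
        raise ValueError("penalty rate must be positive")
    if psi > 0.25:
        return 0.0  # the logistic slope never exceeds 1/4, so the objective only decreases
    # Largest root of f'(t) = f (1 - f) = psi, which lies on the concave part.
    level = 0.5 * (1.0 + math.sqrt(1.0 - 4.0 * psi))
    candidate = inflection + math.log(level / (1.0 - level))
    # The maximizer is either this stationary point or the boundary t = 0.
    if sigmoid(candidate, inflection) - psi * candidate > sigmoid(0.0, inflection):
        return candidate
    return 0.0

for psi in (0.05, 0.10, 0.15, 0.20):
    print(psi, round(optimal_allocation(psi), 3))
```

For small penalty rates the allocation stays above the inflection point and shrinks slowly as the rate grows. Once the penalty outweighs the reward at the stationary point, the allocation drops straight to zero instead of shrinking continuously.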


KP with Sigmoid Utility: Approximation Algorithm I

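KP with Sigmoid Utility

$$\begin{aligned}
\text{maximize}\quad & f_1(t_1) + \dots + f_N(t_N)\\
\text{subject to}\quad & t_1 + \dots + t_N = T\\
& t_\ell \ge 0,\ \ell \in \{1, \dots, N\}
\end{aligned}$$

Lagrangian

$$L(t, \alpha) = \sum_{\ell=1}^{N} \left(f_\ell(t_\ell) - \alpha t_\ell\right)$$

α-parametrized non-zero allocations

$$t_\ell^{\dagger} = f_\ell^{\dagger}(\alpha) \equiv \begin{cases} \max\{t \mid f_\ell'(t) = \alpha\}, & \text{if } \alpha \in \operatorname{range}(f_\ell'),\\ 0, & \text{otherwise.} \end{cases}$$

Allocations at boundary: $t_\ell^* \in \{0, T\}$

α-parametrized knapsack problem

$$\begin{aligned}
\text{maximize}\quad & x_1 f_1(t_1^{\dagger}) + \dots + x_N f_N(t_N^{\dagger})\\
\text{subject to}\quad & x_1 t_1^{\dagger} + \dots + x_N t_N^{\dagger} = T\\
& x_\ell \in \{0, 1\},\ \ell \in \{1, \dots, N\}
\end{aligned}$$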







KP with Sigmoid Utility: Approximation Algorithm II

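2-factor approximation algorithm

1: Parametrize via Lagrange multiplier: $t_\ell^{\dagger} = f_\ell^{\dagger}(\alpha)$

2: Solve α-parametrized relaxed knapsack

$$\begin{aligned}
\text{maximize}\quad & x_1 f_1(t_1^{\dagger}) + \dots + x_N f_N(t_N^{\dagger})\\
\text{subject to}\quad & x_1 t_1^{\dagger} + \dots + x_N t_N^{\dagger} \le T\\
& x_\ell \in [0, 1],\ \ell \in \{1, \dots, N\}
\end{aligned}$$

3: Search optimal Lagrange multiplier α

4: Serve tasks with $x_\ell^* = 1$

5: Compare the reward with $f_\ell(T)$, $\forall \ell$

6: Pick the better policy

[Figures: α-parametrized knapsack (max objective function versus Lagrange multiplier α); optimal allocations per task; approximate allocations per task]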
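The six steps can be sketched for a simple family of sigmoids. This is a simplified illustration under stated assumptions, not the algorithm as implemented by the author: the tasks use unit-slope logistic functions with hypothetical positive inflection points, the relaxed knapsack is solved greedily by reward density, and step 3 is replaced by a plain grid search over α.

```python
import math

def f(t, b):
    """Assumed task reward: unit-slope logistic with inflection point b > 0."""
    return 1.0 / (1.0 + math.exp(-(t - b)))

def f_dagger(alpha, b):
    """Largest t with f'(t) = alpha, or 0 if alpha is outside range(f')."""
    if alpha <= 0.0 or alpha > 0.25:
        return 0.0
    level = 0.5 * (1.0 + math.sqrt(1.0 - 4.0 * alpha))
    return b + math.log(level / (1.0 - level))

def total_reward(alloc, inflections):
    return sum(f(t, b) for t, b in zip(alloc, inflections))

def served_allocation(alpha, inflections, T):
    """Steps 1, 2, 4: greedy relaxed knapsack, keeping only tasks with x = 1."""
    items = []
    for idx, b in enumerate(inflections):
        t = f_dagger(alpha, b)
        if t > 0.0:
            items.append((f(t, b) / t, t, idx))
    items.sort(reverse=True)  # highest reward per unit time first
    alloc, budget = [0.0] * len(inflections), T
    for _, t, idx in items:
        if t > budget:
            break  # this task would be fractional in the relaxation; drop it
        alloc[idx] = t
        budget -= t
    return alloc

def approximate_allocation(inflections, T, grid=1000):
    best = [0.0] * len(inflections)
    for step in range(1, grid + 1):  # step 3: grid search over alpha (assumed)
        cand = served_allocation(0.25 * step / grid, inflections, T)
        if total_reward(cand, inflections) > total_reward(best, inflections):
            best = cand
    for idx in range(len(inflections)):  # steps 5 and 6: single-task policies
        single = [0.0] * len(inflections)
        single[idx] = T
        if total_reward(single, inflections) > total_reward(best, inflections):
            best = single
    return best

inflections = [5.0] * 10  # the identical-sigmoid example, with T = 8
alloc = approximate_allocation(inflections, 8.0)
print(alloc, round(total_reward(alloc, inflections), 4))
```

On the identical-sigmoid example the comparison in steps 5 and 6 matters: the knapsack candidates never give a served task more than its own stationary allocation, so the policy that gives the full budget to one task is at least as good.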




Decision Making Queue with Penalty

Tasks arrive as a Poisson process with rate λ

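Tasks sampled from a distribution $p : D \to \mathbb{R}_{\ge 0}$

Reward $w_d$ for each correct decision on task d

Latency penalty per unit time $c_d$ for task $d \in D$, and $c = \mathbb{E}_p[c_d]$

Objective of task release algorithm:

$$\max_{t_1, t_2, t_3, \dots}\ \lim_{L \to \infty} \frac{1}{L} \sum_{\ell=1}^{L} \mathbb{E}\left[ w_{d_\ell} f_{d_\ell}(t_\ell) - \frac{1}{2}\left( \sum_{i=\ell}^{\ell+n_\ell-1} c_{d_i} + \sum_{j=\ell}^{\ell+n_{\ell+1}-1} c_{d_j} \right) t_\ell \right]$$

where queue length $n_{\ell+1} = \max\{1,\ n_\ell - 1 + \mathrm{Poisson}(\lambda t_\ell)\}$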

Approach: Certainty-equivalent receding horizon optimization


Certainty-equivalent receding horizon optimization

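Certainty-equivalent approximation: replace future uncertainties with their nominal values

CE queue length: $n_{\ell+1} = \max\{1,\ n_\ell - 1 + \lambda t_\ell\}$

CE performance function: $f(t_\ell) = \dfrac{\sum_{d \in D} w_d\, p_d\, f_d(t_\ell)}{\sum_{d \in D} w_d\, p_d}$

Finite horizon optimization problem for task $\ell$

$$\underset{t_1, \dots, t_N}{\text{maximize}}\ \sum_{j=1}^{n_\ell} \left[ w_j f_j(t_j) - \left( \sum_{i=j}^{n_\ell} c_i + (n_j - n_\ell - j + 1)\,c \right) t_j - \frac{1}{2} c \lambda t_j^2 \right] + \sum_{j=n_\ell+1}^{N} \left[ w f(t_j) - c\, n_j t_j - \frac{1}{2} c \lambda t_j^2 \right]$$

– Univariate DP with continuous action and state variables!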
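The certainty-equivalent queue recursion is what turns the problem into a univariate dynamic program: the queue length is the only state, and it evolves deterministically with the chosen allocation. A minimal sketch of that recursion, with a hypothetical arrival rate and allocation sequence:

```python
def ce_queue_lengths(n0, allocations, lam):
    """Certainty-equivalent queue evolution n_{l+1} = max(1, n_l - 1 + lam * t_l).

    n0: initial queue length, allocations: time spent on each successive task,
    lam: task arrival rate. Returns the queue length before each task and
    after the last one.
    """
    lengths = [float(n0)]
    for t in allocations:
        # One task leaves, and lam * t tasks arrive on average while it is served.
        lengths.append(max(1.0, lengths[-1] - 1.0 + lam * t))
    return lengths

# Hypothetical values: rate 0.5 tasks per unit time, three allocations.
trajectory = ce_queue_lengths(3, [2.0, 2.0, 0.5], lam=0.5)
print(trajectory)
```

Spending 1/λ time units on a task keeps the queue length constant, shorter allocations drain the queue, and the lower bound of one reflects that the operator always has a task at hand.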








Numerical Illustration

[Figures, each plotted against task index: allocations; queue length; difficulty of tasks (inflection point); importance of tasks (weight)]

– Difficult and unimportant tasks are dropped

– Tasks dropped at high queue lengths


Experimental Validation

task = spot the differences

expected # of detected differences is a linear function of time (DDM)

probability of detecting more than 60% of the differences is sigmoid (threshold-based decision making)

[Figure: information aggregated versus time. Caption: Information aggregation satisfies DDM]

[Figure: expected reward versus time. Caption: Probability of correct decision is sigmoid]

Acknowledgment: Christopher J. Ho

Thanks to Volunteers: Fabio, Anahita, Florian, Rush, and John




Mixed Team Setup


Critical Issues:

1 no sensor observations for surveillance

2 operator’s decision: binary random variable

3 sequence of decisions: dependent and non-identically distributed

4 standard CUSUM not applicable

5 performance function on a task varies throughout mission


Mixed Team Surveillance


Good News:

1 CUSUM like algorithm applicable for dependent data

2 performance function varies but can be characterized

Bad News:

1 No detection delay expressions

Simplified routing policy:

Region selection probability ∝ likelihood of anomaly

Operator performance in Surveillance Mission

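Drift-diffusion model:

$$dx(t) = \mu\,dt + \sigma\,dW(t), \qquad x(0) = \frac{\mu}{2\sigma^2} \log\frac{\pi}{1-\pi}$$

π: operator’s prior belief on anomaly

[Figure: evidence evolution versus time, with the decision threshold marked]

Performance function: $\pi\left(1 - \Phi\left(\dfrac{-\mu t - x_0}{\sigma\sqrt{t}}\right)\right) + (1-\pi)\,\Phi\left(\dfrac{\mu t - x_0}{\sigma\sqrt{t}}\right)$

Φ(·): standard normal cdf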
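The performance function can be evaluated with the standard library alone. In the sketch below the initial condition x0 is passed in as a parameter rather than computed from the prior, and the parameter values are hypothetical; the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def performance(t, mu, sigma, prior, x0):
    """Probability of a correct decision after deliberating for time t > 0.

    prior: operator's prior belief on anomaly, x0: initial evidence x(0).
    """
    spread = sigma * math.sqrt(t)
    correct_if_anomaly = 1.0 - normal_cdf((-mu * t - x0) / spread)
    correct_if_nominal = normal_cdf((mu * t - x0) / spread)
    return prior * correct_if_anomaly + (1.0 - prior) * correct_if_nominal

# Hypothetical parameters: unit drift and diffusion, unbiased prior, x0 = 0.
for t in (0.5, 1.0, 2.0, 4.0):
    print(t, round(performance(t, 1.0, 1.0, 0.5, 0.0), 4))
```

With an unbiased prior and x0 = 0 the expression reduces to Φ(μ√t/σ), so the probability of a correct decision increases with the time allocated to the task.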




Mixed Team Surveillance: Update Rules

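Prior Belief Update Rule

$$\pi_{\text{new}} = \frac{\pi\, P_{\text{anom}}(\mathrm{dec}_\ell \mid t)}{(1-\pi)\, P_{\text{no-anom}}(\mathrm{dec}_\ell \mid t) + \pi\, P_{\text{anom}}(\mathrm{dec}_\ell \mid t)}$$

CUSUM like update rule

$$\Lambda_{\ell+1} = \max\left\{0,\ \Lambda_\ell + \log\frac{P_{\text{anom}}(\mathrm{dec}_\ell \mid t_\ell, \mathrm{dec}_{\ell-1}, t_{\ell-1}, \dots)}{P_{\text{no-anom}}(\mathrm{dec}_\ell \mid t_\ell, \mathrm{dec}_{\ell-1}, t_{\ell-1}, \dots)}\right\}$$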
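Both update rules are one-line computations once the two decision likelihoods are available. The sketch below treats those likelihoods as given numbers. It also assumes that the log-ratio in the CUSUM like rule is oriented as anomaly over no-anomaly, matching the CUSUM procedure earlier in the talk. The function names and values are hypothetical.

```python
import math

def update_belief(prior, p_anom, p_no_anom):
    """Bayes update of the belief that an anomaly is present.

    p_anom / p_no_anom: probability of the operator's observed decision
    given that an anomaly is / is not present.
    """
    return prior * p_anom / ((1.0 - prior) * p_no_anom + prior * p_anom)

def update_statistic(stat, p_anom, p_no_anom):
    """CUSUM like update with the log-likelihood ratio of the decision.

    Orientation (anomaly over no-anomaly) is an assumption, see the lead-in.
    """
    return max(0.0, stat + math.log(p_anom / p_no_anom))

# Hypothetical decision that is four times likelier under an anomaly.
belief = update_belief(0.5, 0.8, 0.2)
stat_up = update_statistic(1.0, 0.8, 0.2)
stat_clamped = update_statistic(0.0, 0.2, 0.8)
print(belief, stat_up, stat_clamped)
```

A decision that is likelier under an anomaly raises both the belief and the statistic, while a decision that points the other way cannot push the statistic below zero.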




Mixed Team Surveillance: Numerical Illustration

[Figures: allocations versus task; queue length versus task; region selection probability versus time; CUSUM statistics versus time]




Conclusions

Stochastic Surveillance

Surveillance for anomaly detection

Ensemble CUSUM algorithm with stochastic routing policy

Surv. policy depends on geography, difficulty, & anom likelihood

Attention Allocation

Decision making performance = speed/accuracy trade-off

Sigmoid performance renders combinatorial effects

Blend of combinatorial and convex optimization

Optimal policies drop tasks for static as well as dynamic problems


Time-varying operator performance

CUSUM like algorithm for anomaly detection

Future Directions

Stochastic Surveillance

More efficient partitioning policies

Maximum entropy Markov chain

Inter-region dynamics of anomalies

Adversarial anomalies

Attention Allocation

More efficient methods of incorporating deadlines

Experimental validation


Incorporating exogenous factors into decision making models

Real-time adaptation of parameters, e.g., by introducing control tasks


References

Search and Surveillance Strategies

V. Srivastava, F. Pasqualetti, and F. Bullo. Stochastic surveillance strategies for spatial quickest detection. Int J Robotic Research, 2013. to appear

V. Srivastava, K. Plarre, and F. Bullo. Randomized sensor selection in sequential hypothesis testing. IEEE Trans Signal Processing, 59(5):2342–2354, 2011

Attention Allocation Strategies

V. Srivastava, R. Carli, C. Langbort, and F. Bullo. Attention allocation for decision making queues. Automatica, February 2012. conditionally accepted

V. Srivastava and F. Bullo. Knapsack problems with sigmoid utility: Approximation algorithms via hybrid optimization. European Journal of Operational Research, October 2012. Submitted

Mixed Team Surveillance

V. Srivastava, A. Surana, M. Eckstein, and F. Bullo. Mixed human-robot team surveillance with guaranteed performance. 2012. In preparation.

Funding: AFOSR MURI Program “Behavioral Dynamics in Mixed Human/Robotics Teams” 5/07-6/12

