TRANSCRIPT
Stochastic Search and Surveillance Strategies for Mixed Human-Robot Teams
Vaibhav Srivastava
Department of Mechanical Engineering
University of California Santa Barbara
October 31, 2012
PhD Dissertation Defense
Vaibhav Srivastava (UCSB) Mixed Team Surveillance October 31, 2012 1 / 38
Big Picture: Human-robot decision dynamics
Uncertain environment surveyed by human-UAV team
(Courtesy: Prof. Kristi Morgansen)
UCSB Camera Network
UAV surveillance (Courtesy: http://www.modsim.org/)
A surveillance operator (Courtesy: http://www.modsim.org/)
Information Overload
January 16, 2011
In New Military, Data Overload Can Be Deadly
By THOM SHANKER and MATT RICHTEL
When military investigators looked into an attack by American helicopters last February that left 23 Afghan civilians dead, they found that the operator of a Predator drone had failed to pass along crucial information about the makeup of a gathering crowd of villagers.
But Air Force and Army officials now say there was also an underlying cause for that mistake: information overload.
At an Air Force base in Nevada, the drone operator and his team struggled to work out what was happening in the village, where a convoy was forming. They had to monitor the drone’s video feeds while participating in dozens of instant-message and radio exchanges with intelligence analysts and troops on the ground.
There were solid reports that the group included children, but the team did not adequately focus on them amid the swirl of data – much like a cubicle worker who loses track of an important e-mail under the mounting pile. The team was under intense pressure to protect American forces nearby, and in the end it determined, incorrectly, that the villagers’ convoy posed an imminent threat, resulting in one of the worst losses of civilian lives in the war in Afghanistan.
“Information overload – an accurate description,” said one senior military officer, who was briefed on the inquiry and spoke on the condition of anonymity because the case might yet result in a court martial. The deaths would have been prevented, he said, “if we had just slowed things down and thought deliberately.”
Data is among the most potent weapons of the 21st century. Unprecedented amounts of raw information help the military determine what targets to hit and what to avoid. And drone-based sensors have given rise to a new class of wired warriors who must filter the information sea. But sometimes they are drowning.
Military Struggles to Harness a Flood of Data - NYTimes.com http://www.nytimes.com/2011/01/17/technology/17brain.html?...
Publications
Search and Surveillance
V. Srivastava, K. Plarre, and F. Bullo. Randomized sensor selection in sequential hypothesis testing. IEEE Trans Signal Processing, 59(5):2342–2354, 2011
V. Srivastava, F. Pasqualetti, and F. Bullo. Stochastic surveillance strategies for spatial quickest detection. Int J Robotic Research, 2013. To appear
Attention Allocation
V. Srivastava, R. Carli, C. Langbort, and F. Bullo. Attention allocation for decision making queues. Automatica, February 2012. Conditionally accepted
V. Srivastava and F. Bullo. Knapsack problems with sigmoid utility: Approximation algorithms via hybrid optimization. European Journal of Operational Research, October 2012. Submitted
Other Topics
V. Srivastava, J. Moehlis, and F. Bullo. On bifurcations in nonlinear consensus networks. Journal of Nonlinear Science, 21(6):875–895, 2011
L. Carlone, V. Srivastava, F. Bullo, and G. C. Calafiore. Distributed random convex programming via constraints consensus. SIAM J Ctrl Optm, July 2012. Submitted
Mixed Team Setup
[Diagram: Cognition & Autonomy Management System, split into a Cognition part and an Autonomy part. Labels in the diagram: incoming tasks (rate λ), queue length n, Decision Support System, optimal allocations, human operator performance, exogenous factors (situational awareness, fatigue & sleep cycle, forgetting, boredom), decision on tasks, outgoing tasks, Anomaly Detection Algorithm, Vehicle Routing Algorithm, region selection policy, distribution of tasks]
- Information aggregation: sensor selection policy/ vehicle routing policy
- Information processing: human attention allocation policy
- Mission goal: efficient search / surveillance
Incomplete Literature Review
Vehicle Routing for Information Gathering
D. J. Klein, J. Schweikl, J. T. Isaacs, and J. P. Hespanha. On UAV routing protocols for sparse sensor data exfiltration. In Proc ACC, pages 6494–6500, Baltimore, MD, USA, June 2010
V. Gupta, T. H. Chung, B. Hassibi, and R. M. Murray. On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage. Automatica, 42(2):251–260, 2006
G. A. Hollinger, U. Mitra, and G. S. Sukhatme. Autonomous data collection from underwater sensor networks using acoustic communication. In Proc IROS, pages 3564–3570, San Francisco, CA, USA, September 2011
Stochastic Surveillance and Pursuit Evasion
J. P. Hespanha, H. J. Kim, and S. S. Sastry. Multiple-agent probabilistic pursuit-evasion games. In Proc CDC, pages 2432–2437, Phoenix, AZ, USA, December 1999
J. Grace and J. Baillieul. Stochastic strategies for autonomous robotic surveillance. In Proc CDC-ECC, pages 2200–2205, Seville, Spain, December 2005
K. Srivastava, D. M. Stipanovic, and M. W. Spong. On a stochastic robotic surveillance problem. In Proc CDC, pages 8567–8574, Shanghai, China, December 2009
Outline
1 Introduction
2 Stochastic Surveillance Strategies
  Single Vehicle Policies
  Multiple Vehicle Policies
3 Attention Allocation for human operator
  Time Constrained Static Queue
  Dynamic Queue with Latency Penalty
4 Mixed Team Surveillance
5 Conclusions
Stochastic Surveillance: Problem Setup
a UAV surveys n regions
Objective: quickly detect anomalies
processing time at region k: $T_k$
distance between regions i and j: $d_{ij}$
observations at each region: i.i.d.
pdfs of nominal and anomalous observations at region k: $f^0_k$ and $f^1_k$
[Figure: Surveillance Setup. Labels: Control Center containing the Anomaly Detection Algorithm and the Vehicle Routing Algorithm; Observations Collected by UAVs; Anomaly Likelihood; Decision; Vehicle Routing Policy]
Cumulative Sum Algorithm
nominal observations are sampled from distribution $f^0$
anomalous observations are sampled from distribution $f^1$
Given a bound on the false alarm rate, the CUSUM algorithm detects the change in minimum expected time
CUSUM Procedure
1 set statistic $\Lambda = 0$
2 collect an observation $y$
3 update statistic: $\Lambda = \max\{0, \Lambda + \log(f^1(y)/f^0(y))\}$
4 if $\Lambda > \eta$: declare anomaly detected
5 else go to step 2
E. S. Page. Continuous inspection schemes. Biometrika, 41(1/2):100–115, 1954
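The five-step CUSUM procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the Gaussian densities with means 0 and 1, the threshold 8.0, the synthetic data, and the names `cusum_detect` and `gaussian_log_lr` are all hypothetical choices made for the example.

```python
import random

def cusum_detect(observations, log_lr, eta):
    """Return the 1-based index at which the CUSUM statistic first exceeds eta, else None."""
    stat = 0.0                                  # step 1: set statistic to zero
    for i, y in enumerate(observations, start=1):   # step 2: collect an observation
        stat = max(0.0, stat + log_lr(y))       # step 3: update statistic
        if stat > eta:                          # step 4: declare anomaly detected
            return i
    return None                                 # data ran out before any alarm

def gaussian_log_lr(y, mu0=0.0, mu1=1.0, sigma=1.0):
    """log(f1(y)/f0(y)) for two Gaussians with a common variance (illustrative choice)."""
    return ((y - mu0) ** 2 - (y - mu1) ** 2) / (2.0 * sigma ** 2)

rng = random.Random(0)
# Synthetic stream: 200 nominal samples, then 200 anomalous samples.
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(1.0, 1.0) for _ in range(200)]
alarm = cusum_detect(data, gaussian_log_lr, eta=8.0)
print("alarm raised at sample", alarm)
```

Raising `eta` lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off the threshold controls.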
Proposed Policy: Randomized ensemble CUSUM algorithm
1 Anomaly detection algorithm:
n parallel CUSUM algorithms (one for each region)
2 Vehicle routing policy:
at each iteration sample region to visit from a probability distribution
Randomized Ensemble CUSUM Algorithm
1 n parallel CUSUM algorithms (one for each region)
2 region k visited with stationary probability $q_k$
3 KL divergence at region k: $D_k = E_{f^1_k}\left[\log\left(f^1_k(Y)/f^0_k(Y)\right)\right]$
Expected detection delay at region k
$E[\delta_k(q)] = \frac{e^{-\eta} + \eta - 1}{q_k D_k} \sum_{i=1}^{n} \sum_{j=1}^{n} q_i q_j (T_i + d_{ij})$
[Figure: expected detection delay versus threshold]
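As a worked instance of the expected detection delay expression for the randomized ensemble CUSUM algorithm, the sketch below evaluates it for a two-region example. The function name `expected_delay` and all numerical values (processing times, distances, divergences, threshold) are assumptions made for illustration.

```python
import math

def expected_delay(q, T, d, D, eta):
    """Expected detection delay at each region for a stationary routing policy q.

    q: visit probabilities, T: processing times, d: distance matrix,
    D: KL divergences, eta: CUSUM threshold.
    """
    n = len(q)
    # Expected time per iteration: sum_i sum_j q_i q_j (T_i + d_ij)
    cycle = sum(q[i] * q[j] * (T[i] + d[i][j]) for i in range(n) for j in range(n))
    factor = math.exp(-eta) + eta - 1.0
    return [factor * cycle / (q[k] * D[k]) for k in range(n)]

# Hypothetical two-region example.
delays = expected_delay(q=[0.5, 0.5], T=[1.0, 1.0],
                        d=[[0.0, 2.0], [2.0, 0.0]], D=[1.0, 2.0], eta=5.0)
print(delays)
```

In this example the region with twice the KL divergence has half the expected delay, since the delay at region k scales as 1/(q_k D_k).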
Optimal Stationary Policy
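$E[\delta_k(q)] = \frac{\eta}{q_k D_k}(q \cdot T + q \cdot Dq)$, and $\pi_k$: prior for anomaly at region k
(in the term $q \cdot Dq$, $D$ is the matrix of distances $d_{ij}$; in $D_k$ it is the KL divergence at region k)
Optimal stationary policy
$q^* = \operatorname{argmin}_{q \in \Delta_n} \sum_{k=1}^{n} \pi_k E[\delta_k(q)]$
Chernoff bound based guarantees that only one minimum exists
[Figures: UCSB Campus; Optimal Stationary Policy, plotted over axes $q_1$ and $q_2$]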
Efficient Stationary Policy
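Upper bound on performance:
$\sum_{k=1}^{n} \pi_k E[\delta_k(q)] \le \sum_{k=1}^{n} \frac{\pi_k \eta}{q_k D_k}(T_{\max} + d_{\max})$,
where $T_{\max} = \max_k T_k$ and $d_{\max} = \max_i \max_j d_{ij}$
Efficient stationary policy
Minimizer of upper bound:
$q^\dagger_k = \frac{\sqrt{\pi_k/D_k}}{\sum_{j=1}^{n} \sqrt{\pi_j/D_j}}$
Factor of optimality, with $T_{\min} = \min_k T_k$, $D_{\min} = \min_k D_k$, and $D_{\max} = \max_k D_k$:
$\frac{T_{\max} + d_{\max}}{T_{\min}}$, w.r.t. stationary policy
$n \, \frac{T_{\max} + d_{\max}}{T_{\min}} \, \frac{D_{\max}}{D_{\min}}$, w.r.t. any policy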
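The efficient stationary policy is a closed-form expression, so it is easy to check numerically that it minimizes the q-dependent part of the upper bound. The sketch below is illustrative only: the function names and the three-region priors and divergences are assumptions, not values from the talk.

```python
import math

def efficient_policy(prior, D):
    """q_k proportional to sqrt(pi_k / D_k), normalized to sum to one."""
    weights = [math.sqrt(p / dk) for p, dk in zip(prior, D)]
    total = sum(weights)
    return [w / total for w in weights]

def bound_objective(q, prior, D):
    """sum_k pi_k / (q_k D_k): the part of the upper bound that depends on q."""
    return sum(p / (qk * dk) for p, qk, dk in zip(prior, q, D))

# Hypothetical priors and KL divergences for three regions.
prior = [0.5, 0.3, 0.2]
D = [1.0, 0.5, 2.0]
q_dagger = efficient_policy(prior, D)
uniform = [1.0 / 3.0] * 3
print(q_dagger)
print(bound_objective(q_dagger, prior, D), bound_objective(uniform, prior, D))
```

Regions with a high prior or a low divergence (hard to discriminate) receive more visits, and the bound at `q_dagger` is no larger than at the uniform policy.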
Adaptive Ensemble CUSUM
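Adaptive policy
1 at each iteration: update prior $\pi_k \propto \Lambda_k$
2 adapt the efficient stationary policy: $q^\dagger_k = \frac{\sqrt{\pi_k/D_k}}{\sum_{j=1}^{n} \sqrt{\pi_j/D_j}}$
3 visit a region, and update CUSUM statistic
Performance of adaptive policy
$E[\delta_k(a)] \le \left( \frac{\eta}{D_k} + \frac{2(n-1)\, e^{\eta/2} \sqrt{D_k}\,(1 - e^{-\eta/2})}{\sqrt{D_{\min}}\,(1 - e^{-D_k/2})} + \frac{(n-1)^2 e^{\eta} D_k (1 - e^{-\eta})}{D_{\min}(1 - e^{-D_k})} \right)(T_{\max} + d_{\max})$
[Figure: Delay versus CUSUM Threshold (expected detection delay versus threshold). Comparison with stationary policy]
[Figure: Delay versus Divergence (expected detection delay versus (K-L divergence)$^{-1}$). Comparison with stationary policy]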
Performance of Adaptive Policy
[Figure: Adaptive policy with no anomaly. Panels: routing policy versus time; CUSUM statistic versus time]
[Figure: Adaptive policies with anomalies. Panels: routing policy versus time; CUSUM statistic versus time]
- frequent false alarms at low thresholds
- adaptive policy visits anomalous regions with higher probability
- adaptive policy very effective for high thresholds
Extension to Multiple Vehicles
m identical vehicles simultaneously surveying the regions
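Partitioning Policy
1 m-partition regions with each partition having at most $\lceil n/m \rceil$ regions
2 allocate one vehicle to each partition
3 implement single vehicle policy in each partition
Stationary policy with partitioning
Factor of optimality
$\frac{4\pi_{\max}}{\pi_{\min}} \, \frac{T_{\max} + d_{\max}}{T_{\min}} \, \frac{D_{\max}}{D_{\min}}$, w.r.t. stat. policy
$m^2 \left\lceil \frac{n}{m} \right\rceil \frac{T_{\max} + d_{\max}}{T_{\min}} \, \frac{D_{\max}}{D_{\min}}$, w.r.t. any policy
[Figure: average detection delay versus threshold]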
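The first step of the partitioning policy only requires that no partition exceeds ⌈n/m⌉ regions. One simple way to satisfy that constraint is a round-robin split, sketched below. The function name `partition_regions` is hypothetical, and this split ignores geometry; the talk does not say how regions are grouped, so treat the grouping rule as an assumption.

```python
import math

def partition_regions(regions, m):
    """Split regions into m groups, each with at most ceil(n/m) regions (round-robin)."""
    return [regions[i::m] for i in range(m)]

# Hypothetical example: 10 regions shared among 3 vehicles.
parts = partition_regions(list(range(10)), 3)
print(parts)
print("largest partition:", max(len(p) for p in parts), "bound:", math.ceil(10 / 3))
```

Each vehicle would then run the single-vehicle policy on its own group of regions.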
Further Relaxations
Not all-to-all topology
1 Construct a Markov chain with desired stationary distribution
Dependent Observations
1 Use CUSUM-like algorithm for HMMs (Chen and Willett ’00)
Dependence across Regions
1 More information available, can be used to improve performance
More than one kind of anomaly
1 Use generalized likelihood ratio (GLR)
2 Side product: type of anomaly
[Figures: routing policy versus time; GLR statistic versus time; normalized likelihood (posterior probability) for each hypothesis]
Incomplete Literature Review
Human Decision Making
R. Bogacz, E. Brown, J. Moehlis, P. Holmes, and J. D. Cohen. The physics of optimal decision making: A formal analysis of performance in two-alternative forced choice tasks. Psychological Review, 113(4):700–765, 2006
R. W. Pew. The speed-accuracy operating characteristic. Acta Psychologica, 30:16–26, 1969
Control of Queues
O. Hernandez-Lerma and S. I. Marcus. Adaptive control of service in queueing systems. IFAC Syst & Control L, 3(5):283–289, 1983
S. Agrali and J. Geunes. Solving knapsack problems with S-curve return functions. European Journal of Operational Research, 193(2):605–615, 2009
Human-in-the-loop Control
K. Savla and E. Frazzoli. A dynamical queue approach to intelligent task management for human operators. IEEE Proceedings, 100(3):672–686, 2012
L. F. Bertuccelli, N. Pellegrino, and M. L. Cummings. Choice modeling of relook tasks for UAV search missions. In Proc ACC, pages 2410–2415, Baltimore, MD, USA, June 2010
N. D. Powel and K. A. Morgansen. Multiserver queueing for supervisory control of autonomous vehicles. In Proc ACC, pages 3179–3185, Montreal, Canada, June 2012
Physics of Human Decision Making
Human Decision Making
[Figure: Evolution of evidence for decision (evidence versus time)]
[Figure: Probability of correct decision versus time, with $t_{\inf}$, $t_{\min}$, and $t_{\max}$ marked on the time axis]
1 Evidence for decision making evolves as a drift-diffusion process
2 Probability of correct decision evolves as a sigmoid function
Sigmoid performance also occurs in
1. Human-machine communication
2. Advertising response
3. Bidding in simultaneous auctions
4. Human assisted multiple target search
Attention Allocation for Human Operator
Problem: how to optimally allocate operator attention to a batch of tasks or to an incoming stream of tasks
– The performance of the operator evolves as a sigmoid function
– Static queue: serve N tasks in time T
– Dynamic queue: tasks arrive continuously at some known rate
– Optimal design of queue: what is an optimal arrival rate?
Knapsack Problem with Sigmoid Utility
Human operator to perform N surveillance tasks in time T
Expected reward for allocation $t_\ell$ to task $\ell$ is $f_\ell(t_\ell)$
Find allocation that maximizes total expected reward
maximize $f_1(t_1) + \cdots + f_N(t_N)$
subject to $t_1 + \cdots + t_N = T$
$t_\ell \ge 0$, $\ell \in \{1, \ldots, N\}$
[Image courtesy: Wikipedia]
– knapsack problem: $f_\ell$ is a step function
– if $f_\ell$ are sigmoid functions: decision variables are hybrid
– knapsack problem with sigmoid utility is NP-hard
Standard Approach for Sigmoid Functions
Construct a concave envelope
[Figure: probability of correct decision versus time, with $t_{\inf}$, $t_{\min}$, and $t_{\max}$ marked on the time axis]
Concave envelope may yield a very bad policy
Example: Identical Sigmoid Functions
maximize over $t_\ell \ge 0$: $\sum_{\ell=1}^{10} 1/(1 + \exp(-t_\ell + 5))$
subject to $t_1 + \ldots + t_{10} = 8$
Optimal policy: $t^*_1 = 8$, $t^*_2 = \ldots = t^*_{10} = 0$. Reward: 0.9526
Concave envelope policy: $t_1 = t_2 = \ldots = t_{10} = 0.8$. Reward: 0.1477
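The two rewards in this example can be reproduced directly from the sigmoid on the slide. The short check below is a sketch; the variable names are hypothetical. It also prints the small baseline that the nine unserved tasks would contribute under the formula as written (each has f(0) ≈ 0.0067), which the quoted figure of 0.9526 does not appear to include.

```python
import math

def f(t):
    """The identical sigmoid from the example: 1 / (1 + exp(-t + 5))."""
    return 1.0 / (1.0 + math.exp(-t + 5.0))

single_task = f(8.0)          # all 8 time units spent on one task
uniform_total = 10 * f(0.8)   # 0.8 time units on each of the 10 tasks
baseline = 9 * f(0.0)         # contribution of the nine tasks given zero time
print(round(single_task, 4), round(uniform_total, 4), round(baseline, 4))
```

Either way of counting, concentrating the time on one task yields several times the reward of the equal split, which is the point of the example.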
Sigmoid Function and Linear Penalty
Sigmoid function and linear penalty
maximize over $t \ge 0$: $f(t) - \psi t$
[Figure: Derivative of a sigmoid function, with $t_{\inf}$, $t_{\min}$, and $t_{\max}$ marked on the time axis]
[Figure: Optimal allocation versus penalty rate, with $\psi_f$ marked on the penalty-rate axis]
– The optimal allocation jumps down to zero at a critical penalty rate
– Jump creates combinatorial effects
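The jump in the optimal allocation can be seen with a small grid search over t for a few penalty rates. This is a sketch under assumptions: the logistic sigmoid with slope 1 and inflection point 5, the grid resolution, and the name `best_allocation` are illustrative choices, not taken from the talk.

```python
import math

def f(t, a=1.0, b=5.0):
    """Illustrative logistic sigmoid with slope a and inflection point b."""
    return 1.0 / (1.0 + math.exp(-a * (t - b)))

def best_allocation(psi, t_max=20.0, steps=2000):
    """Grid search for the maximizer of f(t) - psi * t over 0 <= t <= t_max."""
    best_t, best_val = 0.0, f(0.0)
    for i in range(1, steps + 1):
        t = t_max * i / steps
        val = f(t) - psi * t
        if val > best_val:
            best_t, best_val = t, val
    return best_t

for psi in (0.05, 0.10, 0.15, 0.20):
    print(psi, best_allocation(psi))
```

For this sigmoid the maximizer stays above the inflection point for the two smaller penalty rates and drops to exactly zero for the two larger ones; it never takes small positive values, which is the discontinuity the slide refers to.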
KP with Sigmoid Utility: Approximation Algorithm I
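KP with Sigmoid Utility
maximize $f_1(t_1) + \cdots + f_N(t_N)$
subject to $t_1 + \cdots + t_N = T$
$t_\ell \ge 0$, $\ell \in \{1, \ldots, N\}$
Lagrangian
$L(t, \alpha) = \sum_{\ell=1}^{N} (f_\ell(t_\ell) - \alpha t_\ell)$
α-parametrized non-zero allocations
$t^\dagger_\ell = f^\dagger_\ell(\alpha) \equiv \max\{t \mid f'_\ell(t) = \alpha\}$ if $\alpha \in \operatorname{range}(f'_\ell)$, and $0$ otherwise
Allocations at boundary: $t^*_\ell \in \{0, T\}$
α-parametrized knapsack problem
maximize $x_1 f_1(t^\dagger_1) + \cdots + x_N f_N(t^\dagger_N)$
subject to $x_1 t^\dagger_1 + \cdots + x_N t^\dagger_N = T$
$x_\ell \in \{0, 1\}$, $\ell \in \{1, \ldots, N\}$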
KP with Sigmoid Utility: Approximation Algorithm II
2-factor approximation algorithm
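1: Parametrize via Lagrange multiplier: $t^\dagger_\ell = f^\dagger_\ell(\alpha)$
2: Solve α-parametrized relaxed knapsack
   maximize $x_1 f_1(t^\dagger_1) + \cdots + x_N f_N(t^\dagger_N)$
   subject to $x_1 t^\dagger_1 + \cdots + x_N t^\dagger_N \le T$
   $x_\ell \in [0, 1]$, $\ell \in \{1, \ldots, N\}$
3: Search optimal Lagrange multiplier α
4: Serve tasks with $x^*_\ell = 1$
5: Compare the reward with $f_\ell(T)$, $\forall \ell$
6: Pick the better policy
[Figures: α-parametrized knapsack (maximum objective function versus Lagrange multiplier α); optimal allocations by task; approximate allocations by task]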
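The six steps can be sketched for logistic sigmoids, where the largest t with f'(t) = α has a closed form. This is a loose illustration, not the algorithm as analyzed in the talk: the logistic form of each f, the uniform grid over α, the greedy solution of the relaxed knapsack, and all function names are assumptions. As in the slide's objective, only served tasks are counted in the reward.

```python
import math

def sigmoid(t, a, b):
    return 1.0 / (1.0 + math.exp(-a * (t - b)))

def t_dagger(alpha, a, b):
    """Largest t with f'(t) = alpha for a logistic f; 0 if alpha is outside range(f')."""
    if alpha <= 0.0 or alpha > a / 4.0:
        return 0.0
    y = 0.5 * (1.0 + math.sqrt(1.0 - 4.0 * alpha / a))  # value of f on the concave branch
    return max(0.0, b + math.log(y / (1.0 - y)) / a)

def approx_allocate(tasks, T, steps=500):
    """tasks: list of (a, b) logistic parameters. Returns (reward, allocation)."""
    n = len(tasks)
    alpha_max = max(a for a, _ in tasks) / 4.0
    best_reward, best_alloc = -1.0, [0.0] * n
    for s in range(1, steps + 1):                 # step 3: search over alpha on a grid
        alpha = alpha_max * s / steps
        t = [t_dagger(alpha, a, b) for a, b in tasks]   # step 1
        # Step 2: greedy solution of the relaxed knapsack (reward per unit time).
        order = sorted((i for i in range(n) if t[i] > 0.0),
                       key=lambda i: sigmoid(t[i], *tasks[i]) / t[i], reverse=True)
        alloc, used = [0.0] * n, 0.0
        for i in order:
            if used + t[i] > T:
                break                             # fractional item in the relaxation: not served
            alloc[i], used = t[i], used + t[i]    # step 4: serve tasks with x = 1
        reward = sum(sigmoid(alloc[i], *tasks[i]) for i in range(n) if alloc[i] > 0.0)
        if reward > best_reward:
            best_reward, best_alloc = reward, alloc
    # Steps 5 and 6: compare with giving all the time to the single best task.
    j = max(range(n), key=lambda i: sigmoid(T, *tasks[i]))
    single = sigmoid(T, *tasks[j])
    if single > best_reward:
        best_reward, best_alloc = single, [T if i == j else 0.0 for i in range(n)]
    return best_reward, best_alloc

# The identical-sigmoid example: ten tasks with f(t) = 1/(1 + exp(-t + 5)), T = 8.
reward, alloc = approx_allocate([(1.0, 5.0)] * 10, T=8.0)
print(round(reward, 4), alloc)
```

On the identical-sigmoid example the sketch ends up giving all 8 time units to a single task, matching the optimal policy quoted for that example.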
Decision Making Queue with Penalty
Tasks arrive as a Poisson process with rate λ
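Tasks sampled from a distribution $p : D \to \mathbb{R}_{\ge 0}$
Reward $w_d$ for each correct decision on task $d$
Latency penalty per unit time $c_d$, for task $d \in D$, and $c = E_p[c_d]$
Objective of task release algorithm:
$\max_{t_1, t_2, t_3, \ldots} \; \lim_{L \to \infty} \frac{1}{L} \sum_{\ell=1}^{L} E\left[ w_{d_\ell} f_{d_\ell}(t_\ell) - \frac{1}{2} \left( \sum_{i=\ell}^{\ell + n_\ell - 1} c_{d_i} + \sum_{j=\ell}^{\ell + n_{\ell+1} - 1} c_{d_j} \right) t_\ell \right]$
where queue length $n_{\ell+1} = \max\{1, n_\ell - 1 + \mathrm{Poisson}(\lambda t_\ell)\}$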
Approach: Certainty-equivalent receding horizon optimization
Certainty-equivalent receding horizon optimization
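Certainty-equivalent approximation: replace future uncertainties with their nominal values
CE queue length: $n_{\ell+1} = \max\{1, n_\ell - 1 + \lambda t_\ell\}$
CE performance function: $f(t_\ell) = \frac{\sum_{d \in D} w_d p_d f_d(t_\ell)}{\sum_{d \in D} w_d p_d}$
Finite horizon optimization problem for task $\ell$
maximize over $t_1, \ldots, t_N$:
$\sum_{j=1}^{n_\ell} \left( w_j f_j(t_j) - \left( \sum_{i=j}^{n_\ell} c_i + (n_j - n_\ell - j + 1)\, c \right) t_j - \frac{1}{2} c \lambda t_j^2 \right) + \sum_{j=n_\ell+1}^{N} \left( w f(t_j) - c\, n_j t_j - \frac{1}{2} c \lambda t_j^2 \right)$
– Univariate DP with continuous action and state variables!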
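The difference between the stochastic queue recursion and its certainty-equivalent version can be seen by running both on the same sequence of allocations. The sketch below is illustrative: the function names, the arrival rates, the allocations, and the use of Knuth's method for Poisson sampling are assumptions made for the example.

```python
import math
import random

def poisson(rng, mean):
    """Poisson sample by Knuth's method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def queue_lengths(allocations, lam, n0=1, rng=None):
    """Queue length after each task; certainty-equivalent recursion if rng is None."""
    n, out = n0, []
    for t in allocations:
        arrivals = lam * t if rng is None else poisson(rng, lam * t)
        n = max(1, n - 1 + arrivals)   # one task leaves, new tasks arrive during t
        out.append(n)
    return out

allocations = [2.0] * 5                 # hypothetical: 2 time units per task
ce = queue_lengths(allocations, lam=0.25, n0=3)
sampled = queue_lengths(allocations, lam=0.25, n0=3, rng=random.Random(1))
print("certainty-equivalent:", ce)
print("one Poisson sample:  ", sampled)
```

The certainty-equivalent trajectory is deterministic and can take fractional values, which is what makes the finite-horizon problem tractable as a dynamic program over a continuous state.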
Numerical Illustration
[Figures, each plotted against task index: Allocations; Queue Length; Difficulty of tasks (inflection point); Importance of tasks (weight)]
– Difficult and unimportant tasks are dropped
– Tasks dropped at high queue lengths
Experimental Validation
task = spot the differences
expected # detected differences is a linear function of time (DDM)
probability to detect more than 60% of the differences is sigmoid (threshold-based decision making)
[Figure: information aggregated versus time. Information aggregation satisfies DDM]
[Figure: expected reward versus time. Probability of correct decision is sigmoid]
Acknowledgment: Christopher J. Ho
Thanks to Volunteers: Fabio, Anahita, Florian, Rush, and John
Mixed Team Setup
[Diagram: Cognition & Autonomy Management System, repeated from the earlier Mixed Team Setup slide]
Critical Issues:
1 no sensor observations for surveillance
2 operator’s decision: binary random variable
3 sequence of decisions: dependent and non-identically distributed
4 standard CUSUM not applicable
5 performance function on a task varies throughout mission
Mixed Team Surveillance
Good News:
1 CUSUM-like algorithm applicable for dependent data
2 performance function varies but can be characterized
Bad News:
1 No detection delay expressions
Simplified routing policy:
Region selection probability ∝ likelihood of anomaly
Operator performance in Surveillance Mission
Drift-diffusion model:
dx(t) = µdt + σdW (t),
x(0) =µ
2σ2log
π
1− π
π: operator’s prior belief on anomaly
Threshold
Eviden
ceEvo
lution
Time
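Performance function: $\pi\left[1 - \Phi\left(\frac{-\mu t - x_0}{\sigma\sqrt{t}}\right)\right] + (1-\pi)\,\Phi\left(\frac{\mu t - x_0}{\sigma\sqrt{t}}\right)$, with $x_0 = x(0)$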
Φ(·): standard normal cdf
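As a rough numerical check of the performance function on this slide, the sketch below evaluates it in Python. It takes the initial condition exactly as written on the slide and computes Φ with `math.erf`; the function names and parameter values are hypothetical, chosen only for illustration.

```python
import math


def std_normal_cdf(z):
    """Standard normal cdf, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def initial_condition(mu, sigma, prior):
    """x(0) as stated on the slide: mu / (2 sigma^2) * log(prior / (1 - prior))."""
    return mu / (2.0 * sigma ** 2) * math.log(prior / (1.0 - prior))


def performance(t, mu, sigma, prior):
    """Performance function from the slide, evaluated at time t > 0."""
    x0 = initial_condition(mu, sigma, prior)
    scale = sigma * math.sqrt(t)
    # First term of the slide's expression, weighted by the prior.
    term_anom = 1.0 - std_normal_cdf((-mu * t - x0) / scale)
    # Second term of the slide's expression, weighted by (1 - prior).
    term_no_anom = std_normal_cdf((mu * t - x0) / scale)
    return prior * term_anom + (1.0 - prior) * term_no_anom
```

With an uninformative prior (π = 0.5) the initial condition is zero and the expression reduces to Φ(µ√t/σ), which increases with the time spent on the task.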
Mixed Team Surveillance: Update Rules
Prior Belief Update Rule
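$\pi_{\text{new}} = \frac{\pi\,P_{\text{anom}}(\text{dec}_\ell \mid t)}{(1-\pi)\,P_{\text{no-anom}}(\text{dec}_\ell \mid t) + \pi\,P_{\text{anom}}(\text{dec}_\ell \mid t)}$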
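CUSUM-like update rule
$\Lambda_{\ell+1} = \max\left\{0,\ \Lambda_\ell + \log\frac{P_{\text{no-anom}}(\text{dec}_\ell \mid t_\ell, \text{dec}_{\ell-1}, t_{\ell-1}, \ldots)}{P_{\text{anom}}(\text{dec}_\ell \mid t_\ell, \text{dec}_{\ell-1}, t_{\ell-1}, \ldots)}\right\}$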
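The two update rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the function names are hypothetical, the conditional decision probabilities are assumed to be supplied by an operator performance model, and the CUSUM-like step takes the per-decision log-ratio as an input so that the caller forms the ratio in the order the rule prescribes.

```python
def update_prior(prior, p_anom, p_no_anom):
    """Bayes update of the prior belief on an anomaly after one decision.

    p_anom and p_no_anom are the probabilities of the observed decision
    given an anomaly and given no anomaly (assumed supplied by the caller).
    """
    numerator = prior * p_anom
    return numerator / ((1.0 - prior) * p_no_anom + numerator)


def cusum_like_step(statistic, log_ratio):
    """One CUSUM-like step: add the log-ratio and clip the result at zero."""
    return max(0.0, statistic + log_ratio)
```

For example, starting from a prior of 0.5, a decision that is four times as likely under an anomaly (0.8 versus 0.2) moves the belief to 0.8, and a sufficiently negative log-ratio resets the statistic to zero.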
Mixed Team Surveillance: Numerical Illustration
[Figure: four panels. Allocations (allocation vs. task), Queue Length (queue length vs. task), Region Selection Probability (vs. time), CUSUM Statistics (vs. time)]
Outline
1 Introduction
2 Stochastic Surveillance Strategies
    Single Vehicle Policies
    Multiple Vehicle Policies
3 Attention Allocation for human operator
    Time Constrained Static Queue
    Dynamic Queue with Latency Penalty
4 Mixed Team Surveillance
5 Conclusions
Conclusions
Stochastic Surveillance
Surveillance for anomaly detection
Ensemble CUSUM algorithm with stochastic routing policy
Surveillance policy depends on geography, difficulty, and anomaly likelihood
Attention Allocation
Decision making performance = speed/accuracy trade-off
Sigmoid performance functions introduce combinatorial effects
Blend of combinatorial and convex optimization
Optimal policies drop tasks for static as well as dynamic problems
Mixed Team Surveillance
Time-varying operator performance
CUSUM-like algorithm for anomaly detection
Future Directions
Stochastic Surveillance
More efficient partitioning policies
Maximum entropy Markov chain
Inter-region dynamics of anomalies
Adversarial anomalies
Attention Allocation
More efficient methods of incorporating deadlines
Experimental validation
Mixed Team Surveillance
Incorporating exogenous factors into decision making models
Real-time adaptation of parameters, e.g., by introducing control tasks
References
Search and Surveillance Strategies
V. Srivastava, F. Pasqualetti, and F. Bullo. Stochastic surveillance strategies for spatial quickest detection. Int J Robotic Research, 2013. To appear.
V. Srivastava, K. Plarre, and F. Bullo. Randomized sensor selection in sequential hypothesis testing. IEEE Trans Signal Processing, 59(5):2342–2354, 2011.
Attention Allocation Strategies
V. Srivastava, R. Carli, C. Langbort, and F. Bullo. Attention allocation for decision making queues. Automatica, February 2012. Conditionally accepted.
V. Srivastava and F. Bullo. Knapsack problems with sigmoid utility: Approximation algorithms via hybrid optimization. European Journal of Operational Research, October 2012. Submitted.
Mixed Team Surveillance
V. Srivastava, A. Surana, M. Eckstein, and F. Bullo. Mixed human-robot team surveillance with guaranteed performance. 2012. In preparation.
Funding: AFOSR MURI Program “Behavioral Dynamics in Mixed Human/Robotics Teams” 5/07-6/12