Toward Community Sensing
Andreas Krause, Carnegie Mellon University
Joint work with Eric Horvitz, Aman Kansal, Feng Zhao, Microsoft Research
Information Processing in Sensor Networks | April 24, 2008
Motivation: Traffic monitoring
Deployed sensors (detector loops, traffic cameras) give high-accuracy speed data
What about 148th Ave?
How can we get accurate road speed estimates everywhere?
Cars as traffic sensors
Many cars have Personal Navigation Devices (PNDs)
Know exact location and speed!
Fuse GPS, map information, engine speed, …
Modern PNDs have a network connection
Can use cars as speed sensors!
Example: Dash Express (GPS + GPRS/WiFi)
Community Sensing Vision
Realize full potential of population-owned sensors
Must respect privacy and preferences about sharing!
[Diagram: privately held sensors contribute sensor data to SenseWeb; data is requested for a common goal: estimate a spatial phenomenon (traffic, weather, …), construct 3D cities, news coverage]
Privacy concern of GPS traces
Dense GPS traces make it possible to identify people's locations, activities, intents, etc.
Even anonymization or strong obfuscation doesn't help.
Key idea: Avoid dense sampling!
Need to predict from sparse samples
Images courtesy of John Krumm
Phenomenon modeling
[Figure: road network with segments s1 to s12]
(Normalized) speeds as random variables
Joint distribution allows modeling correlations
Can predict unmonitored speeds from monitored speeds using $P(S_5 \mid S_1, S_9)$
Which segments should we monitor?
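Minimizing uncertainty
Can estimate prediction error at segment $S_i$: $\mathrm{Var}(S_i \mid S_A = s_A)$
[Figure: $P(S_5 \mid s_A)$ on [0, 1] for $A = \{S_1, S_2, S_3, S_4\}$; example observations (s1=.9, s2=1, s3=1, s4=1) and (s1=.5, s2=.6, s3=.8, s4=.6), with $\mathrm{Var}(S_5 \mid s_A) = .01$ and $\mathrm{Var}(S_5 \mid s_A) = .1$]
Expected error at segment $S_i$: $\mathrm{Var}(S_i \mid S_A)$, e.g., $\mathrm{Var}(S_5 \mid S_A) = .08$, $\mathrm{Var}(S_6 \mid S_A) = .1$, $\mathrm{Var}(S_7 \mid S_A) = .3$
Expected mean squared error: $\mathrm{EMSE}(A) = \sum_i \mathrm{Var}(S_i \mid S_A) = .08 + .1 + .3$
$A^* = \arg\min_{|A| \le k} \mathrm{EMSE}(A)$
Does not take "importance" of $S_i$ into account (frequently travelled vs. less travelled segments)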
Taking demand into account
Model demand $D_i$ as random variables (e.g., Poisson)
E.g., $D_i$ = #cars on segment $S_i$
Demand-weighted MSE: $\mathrm{DMSE}(A) = \sum_i E[D_i]\, \mathrm{Var}(S_i \mid S_A)$
Error reduction: $R(A) = \mathrm{DMSE}(\emptyset) - \mathrm{DMSE}(A)$
Want: $A^* = \arg\max_{|A| \le k} R(A)$
NP-hard optimization problem
[Figure: example with $D_5 = 50$, $D_6 = 10$, $D_7 = 200$ and $\mathrm{Var}(S_5 \mid S_A) = .08$, $\mathrm{Var}(S_6 \mid S_A) = .1$, $\mathrm{Var}(S_7 \mid S_A) = .3$]
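As a worked instance of the two error measures, take the example values shown for segments S5, S6 and S7 (variances .08, .1, .3 and demands 50, 10, 200). This sketch assumes the demand values stand for the expected demands $E[D_i]$ and that these three segments are the only unmonitored ones:

```latex
\begin{align*}
\mathrm{EMSE}(A) &= 0.08 + 0.1 + 0.3 = 0.48 \\
\mathrm{DMSE}(A) &= 50 \cdot 0.08 + 10 \cdot 0.1 + 200 \cdot 0.3 = 4 + 1 + 60 = 65
\end{align*}
```

Under these assumptions, S7 contributes 0.3 of 0.48 (about 63%) of the unweighted error but 60 of 65 (about 92%) of the demand-weighted error, which is why weighting by demand can change which segments are worth monitoring.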
Selecting informative locations
Greedy algorithm:
$A \leftarrow \emptyset$
For $i = 1, \dots, k$ do
  $s^* = \arg\max_s R(A \cup \{s\})$
  $A \leftarrow A \cup \{s^*\}$
How well does this heuristic do?
[Figure: road network with segments s1 to s12; segments s2, s11, s7, s10 highlighted]
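The greedy rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: `greedy_select`, the `covers` table and the `coverage` utility are hypothetical names, and the coverage count is a toy stand-in for the error reduction R(A).

```python
def greedy_select(candidates, utility, k):
    """Pick up to k items, each time adding the one that maximizes utility."""
    selected = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        current = frozenset(selected)
        # sorted() gives deterministic tie-breaking; max() keeps the first maximum
        best = max(sorted(remaining), key=lambda s: utility(current | {s}))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy stand-in utility (an assumption): each monitored segment "covers"
# itself and some neighbours; utility is the number of covered segments.
covers = {
    "s2": {"s1", "s2", "s3", "s4"},
    "s3": {"s2", "s3"},
    "s7": {"s6", "s7"},
    "s10": {"s9", "s10", "s11"},
}


def coverage(A):
    return len(set().union(*(covers[s] for s in A)))


print(greedy_select(covers, coverage, 2))
```

With this toy data the first pick is `s2` (covers four segments) and the second is `s10`, because `s3` adds nothing new once `s2` is selected.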
Diminishing returns
[Figure: the same road network under two selections with $A \subseteq B$. Selection A: observing a new location s' gives a large improvement (adding s' helps a lot!). Selection B: observing s' gives a small improvement (adding s' doesn't help much).]
Submodularity:
For $A \subseteq B$, $F(A \cup \{s'\}) - F(A) \ge F(B \cup \{s'\}) - F(B)$
Utility $R(A)$ is submodular*!
*See paper for details
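For small ground sets, the diminishing-returns inequality can be checked by brute force. The sketch below is illustrative only: `is_submodular`, `covers` and `coverage` are hypothetical names, the coverage function is a toy example rather than the utility R(A), and the check is exponential in the number of items.

```python
from itertools import combinations


def is_submodular(ground, F, tol=1e-12):
    """Check F(A + s) - F(A) >= F(B + s) - F(B) for all A subset of B, s not in B."""
    items = sorted(ground)
    subsets = [frozenset(c) for r in range(len(items) + 1)
               for c in combinations(items, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for s in items:
                if s in B:
                    continue
                gain_A = F(A | {s}) - F(A)
                gain_B = F(B | {s}) - F(B)
                if gain_A < gain_B - tol:
                    return False
    return True


# Toy coverage function (an assumption): known to have diminishing returns.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}


def coverage(A):
    return len(set().union(*(covers[s] for s in A)))


print(is_submodular(covers, coverage))                   # coverage function
print(is_submodular({"a", "b"}, lambda A: len(A) ** 2))  # gains grow with |A|
```

The second call uses a function whose marginal gains increase with the set size, so the check fails for it.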
Why is submodularity useful?
Theorem [Nemhauser et al. '78]: Greedy algorithm gives a constant-factor approximation
$F(A_{\mathrm{greedy}}) \ge (1 - 1/e)\, F(A_{\mathrm{opt}})$, i.e., ~63% of optimal
Greedy algorithm gives near-optimal set of locations to observe
But: we have no control over where the sensors (cars, cell phones) are going to be!
Querying a roving sensor
How can we cope with uncertain sensor availability?
[Figure: road segments s1 to s7. One query gets the response "I'm at S2, going 55 mph" (s2 = .9); another query gets no response (no data).]
Modeling sensor availability
Road segments $V = \{S_1, \dots, S_n\}$
Observations $W = \{C_1, \dots, C_m\}$: set $W$ of observations (cars) we can select from
If we select car $C_j$, we observe $S_i$ with probability $P(i \mid C_j)$
Pick $B \subseteq W$; this yields a random $A \subseteq V$ drawn from $P(A \mid B)$, with utility $R(A)$
Goal: Maximize expected utility:
$B^* = \arg\max_{|B| \le k} \sum_A P(A \mid B)\, R(A)$
[Figure: cars C1, C2, C3 on road segments s1 to s7]
Optimizing community sensing
Lemma: Whenever $R(A)$ is submodular, the function
$F(B) = \sum_{A : |A| \le k} P(A \mid B)\, R(A)$ is submodular
Can use the greedy algorithm to optimize selection
But: $F(B)$ is a sum over exponentially many terms
Theorem: For any $\varepsilon, \delta > 0$, can find set $B'$ such that
$F(B') \ge (1 - 1/e) \max_{|B| \le k} F(B) - \varepsilon$
with probability $1 - \delta$, using independent samples of $R(A)$
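The sampling idea can be illustrated with a simple Monte Carlo estimate of the expected utility F(B). This is a sketch under stated assumptions, not the procedure from the paper: it assumes cars respond independently, that each queried car reports at most one segment, and that any probability mass missing from a car's table means "no response". The names `sample_observed`, `estimate_F` and the `availability` table are hypothetical.

```python
import random


def sample_observed(B, availability, rng):
    """Draw one random set A of observed segments for the selected cars B."""
    A = set()
    for car in B:
        u = rng.random()
        acc = 0.0
        for segment, p in availability[car].items():
            acc += p
            if u < acc:
                A.add(segment)
                break
        # If u exceeds the car's total probability mass, it gives no response.
    return frozenset(A)


def estimate_F(B, availability, R, n_samples=1000, seed=0):
    """Average R(A) over independent samples of A drawn given B."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += R(sample_observed(B, availability, rng))
    return total / n_samples


# Toy availability model P(i | Cj) (made-up numbers for illustration).
availability = {
    "C1": {"s1": 0.5, "s2": 0.3},  # remaining 0.2: no response
    "C2": {"s2": 1.0},
}

# Toy utility: number of distinct segments observed.
print(estimate_F(["C1", "C2"], availability, len))
```

For this toy model the exact value is 0.5 * 2 + 0.5 * 1 = 1.5, so the estimate should land close to 1.5. A greedy selection over cars could call `estimate_F` in place of an exact F(B).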
Handling user preferences
Need to respect user preferences:
"Sample my speed at most once per day"
"Don't measure my speed for the next hour"
"Never sample close to my home"
"Wait at least 10 minutes between samples"
Can accommodate preferences using constrained optimization:
$B^* = \arg\max_B F(B)$ subject to $C(B) \le L$
where $C(B)$ is a complex cost function and $L$ is the sensing budget
Can still get near-optimal solutions (details in paper)
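One common heuristic for budgeted selection is a cost-benefit greedy rule: repeatedly add the affordable item with the best gain per unit cost. The sketch below is not necessarily the method from the paper, and it simplifies the "complex cost function" to additive per-car costs, which is an assumption. `budgeted_greedy`, `covers`, `cost` and `coverage` are hypothetical names with made-up values.

```python
def budgeted_greedy(candidates, F, cost, budget):
    """Cost-benefit greedy: add the affordable item with the best gain/cost ratio."""
    selected = []
    spent = 0.0
    remaining = set(candidates)
    while True:
        base = F(frozenset(selected))
        best, best_ratio = None, 0.0
        for c in sorted(remaining):
            if spent + cost[c] > budget:
                continue  # item does not fit in the remaining budget
            gain = F(frozenset(selected) | {c}) - base
            ratio = gain / cost[c]
            if ratio > best_ratio:
                best, best_ratio = c, ratio
        if best is None:
            return selected  # nothing affordable improves the utility
        selected.append(best)
        spent += cost[best]
        remaining.remove(best)


# Toy data (assumptions): segments each car would report, and per-car costs.
covers = {"C1": {"s1", "s2", "s3"}, "C2": {"s3", "s4"}, "C3": {"s5"}}
cost = {"C1": 3.0, "C2": 1.0, "C3": 1.0}


def coverage(B):
    return len(set().union(*(covers[c] for c in B)))


print(budgeted_greedy(covers, coverage, cost, budget=4.0))
```

With a budget of 4 the rule picks `C2` and then `C3`, after which `C1` no longer fits. Choosing `C1` and `C2` would have covered more segments for the same budget, which illustrates that the ratio rule by itself is only a heuristic.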
Community Sensing Summary
Optimize value of probing roving sensors
Utility (expected error reduction)
Demand (usage: "utilitarian" impact)
Sensor availability
  Predict location based on history
Preferences
  Abide by preferences
  E.g., frequency / number of probes, min. inter-probe interval
  Other constraints: e.g., "Not near my home!"
[Diagram: Phenomenon, Demand, Availability & Preferences]
Phenomenon modeling
3 months of data from 534 segments across 7 highways and interstates near Seattle, WA
Samples at 15-minute intervals
Use Gaussian Process to model road speeds (covariance function based on road network topology)
Can compute utility $R(A)$ in closed form!
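To see why a Gaussian model gives a closed form: for jointly Gaussian variables the conditional variance does not depend on the observed values, so the expected variance follows directly from the covariance matrix. The NumPy sketch below assumes noise-free observations and uses a made-up 3x3 covariance and demand vector; `posterior_variances` and `error_reduction` are hypothetical names, not the talk's code.

```python
import numpy as np


def posterior_variances(cov, observed):
    """Var(S_i | S_A) for every segment i under a joint Gaussian with covariance cov."""
    cov = np.asarray(cov, dtype=float)
    prior = np.diag(cov).copy()
    obs = list(observed)
    if not obs:
        return prior
    K_AA = cov[np.ix_(obs, obs)]  # covariance among observed segments
    K_VA = cov[:, obs]            # covariance between all and observed segments
    # diag(K_VA K_AA^{-1} K_AV): variance explained by the observations
    solved = np.linalg.solve(K_AA, K_VA.T)
    explained = np.einsum("ij,ji->i", K_VA, solved)
    return prior - explained


def error_reduction(cov, demand, observed):
    """R(A) = DMSE(empty) - DMSE(A), with DMSE(A) = sum_i E[D_i] Var(S_i | S_A)."""
    cov = np.asarray(cov, dtype=float)
    demand = np.asarray(demand, dtype=float)
    reduction = np.diag(cov) - posterior_variances(cov, observed)
    return float(demand @ reduction)


# Made-up example: segments 0 and 1 are correlated, segment 2 is independent.
cov = [[1.0, 0.8, 0.0],
       [0.8, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
demand = [1.0, 10.0, 100.0]

print(posterior_variances(cov, [0]))
print(error_reduction(cov, demand, [0]))
```

Observing segment 0 removes all of its own variance and 0.64 of segment 1's variance, but tells nothing about the independent segment 2, so the high demand on segment 2 does not contribute to the reduction.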
Demand modeling
Demand = #cars on road segment
Estimate demand based on 3166 ClearFlow route requests
[Figure: expected demand (rush hour)]
Evaluating model accuracy
[Figure: demand-weighted RMS error (lower is better) vs. number of locations (0 to 50), comparing test set error with predicted error]
Accurate estimation of prediction error!
Demand driven querying
[Figure: demand-weighted variance (lower is better) vs. number of observations (0 to 50) for Random, Unit-weights, and Demand-weights selection]
65% error reduction using only 10 (of 534) observations!
Optimized sensing requires 10x fewer samples!
Availability modeling
Microsoft Multiperson Location Survey (MSMLS) [Krumm '06]
GPS traces from 85 drivers, 6+ days each
Associate GPS readings with road segments ("map matching")
Two models of sensor availability:
  Spatial obfuscation
  Sparse querying
[Image: GPS used in MSMLS]
Spatial obfuscation
Motivation: Privacy through enforcing uncertainty about sensor location
[Diagram: the Community Sensing Service requests road speed at some location in area X from the population of sensors; an anonymized response comes from a random car in cell X (if available)]
Spatial obfuscation
[Figure: demand-weighted variance (lower is better) vs. number of observations (0 to 50) for Random road, Optimize road, and spatial obfuscation with 13, 53, 146, and 449 cells]
Discretization ≈ Utility / Privacy knob
High accuracy even with coarse discretization
Obfuscation by sparse querying
Associate roving sensors with anonymous ID
Learn availability model for each sensor from data
[Diagram: the Community Sensing Service requests road speed and location from car Ci in the population of sensors; car Ci responds if it is connected to the network and available]
Obfuscation by sparse monitoring
[Figure: demand-weighted variance (lower is better) vs. number of observations (0 to 50) for Random road, Optimized road, Random user, and Optimized user]
Biggest difference in "important" part of the curve
50% error reduction over mean if querying 10 "cars"
Mobile vs. fixed sensors
When does it "pay off" to use mobile vs. fixed sensors?
Experiment: cost C(B) = #fixed(B) + #mobile(B)
Fixed budget: $\max F(B)$ s.t. $C(B) \le L$
Mobile sensors pay off if fixed sensors are 4x as expensive
Extensions / Future work
Spatio-temporal models (see paper)
How to quickly learn good models (see paper)
Other applications:
  Population fitness?
  News coverage?
  Reconstruction of 3D cities?
Formal privacy guarantees?
Related work
Travel time estimation using cell phones [Wunnava et al. '07]
Privacy-aware querying of cars with GPS & cell phones [Bayen et al. '08, forthcoming]
Spatial monitoring, experimental design etc. (see paper)
Conclusions
Presented integrated approach to community sensing
Theoretical analysis: near-optimal sensing policies
Extensive empirical evaluation on traffic monitoring case study