mobeyes: smart mobs for urban monitoring with …netlab.cs.ucla.edu › wiki › files ›...

MobEyes: Smart Mobs for Urban Monitoringwith Vehicular Sensor Networks

Uichin Lee†, Eugenio Magistretti∗, Biao Zhou†, Mario Gerla†, Paolo Bellavista∗, Antonio Corradi∗†Department of Computer Science ∗Dipartimento di Elettronica, Informatica e Sistemistica

University of California University of BolognaLos Angeles, CA 90095 Bologna, Italy 40136

{uclee,zhb,gerla}@cs.ucla.edu, {emagistretti,pbellavista,acorradi}@deis.unibo.it

Abstract— Vehicular sensor networks are emerging as a newnetwork paradigm of primary relevance, especially for proac-tively gathering monitoring information in urban environments(proactive urban monitoring). Each vehicle, which typically hasno strict constraints on processing power and storage capabilities,can sense one or more events (e.g., imaging from streets anddetecting toxic chemicals), process sensed data (e.g., recognizinglicense plates) and route messages to other vehicles (e.g., diffusingrelevant notification to drivers or police agents). In this novel andchallenging mobile environment, sensors can generate a sheeramount of data, and traditional sensor network approaches fordata reporting become infeasible. This paper proposes MobEyes,an efficient lightweight support for proactive urban monitoringbased on the primary idea of exploiting vehicle mobility toopportunistically diffuse concise summaries describing senseddata. MobEyes harvests these summaries and builds a low-cost distributed index of the autonomous mobile storage ofsensed data. The reported experimental/analytic results showthat MobEyes-based indexing can achieve reasonable complete-ness with good scalability and limited overhead in completelydecentralized deployment scenarios.

I. INTRODUCTION

Wireless mobile devices such as cell phones, PDAs, andWi-Fi laptops become ubiquitous in our daily lives and guideus into the era of pervasive computing. For instance, clothesand cars equipped with such devices are going to seamlesslygive us helpful information when we are traveling a newcity or shopping. Not only do such devices enrich our dailyactivities, but also create an environment such that epidemicsof cooperation can thrive, e.g., among rescue workers or pedes-trians, and thus people can cooperate in ways that were neverpossible before. Futurist Howard Rheingold first named thesekinds of activities as “smart mobs,” where people with sharedinterests/goals can pervasively and seamlessly cooperate usingwireless mobile devices [23].

Reflecting on tragedies such as 9/11 and London Bombing,we envision that such smart mobs may actually help relievelosses or investigate accidents if they were properly organizedbeforehand. For example, in the London Bombing, the policewere able to track some of the suspects in the subway usingclosed-circuit TV cameras, but they had a hard time findinghelpful evidence from double-decker bus, in spite of theabundance of pictures taken by shutterbugs using their cameraphones. This incident convinced the British police to installmore cameras that can read license plates to track vehicles. Yet

in such a scenario, a smart mob approach is more desirablethan an extensive deployment of cameras/sensors: in fact,the vision of smart mobs that exploit completely distributedepidemic cooperation makes it hard for potential attackersto disable surveillance. Let us briefly note that people arewilling to sacrifice privacy and to accept a reasonable levelof surveillance when monitored data can be collected andprocessed only by recognized authorities, e.g., for forensicpurposes, to counteract terrorism and common crimes.

The reconstruction of a crime and, more generally, theposterior investigation of an event potentially monitored bydistributed mobile sensors, e.g., the nodes of a VehicularSensor Network (VSN) [16], require the collection, storage,and retrieval of massive amounts of sensed data. This is amajor departure from conventional sensor network operationswhere data is collected, examined, and dispatched to a “sink”under predefined conditions such as alarm thresholds. Forinstance, it is impossible to deliver all the data detected byvideo sensors to a police authority sink because of the sizeof streaming data. Moreover, sensing nodes usually cannotdetermine a priori whether their data will be of any usefor future investigations. Then, this becomes the problem ofsearching for sensed data in a massive, mobile, and completelydecentralized storage, by ensuring low intrusiveness to otherservices, good scalability up to thousands of nodes, and disrup-tion tolerance against mobility/terrorist attacks by promotingcompletely decentralized cooperation networks.

To tackle such challenging technical issues, we have de-veloped MobEyes, a system to form smart mobs for VSN-based proactive urban monitoring. MobEyes exploits wireless-enabled vehicles equipped with video cameras and a variety ofsensors to perform event sensing, processing/filtering of senseddata, and ad hoc message routing to other vehicles. Since itis impossible to directly report the sheer amount of senseddata to the authority, MobEyes proposes that: sensed data staywith monitoring mobile nodes (i.e., mobile distributed stor-age); vehicle-local processing capabilities are used to extractfeatures of interest (e.g., license plates from traffic monitoringimages); mobile nodes periodically generate data summarieswith extracted features and context information such as times-tamps and positioning coordinates; mobile agents (e.g., policepatrolling cars) move and opportunistically harvest summariesfrom neighbor vehicles. Therefore, MobEyes harvesting agents

can directly query the collected summaries to answer questionssuch as: where was a certain vehicle at a certain time?; whichvehicles were at a given time in a given place?; and which arethe features of the data sensed and stored by a given vehicle?Once identified the sensed data of interest via summaries,the agent can pump out the data by contacting the involvedvehicles.

The original protocols exploited by MobEyes nodes forsummary diffusion/harvesting exploit intrinsic vehicle mobil-ity and single-hop communications among nodes. As thor-oughly demonstrated by the reported experimental results,in common deployment scenarios, opportunistic summarydiffusion/harvesting permits to build complete indexing infeasible time, with good scalability, and with limited overhead,by maintaining a completely decentralized disruption-tolerantorganization. Note that the use of MobEyes is not confined toforensic data collection, yet it can be used in a wide spectrumof applications: for instance, vehicles can measure pollutionlevels and collect traffic-related data, such as congestion androad conditions. The MobEyes modular architecture facilitatesits exploitation in different fields, by clearly confining andlimiting the support modifications depending on the specificapplication logic.

This paper is organized as follows. In Section II, we reviewthe related work. In Section III, we present the architecture ofMobEyes. In Section IV, we detail our proposed MobEyes aswell as its mathematical analysis. In Section V, we evaluateour protocols through extensive simulations. In Section VI,we propose our preliminary security solutions for MobEyes.Finally, we conclude the paper in Section VII.

II. RELATED WORK

A. Sensor-based Urban Monitoring

A wide range of emerging urban monitoring applicationsare clear proof of growing interest in the field. Just tomention a couple of examples, Intel Research IrisNet [7]provides large-scale monitoring based on Internet-connectedPCs equipped with off-the-shelf cameras and microphones.IrisNet also supports urban monitoring through a group ofagents that collect and process raw data to answer queries rel-evant to the application: for instance, camera data is processedto track available parking in a metropolitan area. Anotherurban monitoring example is the Gunshot Detection System(GDS) [6]. GDS uses a wireless network of acoustic sensornodes, strategically deployed in an urban area to find thelocation from where the shots are fired. GDS sensor nodesprocess acoustic data to recognize the number of shots firedand the type of gun used; they exploit network connectivityto notify predetermined dispatch offices or police agents. Insummary, most urban monitoring solutions, as the two brieflydescribed above, rely on i) static sensor deployments, and ii)a predetermined communication/software infrastructure. Theseaspects make urban monitoring applications vulnerable toattacks and hardly scalable.

B. Vehicular Ad Hoc Networks

Many car manufacturers are planning to install wireless con-nectivity equipment in their vehicles to enable communicationswith roadside base stations and also between vehicles, for thepurposes of safety, driving assistance, and entertainment [4].The two primary distinct features of vehicle networks are thati) vehicles can be highly mobile, with speed up to 30 m/s,and ii) their mobility patterns are more predictable than thoseof nodes in Mobile Ad hoc Networks (MANET) due to theconstraints imposed by roads, speed limits, and commutinghabits. Therefore, these networks require specific tradeoffs andidentify a novel research area, i.e., Vehicular Ad hoc Networks(VANET).

Much of the literature on Inter-Vehicle Communications(IVCs) is navigation safety related. A good survey on recentphysical layer technologies for IVCs can be found in [18].At the network layer, the most common way to broadcastsafety messages is via reliable, robust flooding. For example,Urban Multi-hop Broadcast (UMB) [14] features a form ofredundant flood suppression scheme where the furthest nodein the broadcast direction from a sender is selected to forwardand acknowledge the packet. The scheme alleviates broadcaststorm and hidden terminal problems.

The efficiency of flooding quickly decreases with thenumber of nodes; thus, flooding must be scope-limited. Forscalable delivery, georouting has been extensively investigated.However, intermittent connectivity due to high, yet predictablemobility in VANET makes most georouting schemes relyon carry and forward: a mobile vehicle carries packets andforwards them to a new vehicle when moving into its vicinity,thus targeting delay tolerant applications [35]. In [15], amobile node uses the knowledge of the relative velocities anddirections of its neighbors to make a forwarding decision.In addition to that, MDDV [32] utilizes trajectory basedforwarding by considering the vehicle traffic. On the otherhand, VADD [34] collects information within a bounded areathrough which a vehicle can find the direction (i.e., roadselection at the intersection) to which it forwards a packetto reduce the delay. MobEyes applications are delay tolerant(e.g., urban monitoring), and thus, these techniques could beused to access actual data out of mobile nodes, yet to do so,MobEyes must provide searching over mobile nodes, which isthe focus of this paper.

C. Vehicular Sensor Networks

A Vehicular Sensor Network (VSN) can be built on top ofVANET by equipping vehicles with onboard sensing devicesas shown in Figure 1. Sensors can gather not only safety-related information, e.g., seat occupation, but also more com-plex multimedia data, e.g., audio/video streams. Compared totraditional static sensors, a sensor node is not subject to mem-ory, processing, storage, and energy limitations. Moreover, amobile sensor node improves sensing coverage [17]. Thus,VSN can be used for variety purposes: for instance, trafficengineering (e.g., traffic pattern analysis and road surface

VSN-enabled vehicle

Inter-vehiclecom m unications

V ehicle-to-roadsidecom m unications

Roadside base station

Video Chem.

Sensors

Storage

Systems

Proc.

Fig. 1. Vehicular sensor networks

diagnosis) and urban environmental monitoring (e.g., streetlevel air pollution monitoring [28]).

To provide such services, it is necessary to collect senseddata. In the past, the following models have been widelyused in wireless sensor networks in general. First, in thedata-centric model such as Directed Diffusion [12], sensornodes systematically delivers sensed data to sinks. Second,in the data MULE model [25], proposed for sparse, staticsensor networks, mobile data collectors pick up data fromsensors in close range, buffer them, and drop off them towired access points. Third, in the data-centric storage model,or a Geographic Hash Table (GHT) [22], a sensor node hashesmeta-data into geographic coordinates and stores a key-valuepair at the sensor node geographically nearest to the hashof its key. Finally, in the Shared Wireless Infostation Model(SWIM) [27], proposed for sparse, mobile sensor networkswith fixed Infostations, sensed data is epidemically dissemi-nated through mobile nodes (replicating and diffusing), whichin turn, is offloaded to nearby Infostations. However, theseapproaches are less efficient in VSN because of i) the typicalscale of a VSN over wide geographic areas (e.g., millions ofnodes), ii) the volume of generated data (e.g., streaming data),and iii) vehicle mobility.

D. Opportunistic Data Dissemination

Opportunistic data dissemination1 exploits node mobility tobroadcast the epidemic, or data, to a complete well definedpopulation in sparse ad hoc networks [13], [30]. In Opportunis-tic Report Dissemination (ORD) [26], a mobile node spreadsits reports in the local storage to encountered nodes and obtainsnew reports in exchange. For each type of a resource, a nodekeeps the top-k relevant reports based on the exponential decayrelevance function in terms of time and distance by exchangingtheir local databases. Similarly, in Autonomous Gossiping(AG) [2], a node can set his profile by setting resourcecategories of interest and also starts with random number of

1Note that we differentiate between opportunistic data dissemination andgossiping [11] in wireless ad hoc networks. While gossiping controls theprobability of broadcasting because of the broadcasting nature of the radiotransmission, opportunistic data dissemination ensures that messages areeventually disseminated at some point in time even with sparse/intermittentconnectivity among nodes.

MSI (Sensor Interface)

DSRC Compliant Driver

MDP (Data Processing)

J2SE

JMF API Java Comm. API

Java Loc. API

GPSRadio Transceiver

Bio/ChemSensors

A/VSensors

MDHP (Diffusion/Harvesting)

SummaryDatabase

Raw DataStorage

Fig. 2. A high level architecture of a sensor node

data items by setting his interest such as categories, location,and utility; therefore, information is selectively disseminated.Unlike previous schemes, MAID [1] and MPR [19] restrictdissemination such that only a source can disseminate datato its local neighbors based on the observation that mobilityincreases the throughput at the price of increased delay [9].

To the best of our knowledge, MobEyes is original inproposing mobility-assist opportunistic diffusion of summaries(and not of sensed data) for urban monitoring, thus enablingdynamic, delay-tolerant queries on the mobile distributedstorage. In addition, unlike previous work, MobEyes exploitsthe mobility of not only regular nodes but also sink ones, thusenabling collaborative harvesting by mobile sinks.

III. MOBEYES ARCHITECTURE

For the sake of clarity, let us present the MobEyes solu-tion using one of its possible practical application scenarios:i.e., collecting information from MobEyes-enabled vehiclesabout criminals in flight on a car after spreading poisonouschemicals. Here, we assume that vehicles are equipped withcameras and chemical detection sensors. Vehicles continuouslygenerate a huge amount of sensed data, store it locally, andperiodically produce short summaries obtained by processingsensed data, e.g., license plate numbers or aggregated chemicalreadings. Summaries are opportunistically disseminated toneighbor vehicles, thus making it easier for the police toharvest such meta-data and to create a distributed index ofmeta-data, useful for forensic purposes such as crime scenereconstruction and criminal tracking.

To support all the above tasks, we have developed MobEyesaccording to the component-based architecture depicted inFigure 2. The key MobEyes component is the MobEyesDiffusion/Harvesting Processor (MDHP) facility which will bediscussed in detail in the next section. MDHP works by op-portunistically disseminating/harvesting summaries producedby MobEyes Data Processor (MDP) that accesses sensor datavia a uniform MobEyes Sensor Interface (MSI). Since vehiclesare not strictly resource-constrained, our MobEyes prototypeis built on top of the Java Standard Edition (J2SE) virtualmachine.

MDP is in charge of reading raw sensed data (via MSI),processing it, and generating summaries. Summaries includemeta-data (sensor location, timestamp, and possible additionalcontext such as simultaneous sensor alerts) and features ofinterest extracted by local filters (See Figure 3). For instance,in our scenario, MDP includes a filter determining license plate

numbers from multimedia flows taken by cameras [3]. Finally,MDP commands the storage of both raw data and summariesin two local databases.

UIDMsg.Type

HEADER

SEQ # X-loc Y-loc

SUMMARY 1

TimeStamp X-loc Y-loc Summary Payload

SUMMARY N

TimeStamp X-loc Y-loc Summary Payload

TimeStamp

Fig. 3. Packet format: a single packet contains multiple summaries dependingon the types of summaries.

The summary generation rate and the summary size arerelevant to the performance of MobEyes because MDHPdisseminates/harvests summaries by packing a set of sum-maries into a single packet for the sake of effectiveness.Developers of MobEyes-based applications can specify thedesired generation rate as a function of vehicle speed andexpected vehicle density. The summary size mainly dependson application-specific requirements: in the considered sce-nario, each recognized license plate is represented with 6B,sensed data with 10B (e.g., concentrations of potential toxicagents), timestamp with 2B, and vehicle location with 5B.Then, in our scenario, MDP can pack 65 summaries in a single1500B message, even without exploiting any data aggregationor encoding technique; in typical deployment environmentssummaries are generated every [2-10] seconds and, thus, asingle message can include all the summaries about a [2-10]minute interval.

MSI permits MDHP to access raw sensed data indepen-dently of actual sensor implementation, thus simplifying theintegration with many different types of sensors. MSI currentlyimplements methods to access, respectively, camera streamingoutputs, serial port I/O streams, and GPS information, byonly specifying a high-level name for the target sensor. Tointerface with sensor implementations, MSI exploits well-known standard specifications: Java Media Framework (JMF)API, Sun Communication API, and the JSR179 Location APIto achieve high portability and openness.

IV. MDHP PROTOCOLS

The section first details our original summary diffusionprotocol where private vehicles (regular nodes) opportunisti-cally and autonomously spread summaries of sensed data byexploiting their mobility; then, it describes our novel summaryharvesting protocol used by police agents (authority nodes) toproactively build a low-cost distributed index of the mobilestorage of sensed data. The main goal of the MDHP process isthe creation of a highly distributed, scalable index that allowslaw enforcement agents to place queries to the huge databasewithout ever trying to combine this index in a centralizedlocation. Finally, we present the simple mathematical analysisof MDHP protocols in terms of delay and scalability.

A. Summary Diffusion

Any regular node periodically advertises a new packetwith generated summaries to its current neighbors. This ad-vertisement to neighbors provides more opportunities to theagents to harvest the summaries. Clearly, excessive advertisingwill introduce too much overhead. No advertising at all willrequire considerable more time, as agents will need to contacteach individual car! We will explore the optimal tradeoff. Apacket header includes a packet type, generater ID, locallyunique sequence number, packet generation timestamp, andgenerator’s current position. Each packet is uniquely identifiedby a generator ID and its sequence number pair and containsa set of summaries locally generated during the fixed timeinterval (See Figure 3).2

Neighbor nodes receiving the packet store it in their localsummary databases. Therefore, depending on the mobility andthe encounters of regular nodes, packets are opportunisticallydiffused into the network (passive diffusion). MobEyes canbe configured to perform either single-hop passive diffusion(only the source advertises its packet to current single-hopneighbors) or k-hop passive diffusion (the packet travels upto k-hop as it is forwarded by j-hop neighbors with j < k).Other diffusion strategies could be included in MobEyes, forinstance single-hop active diffusion where any node period-ically advertises all packets (generated and received) in itslocal summary databases, at the expense of a greater trafficoverhead. As detailed in the experimental evaluation section,in a usual urban vehicular network (node mobility restricted byroads), it is sufficient for MobEyes to exploit the lightweightk-hop passive diffusion strategy – with very small k values –to achieve the desired diffusion levels.

TT

T-t6

T-t5T-t4

T-t3

T-t2

T-t1

Trajectory

C1C2

Advertise Advertise

Encounter Point

Time Sum.0

T-t4SC1,1

SC2,1

Time Sum.0

T-t4SC2,1

SC1,1

Fig. 4. MobEyes single-hop passive diffusion

Figure 4 depicts the case of two sensor nodes, C1 and C2,that encounter with other sensor nodes while moving (theradio range is represented as a dotted circle). For ease ofexplanation, we assume that there is only a single encounter,but in reality any nodes within dotted circle are consideredencounters. In the figure, a black triangle with timestamprepresents an encounter. According to the MobEyes summarydiffusion protocol, C1 and C2 periodically advertise a new

2The packet advertisement interval can be determined from the harvestingtime distribution with average (μ) and standard deviation (ρ). Then, Cheby-shev inequality, P (|x − μ| ≤ kρ) ≥ 1

k2 allows us to choose k such that weguarantee harvesting latency and thus we can determine the period as μ+kρ.Readers can find details in Section IV-C.2

summary packet SC1,1 and SC2,1 respectively where thesubscript denotes 〈ID, Seq.#〉. At time T−t4, C2 encountersC1, and thus they exchange those packets. As a result, C1carries SC2,1 and C2 carries SC1,1.

Summary diffusion is time and location sensitive (i.e.,spatial-temporal information diffusion). When the packet ex-pires3 or the packet originator moves away more than thethreshold distance from the sensing location4, the packet isautomatically disposed. The expiration time and the maximumdistance are system parameters determined by the require-ments of an application. Let us also briefly note that sum-maries always include current location information of nodes.Upon receiving an advertisement, neighbor nodes keeps theencounter information (the advertiser’s current position andcurrent timestamp). This allows MobEyes nodes to exploitspatial-temporal routing techniques [10] and a geo-referenceservice when accessing actual raw data. That is obtainedas a simple byproduct of summary dissemination, withoutadditional costs.

B. Summary Harvesting

In parallel with diffusion, harvesting takes place. AMobEyes police agent node can collect diffused summariesfrom regular nodes by proactively querying its neighbors(proactive summary harvesting). The goal is to collect allthe summaries generated in a given area. Obviously, a policenode is interested in harvesting summary packets it has notcollected so far: to focus only on missing packets, a MobEyesauthority node compares its list of summary packets with thatof each neighbor (set difference problem), by exploiting aspace-efficient data structure for membership checking, i.e.,a Bloom filter. A Bloom filter for representing a set of nelements, S = {s1, s2, · · · , sn}, consists of m bits, whichare initially set to 0. The filter uses k independent randomhash functions h1, · · · , hk within m bits; by applying thesehash functions, the filter records the presence of each elementinto the m bits by setting k corresponding bits; to check themembership of the element x, it is sufficient to verify whetherall hi(x) are set to 1. It is important to note that a packetbecomes invalid when it expires or the packet originator movesaway more than the threshold distance. To deal with this, eachbit of a Bloom filter has a corresponding counter so that aninvalid summary packet can be deleted by decrementing acounter.

A MobEyes police agent uses a Bloom filter to representits set of already harvested and still valid summary packets.Since each summary has a unique node ID and sequencenumber pair, we use this as an input of the hash functions.The MobEyes harvesting procedure consists of the followingsteps:

3Regular nodes can keep track of freshness of summary packets by usinga sliding window with the maximum window size of fixed expiration time.

4Since a single summary packet contains multiple summaries, the averageposition of all summaries could be used as the center of the packet.

1) The agent broadcasts a “harvest” request message withits Bloom filter.5

2) Each neighbor prepares a list of “missing” packets fromthe received Bloom filter.

3) One of the neighbors returns missing packets to the agent.4) The agent sends back an acknowledgment with a pig-

gybacked list of returned packets and, upon listening oroverhearing this, neighbors update their lists of missingpackets.

5) Steps 3 and 4 are repeated until there is no remainingpacket.

An example of summary harvesting is shown in Figure 5.The agent first broadcasts its Bloom filter related to collectedpackets so far (P2, P4, P6, P7, P9 and P10) as in Figure5(a). Each neighbor receives the filter and creates a list ofmissing packets. For example, C3 has P3 and P8 to return,while C4 has P1 and P8. In Figure 5(b), C2 is the first nodeto return missing packets (P1, P3) and the agent sends backan acknowledgement piggybacked with the list of receivedpackets. Neighbor nodes overhear the message and updatetheir lists: C3 and C4 both remove P1 from their lists, asdepicted in Figure 5(c).

Note that membership checking in a Bloom filter is prob-abilistic and false positives are possible. If this happens, aregular node fails to send packets still missing at the agent. Afalse positive happens when an element not in S, but its bitshi(s) are collectively marked by elements in S. If the hash isuniformly random over the m bits, the probability that a bit is0 after all the n elements are hashed and their bits marked is(1 − 1

m)kn � e−

knm . Thus, the probability of a false positive6

is given as pf = (1 − (1 − 1m

)kn)k � (1 − e−knm )k.

Given that an agent has α neighbors on average, we areinterested in calculating the probability that all α neighborsfail to deliver a specific packet. To deliver a packet, a nodemust have the packet and the false positive should not happen.The probability that a node has a random summary packetat time t is Et

Nwhere Et is the number of packets that a

node has collected till time t. The probability for a node tosuccessfully send the packet to the agent is (1−pf )Et

N. Then the

probability that the agent fails to receive the packet from anyof α neighbors is given (1− (1−pf )Et

N)α. Assuming that each

time step is independent of each other, after repeating β timesteps, the agent fails to receive the packet with probabilityQt+β

k=t

“1 − (1 − pf )Ek

N

”α

. Therefore, the probability that theagent receives the packet is given as:

1 −t+βYk=t

„1 − (1 − pf )

Ek

N

«α

For example, given that we use 4 bits to represent asummary (i.e., m/n = 4), the chance of an error is roughly

5Note that we do not send the counters.6In the equilibrium state, we can assume that the agent has always n

summaries since the rate of new summaries and the rate of expired summariesare the same. Thus, the probability of a false positive is constant, i.e., pf .

P2 P4 P6P7 P9 P10

C1

C2

C3

C4

(a) Broadcast a harvest request

C1

C2

C3

C4

P5 P8 P3 P8

P1 P3

P1P3

P1 P8

(b) C2 first returns missing packets

C1

C2

C3

C4

ACK: P1 P3

P5 P8 P8

P8

(c) Broadcast acknowledgement

Fig. 5. MobEyes proactive summary harvesting

15%.7 Let the average number of neighbors be α = 10.Assuming that a node has collected half of the summaries,the probability that the agent fails to receive a specific packetfrom neighbors is (1−0.85×0.5)10 = 0.004. If we repeat thisprocess 5 times, then the probability that the agent can collectthe packet is roughly given as 1 − 0.0045 � 1. In summary,since it is highly probable that other nodes have the packetsas time passes, and the harvesting procedure is repeated as theagent moves, the agent can obtain a missing packet with highprobability.

For the sake of simplicity, thus far we assumed that thereis a single agent harvesting summaries. Actually, MobEyescan handle concurrent harvesting by multiple agents that cancooperate by exchanging their Bloom filters among multiple-hop routing paths; thus, this creates a distributed, partiallyreplicated index. In particular, whenever an agent harvestsa set of j new summary packets, it broadcasts its Bloomfilter to other agents, with the benefits in terms of latencyand accuracy shown in Section V-B. Note that strategicallycontrolling the trajectory of police agents, properly schedulingBloom filter updates, and efficiently accessing that partitionedand partially replicated distributed index are part of our futurework. Note also that the cooperation of multiple agents –say in the thousands – is necessary to handle large urbanareas (millions of cars with billions of summaries generatedin a singles day). The management of this distributed index(over all police cars) is the subject of ongoing research. Inthe following section we are limiting ourselves to describingthe tradeoffs between dissemination and harvesting in a singlegeographic area, and the dependence of performance fromvarious parameters. We also analyze the traffic overheadcreated by diffusion/harvesting and show that it can scale wellto large number of nodes.

C. Analysis of MDHP Protocols

The efficiency of MDHP protocols can be evaluated bysummary harvesting delay and scalability. How fast the agentcan harvest is a critical issue as well as how scalable theprotocols are.

1) Summary Harvesting Delay: We assume that there areN nodes in the network and each node advertises a singlesummary packet (total N summary packets). We basically

7Given the ratio mn

, the optimal k is given as �(ln 2)mn�. Given m

n= 4,

the optimal value is k = �(ln 2)4� = 3. More details and other applicationscan be found in [5].

assume that nodes are uniformly distributed within a squarearea with length L. The node density is given as ρ = N/L2.But when we consider non-uniform node distribution underdifferent mobility models, the node density is simply given asρ = δN/L2 where δ is the constant compensation factor fora given mobility model. We assume that nodes are randomlymoving at a speed of v on average. Let v∗ denote the averagerelative speed of nodes. Let R denote the communicationrange of a node. Now we are ready to develop a deterministic,discrete time model.

Let us first calculate how many summary packets a regularnode can passively collect over time, which allows us to esti-mate the time for the agent to harvest all the summaries. Let usassume that all nodes are static except the passive harvestingnode. The passive harvester randomly moves the network toharvest summaries by passively listening to advertisements ofencountered nodes. In this case, the passive harvester acts thesame as data mule in traditional sensor networks. During thetime slot Δt, the harvester travels distance r = vΔt and coversan area of vΔt × 2R. The expected number of encounterednodes in this area is simply α = ρvΔt × 2R. Each of thesenodes will advertise their summaries, and the harvester willreceive α summaries at each time slot. This movement patternis in fact the dual of the case where all nodes are mobilebut the passive harvesting node. Moreover, without loss ofgenerality, if all nodes are mobile, we can simply replace theaverage speed with the average relative speed. Hereafter weuse α = ρv∗Δt × 2R unless otherwise specified.

Given that the passive harvester randomly moves, it willcollect a new summary with probability 1 − Et/N where Et

denotes the number of distinct summaries collected by timeslot t. It is obvious that non-uniform movement patterns willaffect the probability. Since we are interested in the averagebehavior, we can remedy this problem by simply multiplyingthe constant compensation factor η to the probability.8 There-fore, we have the following relationship.

Et − Et−1 = αη

„1 − Et−1

N

«(1)

Equation 1 is a standard difference equation and the solutionis

Et = N − (N − αη)“1 − αη

N

”t

(2)

8For example, if nodes are traveling along a road, the probability will bemuch smaller than random mobility.

Equation 2 tells us that the distinct number of collectedsummaries is exponentially increasing and thus, as time tendsinfinity, Et = N . Let T denote a random variable of the arrivaltime of a new summary. By dividing Equation 2 by N , we havea cumulative distribution FT (t) as follows

FT (t) = 1 −“1 − αη

N

”t+1

(3)

From this, we can derive PMF fT (t) as follows

fT (t) =αη

N

“1 − αη

N

”t

(4)

Equation 4 is a modified geometric distribution with successprobability p = αη

N , and thus the average is

E[T ] =1

p− 1 =

N

ρ × vΔt × 2R− 1 =

L2

vΔt × 2R− 1 (5)

We realize that given a square area with length L, the averagetime for a regular node to collect a summary is independentof node density α, but it is a function of average speed andcommunication range. Intuitively, as node density increases(i.e., N increases), a node can collect more summaries for agiven time slot, but this also means that it has to collect anincreased number of summaries.

Unlike regular nodes, the agent actively harvests summariesfrom its neighbors (active harvesting). At time slot t, each ofits neighbors has collected Et summaries so far, assuming thatall regular nodes have started their advertisements at time slot0. Since the probability that a neighbor node does not have arandom summary is given as η(1 − Et

N), the probability that

none of α neighbors has a summary is simply (η − η EtN

)α.The probability that at least one of neighbors has a randomsummary is 1 − (η − ηEt

N)α. The expected number of distinct

summaries from its neighbors at time step t can be expressedby simply multiplying N :

N

„1 −

“η − ηEt−1

N

”α«

(6)

Let E∗t denote the expected number of distinct summaries

harvested by the agent till time step t. From Equation 6, thenumber of new summaries harvested during at the time step tcan be expressed as follows.

E∗t − E∗

t−1 = γN

„1 −

“η − ηEt−1

N

”α« „

1 − E∗t−1

N

«(7)

where the constant compensation factor γ adjust the jointprobability to consider non-uniform mobility. Note that asdescribed before non-uniform mobility reduces the rate of newencounters, thus decreasing the probability of passive harvest-ing (i.e., adjusted by η). Such mobility also exerts balefulinfluence on the rate of active harvesting since neighbors tendto carry overlapping summaries (i.e., adjusted by γ).

From Equation 7, we see that E∗t grows much faster than

Et. During a time slot, the number of collected summaries inEquation 2 is constant (i.e., α), whereas that in Equation 7 isa function of time (i.e., Equation 6). Moreover, Equation 6 is

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 500 1000 1500 2000Fraction of harvested summaries

Time (seconds)

Agent (k=2)Agent (k=1)

Regular Node (k=2)Regular Node (k=1)

Fig. 6. Fraction of harvested summaries with k = 1, 2

a function of the number of neighbors, i.e., related to nodedensity. As N , or node density, increases, we can see that theharvesting delay also decreases. It is also interesting to notethat since the duration of a time slot is inversely proportionalto the average speed (Δt = r/v), we see that the harvestingdelay has the same relationship.

The growing rate of Et depends on mobility models. Theabove equations are based on the random motion model, butfor restricted mobility models such as the Manhattan model,the rate will be smaller than for the others. In this case, weuse k-hop relay scope where a summary is flooded up to khops. As stated before, we are assuming a rectangular area,Δt × 2R. Thus, increasing k-hop relay scope is the same asmultiplying the area by k times.

Ekt − Ek

t−1 = kαη

„1 − Ek

t−1

N

«(8)

This tells us that even though Et grows rather slowly due to amobility model, by increasing the hop count we can increasethe rate of Et. Similarly, the agent harvests summaries asfollows

Ek∗t − Ek∗

t−1 = γN

„1 −

“η − ηEk

t−1

N

”kα« „

1 − Ek∗t−1

N

«(9)

For illustration, let us assume that we have total N = 200nodes within an area of 2400m × 2400m. The transmissionrange is R = 250m, and nodes’ relative speed is 10m/s onaverage. For system parameters, we used η = 1, γ = 0.2 andΔt = 1s. The iterative solutions of both Et and E∗

t are shownin Figure 6. The figure shows that the agent can harvest all thesummaries much faster than a regular node. The figure alsoshows that k-hop relay scope improves the overall delay.

2) Scalability: The feasibility of MobEyes strictly dependsalso on its scalability over wide VSN, in terms of both thenetwork traffic due to passive diffusion when the number ofregular nodes grows and the number of regular nodes that asingle harvesting agent can handle with a reasonable latency.

About passive diffusion network traffic, it is possible toanalytically estimate the MobEyes radio channel utilization.In the diffusion process, nodes periodically advertise theirpackets, without any synchronization among them. Therefore,

we can model the process by considering a packet randomlysent within a time slot of [kTa, (k +1)Ta) for all k, where Ta

is the generation period. So, the number of packets receivedby a node is bounded by the number of its neighbors whileit is traveling for Ta, thus depending on node density, but noton the overall number of nodes. In contrast, any “flooding”-based diffusion protocol is not scalable because a node couldpotentially receive a number of packets proportional to thenetwork size. To give a rough idea of the traffic generated byMobEyes diffusion, let us simply use Ta = 2R/v∗ (i.e., thetime for a mobile node travels the communication diameter)where R is the transmission range and v∗ denotes the relativespeed of two nodes.9 In our target deployment environment,we have v∗ = 20m/s, R = 250m, the advertisement periodTa = 12.5s, and the fixed packet size S = 1500B; andconsequently, the transmission time for one packet is about1ms. While traveling for Ta, a regular node will be exposed toadvertisements from an area of A = πR2 +4R2. In the worstcase, all nodes within this area are distinct and potentiallysend their generated packets to the considered node (potentialsenders n = Aρ). Therefore, the worst case link utilizationcould be estimated as nTx/Ta: for instance, given a relativelyhigh populated area with N = 2, 000, the number of potentialsenders is n � 179, and the MobEyes protocol has a worstcase utilization of 0.014.

Similarly, we can give an approximated idea of the scala-bility of the harvesting process via a simple queuing model.Consider the usual situation of a police agent that harvests onlyfresh summaries, i.e., generated in the last Texp seconds. Let usassume that the summary arrival rate is Poisson with rate λ =Nλ′ and the harvesting rate is deterministic with rate μ. Giventhat the harvesting rate is limited by the channel utilization α,the maximum μ is simply αTexp

Tx, where Tx is the transmission

time of a single packet. As a result, the system can be modeledusing an M/D/1 queue. The stability condition, Nλ′ <

αTexp

Tx,

gives us the upper bound N <αTexp

λ′Tx. Therefore, it is possible

to conclude that for a given Texp and arrival rate, there existsa limit in the number of regular nodes that a single harvestingagent can handle. For example, in the considered scenario(α = 0.01, λ′ = 1, and Texp = 500 seconds), that numberis N < 0.01×500

1×0.001 = 5, 000. As a consequence, in the case ofnode numbers equal or not far from 5,000, there is the need todeploy more than one harvesting agent to maintain the systemstable.

V. MOBEYES PERFORMANCE EVALUATION

We have thoroughly evaluated the MobEyes performanceand validated its protocols via extensive simulations usingNs-2 [21]. For the sake of briefness, in the following the

9For a given speed, we want to make sure that the interval is neither tooshort nor too long compared to the average connection duration among nodes.If it is too short, then we are unnecessary sending out more packets to the sameset of nodes, thus increasing link bandwidth utilization. On the other hand,if it is too long, a node misses chances to send the packets to encounterednodes, thus slowing down dissemination. To this reason, we simply set thevalue Ta = 2R

V ∗ as the time for a mobile node travels the communicationdiameter (2R).

Fig. 7. Map of Westwood area in vicinity of UCLA campus

paper presents the most relevant performance figures aboutlatencies of summary diffusion and harvesting (varying boththe diffusion range k and the number of harvesting agents)and about traffic overhead.

A. Simulation Setup

The following simulation results are obtained by consideringvehicle nodes with IEEE 802.11 connectivity, 11Mbps band-width, nominal radio ranges from 100m to 300m, and two-ray ground reflection as the radio propagation model. Nodesmove according to the Real-Track (RT) mobility model [36]or the Random WayPoint (RWP) model. RWP is consideredquite similar to the actual vehicular mobility because of itssettings such as arrivals, directions and speeds [24]. In reality,however, a complex mobility behavior is observed: vehiclestend to move as a group because of traffic signals and switchdirections only at road intersections. RT permits to modelsuch vehicle mobility patterns in an urban environment morerealistically than RWP. For comparison, we present results ofboth mobility models. Our simulations consider a vehicularnetwork with a number of nodes between 100 and 300 inincrements of 100 nodes; modeled vehicles move with anaverage speed between 5 and 25 m/s in increments of 5 m/s.In the case of RT, we use a real map of the 2, 400m×2, 400mWestwood area in the vicinity of the UCLA campus, obtainedby the US Census Bureau data for street-level maps as shownFigure 7.

In the simulations, a regular node periodically (3 secondperiod as a baseline) advertised a single summary packet.10

In addition, an agent broadcast the harvesting request every3 seconds. For a given mobility model, we generated threescenario files of each configuration which is defined by thenumber of nodes and their average speed. For each run,an agent is randomly selected and we measured the valuesof interest such as the latencies of summary diffusion andharvesting. To get the average value, we run each of threescenario files 30 times unless otherwise mentioned.

10Thus, the total number of summary packets during a simulation is thesame as the total number of nodes.

0

10

20

30

40

50

60

70

80

90

100

0 200 400 600 800 1000

Number of harvested summaries

Time (seconds)

N=100/V=25 SimulationN=100/V=25 AnalysisN=100/V=5 SimulationN=100/V=5 Analysis

(a) RWP/N=100

0

50

100

150

200

250

300

0 200 400 600 800 1000


Time (seconds)

N=300/V=25 SimulationN=300/V=25 Analysis


(b) RWP/N=300

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000


Time (seconds)

N=100/V=25 SimulationN=100/V=25 AnalysisN=100/V=5 SimulationN=100/V=5 Analysis

(c) RT/N=100

0

50

100

150

200

250

300

0 500 1000 1500 2000Number of harvested summaries

Time (seconds)



(d) RT/N=300

Fig. 8. Faction of harvested summaries

B. Summary Diffusion and Harvesting Processes

We define summary harvesting latency as the time for anagent to harvest all summaries generated by regular nodesfrom the instant harvesting started (finishing time of harvestingall summaries). This performance figure is crucial to determinethe feasibility of the MobEyes approach in terms of time tocomplete the process. We first show the number of collectedsummaries of the agent (as a result of active harvesting)through cumulative distributions of harvesting. To see howthe number of nodes and the average speed impact on thecumulative distribution, we used the following parameters:N = 100/300 and v = 5/25. The results are shown inFigure 8. We plotted results of both RWP and RT as wellas the analytical results.11 The harvesting distribution allowsus to estimate the fraction of harvested summaries by a certaintime t. This estimation is useful assuming that the agent hasstrict time constraints on the maximum time interval devotedto harvesting. For instance, a harvesting expiration time of500 seconds is sufficient for a single agent to collect 98% ofsummaries (N = 300, V = 25, RT ).

The figure shows that the latency is closely related to thenumber of nodes and the average speed in both mobilitymodels: namely, the latency decreases as the number of

11In the case of a regular node, for each run, we can get the averagecumulative distribution from all nodes; thus, it is actually the same as gettingthe average by running N×3×30 times whereas that of the agent is generatedby running only 3× 30 times. Therefore, the plots of a regular node is muchsmoother than that of the agent.

nodes or the average speed increases. We also notice that therestricted mobility in RT makes the latency decrement smallerthan that of RWP when we increase the number of nodes oraverage speed. Let us see the relative latency decrement fromthe baseline case (N = 300, V = 5, 95% harvesting) whenthe average speed increases to V = 25. In RWP, the latencyhas decreased from 404s to 82s (4.9 times) where as in RT, ithas decreased from 948s to 392s (2.4 times).

Our analytical results fits well with both RWP and RT. It isinteresting to see that RT also shows a good fit to the measureddata. To better understand, we plotted the inverse cumulativedistribution of passive harvesting (by a regular node) for bothRWP and RT in Figure 9. Note that Y-axis tick 0.1 and 0.01mean 90% and 99% of collection respectively. Both RWPand RT fits well with the analytical results, and in particular,RWP fits better than RT. This is due to the restricted mobilitypatterns of RT. After 90% of collection, the tail of RT startsto departing from the analytic line. It is hard for regular nodesto collect the rest 10% in RT. Note that RWP does not exhibitthis problem since nodes tend to flock together at the centerof the simulated area. In MobEyes, agent nodes mitigate thisproblem by proactively harvesting summaries from regularnodes, and this explains why our analytical results fit wellas shown in Figure 8. Note that η values used for RWP were0.78 and 0.46, and η values used for RT were 0.22 and 0.138for 5 m/s and 25 m/s respectively.12 We notice that restricted

12For the curve fitting, RWP’s γ values were 0.040 and 0.034, and RT’s γvalues were 0.011 and 0.017 for 5 m/s and 25 m/s respectively

0.001

0.01

0.1

1

0 1000 2000 3000 4000 5000 6000

Fraction of harvested summaries (log)

Time (seconds)

RWP N=200 V=15RWP N=300 V=15V=15 AnalyticRWP N=200 V=25RWP N=300 V=25V=25 Analytic

(a) RWP

0.001

0.01

0.1

1

0 1000 2000 3000 4000 5000 6000

Fraction of harvested summaries (log)

Time (seconds)

RT N=200 V=15RT N=300 V=15V=15 AnalyticRT N=200 V=25RT N=300 V=25V=25 Analytic

(b) RT

Fig. 9. Faction of passively harvested summaries by a regular node (inverse cumulative distribution)

100

200

300

400

500

600

700

800

900

1000

1100

5 10 15 20 25

Latency (seconds)

Average speed (m/s)

N=100N=200N=300

(a) RWP

0

500

1000

1500

2000

2500

3000

5 10 15 20 25

Latency (seconds)

Average speed (m/s)

N=100N=200N=300

(b) RT

Fig. 10. Average summary harvesting latency

20

40

60

80

100

120

140

160

180

200

220

5 10 15 20 25

Latency (seconds)

Average speed (m/s)

N=100N=200N=300

(a) RWP

100

200

300

400

500

600

700

800

5 10 15 20 25

Latency (seconds)

Average speed (m/s)

N=100N=200N=300

(b) RT

Fig. 11. Average diffusion latency

mobility results in smaller η values. Moreover, although speedincreases 5 times, η decreases less than half and thus, speedincrement has great impact on the harvesting latency. Notealso that Figure 9 shows that for a given average speed thecumulative distributions of passive harvesting with N = 200and N = 300 are almost identical. This confirms our analyticalresult that the distribution is independent of node density.

We then investigated the average summary harvesting anddiffusion latency. The summary harvesting latency is thetime to harvest all summaries, and this can be measured byrecording the finishing time of harvesting by the agent foreach run. Figure 10 shows this as a function of node density

and average speed for both mobility models (RWP and RT).As expected, when increasing the number of nodes or averagespeed, we observe that the latency decreases. In particular, thelatency noticeably decreases when the average speed increases.Figure 11 depicts the average summary diffusion delay whichis the average time for an agent to receive a random summarypacket. In other words, that is the average arrival time of anew summary at the agent. This can be measured by recordingthe timestamps whenever the agent receives a new summaryand then calculating the average. The figure exhibits the sametendency as Figure 10.

0

0.2

0.4

0.6

0.8

1

0 100 200 300 400 500 600 700 800Fraction of harvested summaries

Time (seconds)

Ns=3/k=3Ns=3/k=1Ns=1/k=3Ns=1/k=1

(a) RWP

0

0.2

0.4

0.6

0.8

1

0 100 200 300 400 500 600 700 800Fraction of harvested summaries

Time (seconds)

Ns=3/k=3Ns=3/k=1Ns=1/k=3Ns=1/k=1

(b) RT

Fig. 12. Fraction of harvested summaries with N = 300 and v = 15

25

15

5

300

200

100

0

300

600

900

1200

Time (s)

Avg. Speed Num. Nodes

#a=1/k=1#a=1/k=3#a=3/k=1#a=3/k=3

(a) RWP

25

15

5

300

200

100

0 300 600 900 1200 1500 1800 2100 2400 2700 3000

Time (s)


#a=1/k=1#a=1/k=3#a=3/k=1#a=3/k=3

(b) RT

Fig. 13. Effect of k-hop relay scope and multiple agents to average harvesting latency

25

15

5

300

200

100

0

100

200

300

Time (s)


#a=1/k=1#a=1/k=3#a=3/k=1#a=3/k=3

(a) RWP

25

15

5

300

200

100

0

100

200

300

400

500

600

700

800

Time (s)


#a=1/k=1#a=1/k=3#a=3/k=1#a=3/k=3

(b) RT

Fig. 14. Effect of k-hop relay scope and multiple agents to average diffusion latency

C. Effect of k-hop Relay Scope and Multiple Agents

Multi-hop relaying in passive diffusion and parallel deploy-ment of multiple harvesting agents can significantly reducethe harvesting latency.13 Figure 12 shows the cumulative har-vesting distributions for different values of k-hop relay scope

13Note that in the case of multiple agents, the harvesting latency is definedas the time interval to reach the completeness of the union of the summarysets harvested by agents.

(k = 1/3) and for different number of agents (Na = 1/3).We used the configuration of N = 300 and v = 15. For eachsetting we plotted the average of 15 runs. The initial positionsof agents were randomly selected within the map and therewas no control at all on their trajectories. The figure clearlyshows that k-hop relay scope and multiple agents reduce thelatency. Therefore, we conclude that the restricted mobilitypatterns can be overcome by deploying multiple agents or k-

hop relay scope.The impacts of the number of nodes and the average speed

on the average harvesting latency and the average diffusionlatency are shown in Figure 13 and Figure 14 respectively.The figures consistently show that regardless of the numberof nodes and the average speed, the latency of diffusion andharvesting decreases with k-hop relay scope and multipleagents.

Note that the completeness of the harvested summary setmainly depends on summary generation rate and acceptablelatency: given an application-dependent required generationrate, MobEyes permits to configure the most suitable tradeoffbetween latency/completeness and traffic overhead by properlychoosing k-hop relay scope and the number of police agentsas shown in the analysis. Furthermore, to maximize harvestingrate we could control the trajectories of agents. However,finding the tradeoff and controlling the trajectories of agentswill be a subject of future work.

D. Summary Diffusion/Harvesting Overhead

We study the average overhead of summary diffusion andharvesting. First, we show the average overhead of summarydiffusion by measuring the total number of received packetsper node during the simulation time of 1000 seconds. Figure15 shows the graphs as a function of both the number ofnodes and the average speed. From the figure, we see that theaverage number of received packets almost linearly increasesas the number of nodes increases. However, we also seethat the total number of received summaries independent ofaverage speed because we used a fixed advertisement interval(3 seconds). It seems like due to the fixed interval speedincrement does not reduce the harvesting latency, which isagainst our previous results. This can be explained as fol-lows. For a fixed advertisement interval, as average speedincreases, the probability of useful meeting (i.e., receiving anon-redundant summary) increases because the probability ofreceiving redundant summaries decreases (i.e., more mixingamong mobile nodes). For example, given a relative speed v∗,let us assume that the average period that any two nodes arewithin their communication ranges simply be 2R/v∗. Thenwith v∗ of 10 and 50 m/s, the periods will be 50 and 10seconds respectively. This implies that the cases of 10m/s has5 times higher chances of receiving redundant advertisementsthan the case of 50m/s. It is interesting to note that given anaverage speed and a certain level of overhead, there exists theoptimal advertisement period. It will be part of our future workto analytically determine the optimal advertisement periodgiven overhead constraints.

Figure 16 shows the total number of responses that the agentreceives during the entire harvesting process of 1000 seconds.For a given number of nodes, we see that RT has much lowernumber of received responses because of restricted mobilitypatterns. For example, the configurations of RWP with 100nodes is always better than the best configuration of RT. Ingeneral, the number of received responses is related to howmany nodes the agent meets, and thus it increases as the

1000

1500

2000

2500

3000

3500

4000

4500

5000

5 15 25Total number of received summaries

Average speed

RWP N=300RT N=300RWP N=200RT N=200RWP N=100RT N=100

Fig. 15. Total number of received packets

0

100

200

300

400

500

600

5 15 25Total number of received responses

Average speed

RWP N=300RT N=300RWP N=200RT N=200RWP N=100RT N=100

Fig. 16. Total number of received responses by the agent

number of node or the average speed increases as shown inFigure 16. Finally, we measure the number of sent packets asa function of k-hop relay scope. Our results show that as kincreases, the number of sent packets rapidly increases. Forexample, the number of sent packets with k = 1 is 334.58whereas that with k = 3 is 6317.3. In MobEyes, k-hop relayscope should be used in a constrained way such as k ≤ 3.Note that we can further reduce the overhead by using efficientflooding methods as shown in [31].

VI. SECURITY OF MOBEYES

In this section, we discuss security concerns of MobEyes,especially collecting whereabouts of all vehicles. We assumethe existence of Public Key Infrastructure (PKI). Every nodehas a private/public key pair at any given time, which is issuedthrough the authority such as DMV. Let PKA and SKA

denote the node A’s public and private key pair. Let H(·)denote the a one-way hash function. In MobEyes, each nodereads license plate numbers of nearby vehicles and prepares asummary which may contain a set of license plate numbers.The summary is then encrypted by using the Police’s publickey (PKCa

) to prevent other private vehicles from readingit. The encrypted summary is then attached with the digitalsignature for authentication. Thus, for a given summary SC1

from node C1, the node advertises the following:

{SC1}PKCa, {H({SC1}PKCa

)}SKC1

Neighbors of C1 will then authenticate the message andstores it in their local database. One of nodes, Cx, will finally

forward the message to the agent with its signature. Afterauthenticating the message, the agent decrypts it and storesthe original summary in its local database.

A. Attack model

We focus on false summary injection attacks, in whichan attacker’s goal is to create an illusionary view of nodelocations. For example, a malicious node M injects a summarythat says it is in location (x, y) at time t which is alsosupported by other spoofed summaries. We assume that thecompromised nodes can collude in their attacks. Our goal is todesign a mechanism for the agent to efficiently filter out suchinjected false summaries. We assume that the attacker cannotlaunch a Sybil attack, where a node illegitimately claimsmultiple identities. This assumption is based on the followingobservations. First, the authority issues a public/private keypair for a given vehicle (i.e., unique identify per node). Second,nodes can use their visual sensors (with object recognitionfilters) to check the physical existence of vehicles with theirgeographical location inputs from GPS. Similarly, nodes canuse the multi-channels of DSRC standard to test radio resourceas a method of direct validation [20].

B. Previous solutions

“Static” sensor networks deal with this problem by usingstatistical methods and hop-by-hop authentication [29], [33],[37]. [29] purely uses a statistical analysis. A sensing nodeuses sliding-window based correlation tests with its neighborsto generate the statistically sound sensed data. In addition,the global sink checks an irregular pattern of abnormality fora given area and high abnormal rate of abnormalities for aparticular node. Note that such techniques are based on “static”sensor deployment and thus, are not directly applicable tomobile sensor networks.

On the other hand, [33] and [37] use a simple votingmechanism such that at least k nodes at the center of stimulushave an agreement in order to report an event to the sink. Inaddition, they use en-route filtering: while a sensor report isforwarded to the sink, the report is authenticated in each hopand a spurious report is dropped. A simple neighbor-basedinstant voting is valid even for “mobile” sensor networks, buten-route filtering is not applicable because MobEyes does notuse multi-hop routing.

In their seminal work, Golle et al. [8] have proposed a modelof the VANET that evaluates the validity of VANET data. Themodel basically assumes that 1) a node can distinguish its localneighbors via sensors; 2) nodes share this information suchthat distinguishability extends beyond intermediate nodes; 3)the network is sufficiently dense enough to “fully” connectone another; 4) nodes can authenticate their communicationusing public key infrastructure. A node searches for possibleexplanations (either binary or probabilistic) for the collecteddata by assuming that malicious nodes may be present. Theadversarial parsimony heuristic is used to assume that anattack involves only a few malicious nodes. As a result, datawith the highest score of explanation is accepted by a node. In

M

M*

SpoofActual

Fig. 17. Spatial graph of vehicles: an edge represents a physical observation,and a vertex represents a node located in the Cartesian coordinate. Maliciousnode M creates spoofs to support a false location M∗. Dashed lines representmissing observations.

practice, it is hard to directly apply the model for the followingreasons: 1) synchronization of sensing to create a search spaceat any moment is non-trivial, and 2) “strong connectivity”assumption is not generally true in a typical VANET.

C. Proposed solution

Each node generates a summary which consists of as-sertions, i.e., physical observations from its sensor. Let〈Ci, (x, y), t〉Cj

denote an assertion generated by node Cj afterobserving Ci at location (x, y) at time t. MobEyes limitsthe assertion generation rate such that nodes cannot injectassertions more than a certain threshold over a fixed periodof time. A set of assertions at a given time is equivalent to thesnapshot of the network. This can be effectively representedby using a spatial graph as shown in [8] (see Figure 17). Sinceit is nearly impossible to synchronize all sensors to generateassertions at a specific time point, we use a discrete timeanalysis with a time slot size of Δt seconds. Thus, a set ofassertions generated within a time slot is used for generatinga spatial graph. Note that the slot size is a system parameterand is determined based on the average speed and density ofvehicles.

The agent constructs a spatial graph for each time slotas follows. First, we use a vertex to represent a node andmap it into the Cartesian coordinate. When there are multipleassertions for a given node, its trajectory is considered, andthe average point is used. Large deviation means that the nodeis potentially involved in the attack. To this reason, if thedeviation is larger than a certain threshold, we identify thatthe node is involved in the attack. Second, a directed link isthen used to represent the physical observation of an assertion.For those identified nodes, we drop the links.

We now propose two heuristics to efficiently handle injectedfalse data:

1) Mutual support-based filtering – For a given spatialgraph, we build an undirected graph such that eachpair {u, v} of points are connected if there are directededges (u, v) and (v, u). Figure 18 shows an undirectedspatial graph of Figure 17 after mutual support basedfiltering. As a result, we have three sub-spatial graphs.We simply use the size of each sub-spatial graph todetect the attacker. For example, Figure 18 shows that we

M

M*

SpoofActual

Fig. 18. Undirected spatial graph after mutual support based filtering: falseassertions result in three sub-spatial graphs and the size (number of nodes) isused to determine potential attackers.

MM*

X

SpoofActual

(a) Spatial graph at time t

M

M*

X

SpoofActual

(b) Spatial graph at time t + 1

Fig. 19. Temporal correlation-based filtering: Node X was used for an attackby node M , but it fails to meet the temporal correlation.

see that the attacker M becomes a single vertex in thespatial graph. In general, the size of good nodes are muchlarger than that of malicious nodes given the adversarialparsimony assumption as well as the assertion generationrate limit. Thus, the identification of attacker sets couldbe done easily.

2) Temporal correlation-based filtering – Since mobilityof nodes exhibits spatial-temporal correlation, we con-sider this among spatial graphs. Given that we havetwo spatial graphs at t and t + 1, the maximum dis-tance that a random node can travel is bounded as√

(xt+1 − xt)2 + (yt+1 − yt)2 < δ where δ is a systemdependent parameter. Figure 19, for example, shows thatthe colluded node X was exposed at time t to other nodesand was used in order to support M∗ at time t+1. Suchinconsistency can be easily detected by using temporalcorrelation-based filtering.

Note that the potential of reduced quality of the assertionsdue to network environment influence exists, for example,nodes are located in sparse areas or the location informationcan not be extracted correctly due to reduced sensing qualityor unidirectional links. When this happens, the mutual support-based filtering may also filter out some legitimate nodes,raising false positive ratio. This situation can be justified bythe collective information on the density of the network, or belater proved when sensing or transmission quality improves.Further, the agent can improve its detection ability by using aniteration of more rounds with different Δt. An attack shouldreveal a more persistent erroneous location than the transitnetwork dynamics. With the latter, the transit property will besmoothed in a larger Δt.

VII. CONCLUSION

In this paper, we proposed MobEyes, a system that formsautonomous smart mobs with VSN for proactive urban moni-toring. The MobEyes key component is MDHP, which worksby disseminating/harvesting summaries of sensed data, and byexploiting original mobility-assist opportunistic protocols. Thereported evaluation results show that MDHP is scalable upto thousands of nodes with limited overhead and reasonablelatencies. Moreover, MobEyes can be configured to achievethe most suitable tradeoff between latency/completeness andoverhead by properly choosing the k-hop relay scope and thenumber of police agents. Morover, our preliminary securitysolutions show that MobEyes can effectively handle injectedfalse data by attackers. These encouraging results stimulatefurther research. In particular, we are extending the MobEyesevaluation by considering regular nodes that discard senseddata (and associated summaries) after a specified time intervalor when their local storage is full. In addition, we are extendingthe MobEyes prototype with the ability to determine the besttrajectory to be followed by randomly positioned agents whenthey collaborate in summary harvesting.

ACKNOWLEDGEMENT

We would like to thank Xiaoyan Hong and Jiejun Kong forsparing their valuable time to review the security section ofthe paper.

REFERENCES

[1] F. Bai and A. Helmy. Impact of Mobility on Mobility-AssistedInformation Diffusion (MAID) Protocols. Technical report, USC, July2005.

[2] A. Datta. Autonomous Gossiping: A Self-organizing Epidemic Algo-rithm for Selective Information Dissemination in Wireless Mobile Adhoc Networks. In ICDCS 2003 Doctoral Symposium, Providence, RhodeIsland USA, May 2003.

[3] L. Dlagnekov and S. Belongie. Recognizing Cars. Technical ReportCS2005-0833, UCSD CSE, 2005.

[4] Standard Specification for Telecommunications and Information Ex-change Between Roadside and Vehicle Systems - 5 GHz Band DedicatedShort Range Communications (DSRC) Medium Access Control (MAC)and Physical Layer (PHY) Specifications, Sept. 2003.

[5] L. Fan, P. Cao, and J. Almeida. Summary Cache: A Scalable Wide-AreaWeb Cache Sharing Protocol. In ACM SIGCOMM, Vancouver, Canada,Aug.-Sept. 1998.

[6] GDS: Technology being tested to detect and report gunshots.http://www.computerworld.com/mobiletopics/mobile/technology/story/0,10801,77554,00.html.

[7] P. B. Gibbons, B. Karp, Y. Ke, S. Nath, and S. Seshan. IrisNet: AnArchitecture for a Worldwide Sensor Web. IEEE Pervasive Computing,2(4):22–33, Oct.-Dec. 2003.

[8] P. Golle, D. Greene, and J. Staddon. Detecting and Correcting MaliciousData in VANETs. In VANET’04, Philadelphia, PA, October 2004.

[9] M. Grossglauser and D. Tse. Mobility Increases the Capacity of MobileAd-hoc Networks. In IEEE INFOCOM, Anchorage, Alaska, USA, Apr.2001.

[10] M. Grossglauser and M. Vetterli. Locating Nodes with EASE: MobilityDiffusion of Last Encounters in Ad Hoc Networks. In IEEE INFOCOM,San Francisco, CA, USA, Mar.-Apr. 2003.

[11] Z. Haas, J. Y., and H. L. Li. Gossip-based Ad Hoc Routing. In IEEEINFOCOM, New York, NY, USA, June 2002.

[12] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed Diffusion: aScalable and Robust Communication Paradigm for Sensor Networks. InACM MOBICOM’00, Boston, MA, USA, 2000.

[13] R. M. Karp, C. Schindelhauer, S. Shenker, and B. Vocking. Randomizedrumor spreading. In IEEE Symposium on Foundations of ComputerScience, 2000.

[14] G. Korkmaz, E. Ekici, F. Ozguner, and U. Ozguner. Urban Multi-Hop Broadcast Protocols for Inter-Vehicle Communication Systems. InVANET’04, Philadelphia, PA, USA, Oct. 2004.

[15] J. LeBrun, C.-N. Chuah, D. Ghosal, and M. Zhang. Knowledge-BasedOpportunistic Forwarding in Vehicular Wireless Ad Hoc Networks. InIEEE VTC’05, Dollas, TX, USA, Sept. 2005.

[16] U. Lee, E. Magistretti, B. Zhou, M. Gerla, P. Bellavista, and A. Corradi.Efficient Data Harvesting in Mobile Sensor Platforms. In IEEE PerSeNSWorkshop, Pisa, Italy, Mar. 2006.

[17] B. Liu, P. Brass, O. Dousse, P. Nain, and D. Towsley. Mobility improvescoverage of sensor networks. In MobiHoc’05, Urbana-Champaign, IL,USA, May 2005.

[18] J. Luo and J.-P. Hubaux. A Survey of Inter-Vehicle Communication.Technical Report ICTR-200424, EPFL, 2004.

[19] D. Nain, N. Petigara, and H. Balakrishnan. Integrated Routing andStorage for Messaging Applications in Mobile Ad Hoc Networks. InWiOpt’03, Sophia-Antipolis, France, Mar. 2003.

[20] J. Newsome, E. Shi, D. Song, and A. Perrig. The Sybil Attack in SensorNetworks: Analysis & Defenses. In IPSN’04, Berkeley, CA, Apr. 2004.

[21] ns-2 (The Network Simulator). http://www.isi.edu/nsnam/ns/.

[22] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, andS. Shenker. GHT: A Geographic Hash Table for Data-Centric Storage.In WSNA’02, Atlanta, GA, USA, Sept. 2002.

[23] H. Rheingold. Smart Mobs: The Next Social Revolution. Basic Books,2003.

[24] A. K. Saha and D. B. Johnson. Modeling Mobility for Vehicular AdHoc Networks. In ACM VANET’04, Philadelphia, PA, USA, October2004.

[25] R. C. Shah, S. Roy, S. Jain, and W. Brunette. Data MULEs: Modelinga Three-tier Architecture for Sparse Sensor Networks. Elsevier Ad HocNetworks Journal, 1(2-3):215–233, Sept. 2003.

[26] A. P. Sistla, O. Wolfson, and B. Xu. Opportunistic Data Dissemination inMobile Peer-to- Peer Networks. In International Symposium on Spatialand Temporal Databases, Angra dos Reis, Brazil, Aug. 2005.

[27] T. Small and Z. J. Haas. The Shared Wireless Infostation Model - ANew Ad Hoc Networking Paradigm (or Where There is a Whale, Thereis a Way). In ACM MOBIHOC’03, Annapolis, Maryland, USA, June2003.

[28] A. Steed, S. Spinello, B. Croxford, and C. Greenhalgh. e-Science inthe Streets: Urban Pollution Monitoring. In UK e-Science All HandsMeeting, Nottingham, UK, Sept. 2003.

[29] S. Tanachaiwiwat and A. Helmy. Correlation Analysis for AlleviatingEffects of Inserted Data in Wireless Sensor Networks. In MobiQui-tous’05, San Diego, PA, Jul. 2005.

[30] A. Vahdat and D. Becker. Epidemic Routing for Partially-ConnectedAd Hoc Networks. Technical Report CS-200006, Duke University, Apr.2000.

[31] B. Williams and T. Camp. Comparison of broadcasting techniques formobile ad hoc networks. In ACM MOBIHOC’02, Lausanne, Switzerland,Jun. 2002.

[32] H. Wu, R. Fujimoto, R. Guensler, and M. Hunter. MDDV: a Mobility-entric Data Dissemination Algorithm for Vehicular Networks. InVANET’04, Philadelphia, PA, USA, Oct. 2004.

[33] F. Ye, H. Luo, S. Lu, and L. Zhang. Statistical En-route Filtering ofInjected False Data in Sensor Networks. In INFOCOM’04, Hong Kong,Mar. 2004.

[34] J. Zhao and G. Cao. VADD: Vehicle-Assisted Data Delivery in VehicularAd Hoc Networks. In IEEE INFOCOM, Barcelona, Spain, Apr. 2006.

[35] W. Zhao, M. Ammar, and E. Zegura. A Message Ferrying Approachfor Data Delivery in Sparse Mobile Ad Hoc Networks. In ACMMOBIHOC’04, Tokyo, Japan, May 2004.

[36] B. Zhou, K. Xu, and M. Gerla. Group and Swarm Mobility Models forAd Hoc Network Scenarios Using Virtual Tracks. In IEEE MILCOM,Monterey, CA, USA, Oct.-Nov. 2004.

[37] S. Zhu, S. Setia, S. Jajodia, and P. Ning. An Interleaved Hop-by-Hop Authentication Scheme for Filtering False Data Injection in SensorNetworks. In IEEE Symposium on Security and Privacy, Oakland, CA,May 2004.

mobeyes: smart mobs for urban monitoring with …netlab.cs.ucla.edu › wiki › files ›...

Documents