csi: a paradigm for behavior-oriented delivery services in … · 2008-07-08 · arxiv:0807.1153v1...

arXiv:0807.1153v1 [cs.NI] 8 Jul 2008

CSI: A Paradigm for Behavior-oriented Delivery Services in Mobile Human Networks

Wei-jen Hsu¹, Debojyoti Dutta², and Ahmed Helmy¹
¹Department of Computer and Information Science and Engineering, University of Florida
²Cisco Systems, Inc.

Email: 1 {wjhsu, helmy}@ufl.edu,[email protected]

Abstract: We propose behavior-oriented services as a new paradigm of communication in mobile human networks. Our study is motivated by the tight user-network coupling in future mobile societies. In such a paradigm, messages are sent to inferred behavioral profiles instead of explicit IDs. Our paper provides a systematic framework for providing such services. First, user behavioral profiles are constructed based on traces collected from two large wireless networks, and their spatio-temporal stability is analyzed. The implicit relationship discovered between mobile users could be utilized to provide a service for message delivery and discovery in various network environments. As an example application, we provide a detailed design of such a service in a challenged opportunistic network architecture, named CSI. We provide a fully distributed solution using behavioral profile space gradients and small world structures.

Our analysis shows that user behavioral profiles are surprisingly stable, i.e., the similarity of the behavioral profile of a user to its future behavioral profile is above 0.8 for two days and 0.75 for one week, and remains above 0.6 for five weeks. The correlation coefficient of the similarity metrics between a user pair at different time instants is above 0.7 for four days, 0.62 for a week, and remains above 0.5 for two weeks. Leveraging such stability in user behaviors, the CSI service achieves a delivery rate very close to that of the delay-optimal strategy (above 94%), with minimal overhead (less than 84% of the optimal). We believe that this new paradigm will act as an enabler of multiple new services in mobile societies, and is potentially applicable in server-based, heterogeneous or infrastructure-less wireless environments.

I. INTRODUCTION

We envision future networks that consist of numerous ultraportable devices delivering highly personalized, context-aware services to mobile users and societies. Such scenarios elicit strong, tight coupling between user behavior and the network. Users’ mobility and on-line activities significantly impact wireless link characteristics and network performance, and at the same time, the network performance can potentially influence user activities and behavior. Such a tight user-network coupling provides a rich set of opportunities and poses several challenges. On one hand, fundamental understanding of mobile user behavior becomes crucial to the design and analysis of future mobile networks. On the other hand, novel services can now be introduced that utilize such a coupling to effectively navigate mobile societies, providing efficient information dissemination, search and resource discovery.

In this paper, we propose a novel behavior-driven communication paradigm to enable a new class of services in mobile societies. Current communication paradigms, including unicast and multicast, require explicit identification of destination nodes (through node IDs or group membership protocols), while directory services map logical, interest-specific queries into destination IDs where parties are then connected using interest-oblivious protocols. The power and scalability of such conventional paradigms might be quite limited in the context of future, highly dynamic mobile human networks, where it is desirable in many scenarios to support implicit membership based on interest. In such scenarios, membership in interest groups is not explicitly expressed by users; it is rather implicitly and autonomously inferred by network protocols based on behavioral profiles. This removes the dependence on third parties (e.g., directory lookup), the maintenance of group membership (e.g., in multicast), or the need to flood user interests to the whole network, and minimizes delivery overhead to uninterested users.

Applying such a behavior-driven paradigm in mobile networks poses several research challenges. First, how can user behavior be captured and represented adequately? Second, is user behavior stable enough to enable meaningful prediction of future behavior with a short history? Third, how can such services be provided when the interest or behavior cannot be centrally monitored and processed? And finally, can we design privacy-preserving services in this context?

To address these questions we propose a systematic framework with two phases: 1) behavioral profile extraction, by analyzing large-scale empirical data sets and investigating the stability of users in the behavioral space, and 2) leveraging the behavioral profiles for service design, where we use the implicit structure in human networks to guide message and query dissemination given a target profile.

Specifically, we first analyze network activity traces and design a summary of user behavioral profiles based on mobility preferences. The similarity of the behavioral profile of a given user to its future profile is high, above 0.75 for eight days, and remains above 0.6 for five weeks. The surprising observation is that the similarity metric between a pair of users predicts their future similarity reasonably well. The correlation coefficient between their current and future similarity metrics is above 0.7 for four days, and remains above 0.5 for fifteen days.

This phenomenon demonstrates that the behavioral profile we design is an intrinsic property of a given user and a valid representation of the user for a good period of time into the future. We refer to this phenomenon as the stability of user behavioral profiles, which can be used to map the users into a high dimensional behavioral space. The behavioral space is defined as a space where each dimension reflects a particular interest. For example, when we consider mobility preferences, each dimension represents the fraction of time spent at a given location. The position of users in the behavioral space reflects how similar they are with respect to the behavioral profile we construct. We propose a new communication paradigm, in which a target profile is used to replace network IDs to indicate the intended receiver(s) of a message (i.e., those with a behavioral profile matching the target profile chosen by the sender are the intended receivers). It is a Communication paradigm in human networks based on the Stability of the user behavioral profile to discover the receivers Implicitly, abbreviated as CSI. We present two modes of operation under the over-arching paradigm: the target mode (CSI:T) and the dissemination mode (CSI:D). The target mode is used when the target profile is specified in the same context as the behavioral profile (i.e., the target profile is in terms of mobility preferences). The dissemination mode, on the other hand, is used when the target profile is de-coupled from mobility preferences.

We show that our CSI schemes perform very close to the delay-optimal schemes assuming global knowledge and improve significantly over the baseline dissemination schemes. For the CSI:T mode, compared with the delay-optimal protocol, our protocol is close in terms of success rate (more than 94%) and has less overhead (less than 84% of the optimal), while the delay is about 40% more. For the CSI:D mode, our protocol features lower storage overhead than the delay-optimal protocol with more than 98% success rate: CSI:D uses a storage overhead less than 60% of that of the delay-optimal protocol, while the delay of CSI:D is about 32% more than the optimal.

Our Contributions

(1) We introduce the notion of a multi-dimensional behavioral space, and devise a representation of user behavioral profiles to map users into the behavioral space. Our study is the first to establish conditions for stability of the relationship between campus users in this space.

(2) We propose CSI, a new communication paradigm delivering messages based on user profiles. The target profile in CSI can even be independent of the context of the behavioral profile we use to construct the behavioral space.

(3) We design an efficient dissemination protocol utilizing the stability of behavioral profiles and the SmallWorld structure in mobile societies, then empirically evaluate and validate the efficacy of our proposal using large-scale traces from university campuses.

The outline of the rest of the paper is as follows. We discuss the related work in section II and important background in section III. This is followed by an analysis to understand the user behavioral pattern in section IV. We further discuss the potential usages of this understanding in section V and design our CSI schemes in section VI as an example. We use simulations to evaluate the performance of CSI schemes in section VII. Finally, we discuss some finer points in section VIII and conclude in section IX.

II. RELATED WORK

We conduct the first detailed systematic study on the spatio-temporal stability of user behaviors in mobile societies, a new dimension that has not been considered before. We lay the foundation of this work on a solid analysis of empirical user behaviors, enabled by extensive collections of user behavioral traces. Many of them can be found in the archives at [1], [2]. Our effort on the extraction of behavioral profiles and behavior-based user classification is related to the reality mining project [16] and the work by Hsu et al. [4] and Ghosh et al. [20]. We leverage the representation of the mobility preference matrix defined by Hsu et al. [4], which reveals more detailed user behavior than the five-category representation used in the reality mining project [16] and the presence/absence encoding vector used by Ghosh et al. [20].

In centralized trace analysis, the capability of classifying users based on their mobility preferences [4] or periodicity [19] could potentially lead to applications such as behavior-aware advertisements or better network management. While understanding user behavior for these applications has its own merit, applications in centralized scenarios (where user behaviors are collected, processed and mined at an aggregation point) are not our major focus in this paper.

The major application considered in this paper is the design of a message dissemination scheme in decentralized environments. While several previous works exist in the delay tolerant network field, most of them (e.g., [3], [5], [17], [6], [10]) consider a one-to-one communication pattern based on network identities. The one-to-many communication targeted at a behavioral group presented in this paper is a new paradigm in decentralized environments. Some of the previous works assume existing infrastructure: PeopleNet [18] uses specialized geographic zones for queries to meet. The queries are delivered to randomly chosen nodes in the corresponding zone through the infrastructure. Others (e.g., [17], [10]) rely on persistent control message exchanges (e.g., the delivery probability) for each node to learn the structure of the network, even when there is no on-going traffic. From the design point of view, our approach differs from them by avoiding such persistent control message exchanges to achieve better power efficiency, an important requirement in decentralized networks.

The spirit of our design is more similar to the work by Daly et al. [6], in which each node learns the structure of the network locally and uses the information for message forwarding decisions. They use the SmallWorld network structure [7], which often exists in human networks (as has been investigated in [14], [9]), and push the message toward nodes with high centrality to improve the chance of delivery. However, the learning process still involves message exchanges about past encounters, even in the absence of actual traffic. Our work, on the other hand, relies on the intrinsic behavioral pattern of individual nodes to “position” themselves in the behavioral space in a localized and fully distributed manner, without exchanging encounter history between nodes. The use of user behavioral profiles to understand the structure of the space is similar to the mobility space routing by Leguay et al. [3] and the utility-based routing by Aiklas et al. [8]. The major differences between this work and [3], [8] are twofold: First, we design the CSI:D mode, in which the target profile need not be related to the behavioral profile based on which the message dissemination decisions are made. Second, we also provide a non-revealing option in our protocol, so that no node has to explicitly reveal its behavioral pattern or interests to others, as opposed to [3], [8]. The idea of merging similar users into a group based on their behavior has also been proposed in a two-tiered routing structure [10].

[Figure 1: an association matrix with entries x_{i,j}. Each row represents the percentage of time spent at each location for a day; each column corresponds to a location (e.g., Office, Dorm); an entry x_{i,j} represents the percentage of online time during day i at location j.]

Fig. 1. Illustration of the association matrix to describe a given user’s location visiting preference.

Another related paper is the work by Hsu et al. [15], where the authors focus on only sending messages to users with a behavioral profile similar to that of the sender. In this paper we introduce the notion of the target profile to decouple the behavioral profile of the sender from the destination profile in the message. This significantly enhances the capability of the message dissemination schemes by allowing the sender to specify a target behavioral profile (in CSI:T mode), or even target profiles that are orthogonal to the behavior based on which we measure the similarity between users (in CSI:D mode).

III. BACKGROUND

A. Mobility-based User Behavior Representation

We represent the mobile behavior of a given user using the association matrix, as illustrated in Fig. 1. In the matrix, each row vector describes the percentage of time the user spends at each location on a day, reflecting the importance of the locations to the user¹. In [4] it has been shown that the location visiting preferences can be leveraged to classify users of wireless networks on university campuses. For a given user, the singular value decomposition (SVD) [21] is applied to its association matrix M, such that

$$M = U \cdot \Sigma \cdot V^{T}, \tag{1}$$

where a set of eigen-behavior vectors, $v_1, v_2, \ldots, v_{\mathrm{rank}(V)}$, that summarize the important trends in the original matrix M can be obtained from matrix V, with corresponding weights $w_{v_1}, w_{v_2}, \ldots, w_{v_{\mathrm{rank}(V)}}$ calculated from the singular values in matrix Σ. This set of vectors is referred to as the behavioral profile of the particular user, denoted as BP(M), as they summarize the important trends in user M’s behavioral pattern. The behavioral similarity metric between two users A and B is defined based on their behavioral profiles, vectors $a_i$’s and $b_j$’s and the corresponding weights, as

$$\mathrm{Sim}(BP(A), BP(B)) = \sum_{i=1}^{\mathrm{rank}(A)} \sum_{j=1}^{\mathrm{rank}(B)} w_{a_i} w_{b_j} \, |a_i \cdot b_j|, \tag{2}$$

which is essentially the weighted cosine similarity between the two sets of eigen-behavior vectors.

¹ While there may be numerous other representations of user behavior, we shall show that this representation possesses desirable characteristics for the purposes of this study. Further investigation of other representations is a subject of future work.

TABLE I
FACTS ABOUT STUDIED TRACES

Trace source              USC [12]                Dartmouth [13]
Time/duration of trace    2006 spring semester    2004 spring quarter
Start/End time            01/25/06-04/28/06       04/05/04-06/04/04
Unique locations          137 buildings           545 APs/162 buildings
Unique MACs analyzed      5,000                   6,582
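As a rough illustration of Eqs. (1) and (2), the sketch below extracts a behavioral profile from a small association matrix and compares two profiles. It makes several assumptions that are not stated in the text: the function names `behavioral_profile` and `similarity` are hypothetical, the weights are taken to be the normalized squared singular values, and a simple power iteration with deflation stands in for a library SVD routine so that the example has no external dependencies.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def behavioral_profile(M, max_vectors=3, iters=1000, eps=1e-12):
    """Return [(weight, vector), ...] for association matrix M (rows = days).

    The vectors approximate the leading right singular vectors of M (columns
    of V in Eq. (1)). ASSUMPTION: each weight is the squared singular value
    divided by the sum of all squared singular values.
    """
    n = len(M[0])
    # G = M^T M: its eigenvectors are the right singular vectors of M and
    # its eigenvalues are the squared singular values.
    G = [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]
    total = sum(G[i][i] for i in range(n))  # trace = sum of squared singular values
    rng = random.Random(0)                  # fixed seed keeps the sketch reproducible
    profile = []
    for _component in range(max_vectors):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _step in range(iters):          # power iteration on the deflated G
            w = [dot(G[i], v) for i in range(n)]
            norm = math.sqrt(dot(w, w))
            if norm < eps:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(G[i], v) for i in range(n)])
        if total <= 0 or lam / total < 1e-9:
            break                           # no significant component left
        profile.append((lam / total, v))
        for i in range(n):                  # deflate: remove the found component
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return profile

def similarity(profile_a, profile_b):
    """Eq. (2): sum over i, j of w_ai * w_bj * |a_i . b_j|."""
    return sum(wa * wb * abs(dot(a, b))
               for wa, a in profile_a
               for wb, b in profile_b)

# Two locations; user A splits days between them, user B only visits the second.
profile_A = behavioral_profile([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
profile_B = behavioral_profile([[0.0, 1.0], [0.0, 1.0]])
print(similarity(profile_A, profile_B))
```

In practice one would likely obtain V and Σ from a numerical library rather than from power iteration; the hand-rolled version here is only meant to make the data flow from the association matrix to the similarity value explicit.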

B. Traces

In this paper, we seek a realistic, deep understanding of user behavior patterns by analyzing semester/quarter-long user behavioral logs collected from operational campus networks, obtained from public trace archives [1], [2]. We present results based on two data sets from the University of Southern California (USC) and Dartmouth College (Dartmouth). The details of the data sets are listed in Table I.

We choose to use WLAN traces as they are the largest user behavioral data sets available. The information available from these anonymized traces contains many aspects of the network usage (e.g., time-location information of the users obtained by tracking the association and disassociation events with the access points, amount of traffic sent/received, etc.). The richness in user behavioral data poses a challenge in representing the user behavior in a meaningful way, such that the representation not only reveals an intrinsic, stable behavioral profile of a user, but the identified behavioral profile also leads to practical applications. We show in this paper that the location visiting preferences (which are only a subset of the user behavioral data) are a stable attribute for both individual users and the relationship between users. This property will prove quite valuable to the design of efficient message dissemination schemes, which we empirically validate using the above traces.

IV. UNDERSTANDING SPATIO-TEMPORAL CHARACTERISTICS OF USER BEHAVIORAL PATTERNS

In this section we introduce our analysis of user behavioral patterns and its significance for the service design. While previous works on user classification based on long-term behavioral trends [4], [20], [19] are useful and in line with our goal, the stability of such classification over time has not been studied systematically. In particular, the short-term behavior of a user may deviate significantly from the norm, and the stability of user behavioral profiles is a decisive factor for whether it can be leveraged to represent the user’s future behavior. In this section we investigate the following questions: (1) How long a behavioral history do we need to classify a user? and (2) How much does the behavior of a given user and its relationship with other users change with respect to time?

We consider the effect of the amount of past history (of user behavior) on its behavioral profiles. Each user uses the location visiting preference vectors of the past d days to summarize the behavior in the most recent history: the user retains d location visiting preference vectors for these days, organizes them in a matrix, and uses singular value decomposition to obtain the behavioral profile, as described in section III-A. We seek to understand how d influences the representation and similarity calculations. More specifically, we look into two important aspects: (1) whether the representation of a given user is stable across time, and (2) whether the relationships between user pairs remain stable as time evolves.

[Figure 2: a time axis with two time points that are T days apart, each with a trailing window of d days.]

Fig. 2. Illustration: consider the trailing d days of behavioral profile at time points that are T days apart.

[Figure 3: similarity between the mobility profiles of the same user (y-axis) versus time gap T (x-axis), with three curves each for Dartmouth and USC, one per length of behavioral history in days.]

Fig. 3. Similarity metrics for the same user at time gap T apart.

We first consider the stability of the representation of a given user. Considering two points in time that are T days apart, we obtain the behavioral profiles for the same user at both end points, using the logs of the trailing d days ending at those end points, as illustrated in Fig. 2. Then we use the similarity metric defined in Eq. (2) to measure how similar a user’s behavioral profile is to that of one’s former self after T days have elapsed. The average results for various values of the time gap, T, and considered behavioral history, d, are shown in Fig. 3. We notice that, even if we collect a short history of user behavior (say d = 3), the representation is similar to the behavior of the user for a long time into the future. When we consider profiles T = 35 days apart, the behavioral profiles from the same user still show high similarity, at about 0.6. The amount of history used does not influence the result much when the considered T is large enough to avoid overlaps in the used behavioral history (i.e., when T > d). We conclude that on university campuses, the behavioral profile for a given user is stable, i.e., it remains highly similar for the same user across time. One interesting note is that, when the behavioral profile includes only part of a week (d < 7), the similarity of the user to its former self shows a weekly pattern (i.e., when T is an integer multiple of seven, the similarity peaks), especially at USC.

[Figure 4: correlation coefficient between the similarity metrics (y-axis) versus time gap T (x-axis), with three curves each for USC and Dartmouth, one per length of behavioral history in days.]

Fig. 4. Correlation coefficient of the similarity metrics between the same user pair at time gap T apart.
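The windowing used in this stability analysis can be sketched as follows. This is only an illustration under assumptions: the helper names `trailing_window` and `window_pairs` are hypothetical, days are 0-indexed, and each element of `daily_vectors` is one day's location visiting preference vector. Each yielded pair of matrices would then be turned into two behavioral profiles and compared with Eq. (2).

```python
def trailing_window(daily_vectors, end_day, d):
    """Return the rows for the d days ending at end_day (inclusive)."""
    start = end_day - d + 1
    if start < 0 or end_day >= len(daily_vectors):
        raise ValueError("window falls outside the available trace")
    return daily_vectors[start:end_day + 1]

def window_pairs(daily_vectors, d, T):
    """Yield (earlier, later) association matrices whose end points are T days apart."""
    for end in range(d - 1, len(daily_vectors) - T):
        yield (trailing_window(daily_vectors, end, d),
               trailing_window(daily_vectors, end + T, d))

# Ten days of a one-location toy trace, history d = 3, time gap T = 4.
trace = [[float(day)] for day in range(10)]
for earlier, later in window_pairs(trace, d=3, T=4):
    print(earlier, later)
```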

Second, we try to quantify how the behavioral similarity between the same pair of users varies with time. For this part, we use Eq. (2) to calculate the similarity between two users, A and B, at two points in time, $Sim_{T_1}(A,B)$ and $Sim_{T_2}(A,B)$, where $T_1$ and $T_2$ are T days apart. We perform this calculation for all user pairs, and then calculate the correlation coefficient of the similarity metrics obtained after a T-day interval, as

$$r = \frac{\sum_{\forall A,B} (X - \bar{X})(Y - \bar{Y})}{N S_X S_Y}, \tag{3}$$

where $X = Sim_{T_1}(A,B)$ and $Y = Sim_{T_2}(A,B)$, and the notations $\bar{X}$ and $S_X$ denote the average and standard deviation of X, respectively. N is the total number of user pairs. The correlation coefficient quantifies how stable the relationship between user pairs is. We repeat the calculation for all pairs of users with various d and T values to arrive at Fig. 4. We observe that the similarity metrics between user pairs correlate reasonably well if the considered time periods are not far apart. For T smaller than one week, the correlation coefficient is above 0.62. This indicates that, once the similarity between a pair of users is obtained, it remains a reasonable predictor for their mutual relationship for some time period into the future. Although the reliability of the stale similarity data decreases with time, the current similarity of a user pair remains moderately correlated to their future similarity in the time range up to several weeks. The correlation is above 0.4 for up to five weeks.
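Eq. (3) can be computed directly from the two lists of pairwise similarities. The sketch below is a minimal version; the function name `correlation` is hypothetical, and it assumes that the standard deviations are population standard deviations (dividing by N), which is what makes the N in the denominator of Eq. (3) yield the usual Pearson coefficient.

```python
import math

def correlation(xs, ys):
    """Eq. (3): correlation between similarities at T1 (xs) and T2 (ys).

    xs[k] and ys[k] must refer to the same user pair.
    ASSUMPTION: S_X and S_Y are population standard deviations.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two equally long, non-empty lists")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for constant input")
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return covariance_sum / (n * s_x * s_y)

# Toy similarities of three user pairs at two points in time.
print(correlation([0.1, 0.4, 0.8], [0.2, 0.5, 0.7]))
```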

The investigation establishes that the user behavioral profile is a stable feature to represent the users: the representation of an individual user and the relationship between users are well correlated with the past history for the near future. Thus we map the behavioral profile to a virtual behavioral space [3], in which each user’s behavior is quantified as a high dimensional point². The mutual similarity metric between users is a function of their respective positions in this space. In this paper, when we say two users are similar, it means they are close in the behavioral space (i.e., the distance between the two users is small). We also use the term neighborhood of a node to refer to the other nodes that are similar to this particular node in the behavioral space.

² The dimension of the behavioral space is the same as that of the mobility preference vector representation, typically on the order of a hundred for these two campuses.

V. THE BEHAVIOR-DRIVEN COMMUNICATION PARADIGM

Profiling users based on stable behaviors is a fundamental step to understand human behavior. Motivated by the stability of user behavioral profiles, we introduce a behavior-driven communication paradigm where we use user behavioral profiles, instead of network IDs, to represent users. We envision that such a radical approach has several benefits.

First, it enables behavior-aware message delivery in the network without mapping attributes to network IDs. As each user maintains its behavioral profile, it is now possible to deliver announcements about a sports event on campus to sports enthusiasts (e.g., people who visit the gym often) or to advertise a performance at the school auditorium to the regular attendees of such events.

Second, it facilitates the discovery of nodes with certain behavior patterns. Consider, for example, the message ferry [11] architecture, where nodes with high mobility move messages across the network to facilitate the communication between otherwise disconnected nodes. One can choose a target profile that reflects a mobility profile and thus eliminate the need to know the identity of the ferry beforehand or to enforce this mobility pattern on a controlled node: a typical user who happens to have the desired mobility pattern can be discovered and serve as a ferry.

Our behavior-driven communication paradigm is applicable to several architectures. In the centralized server-based architecture, user profiles could be collected and stored at a data repository, and mined for user classification, abnormality detection, or targeted advertisements. In cellular networks, the low-bandwidth channel between the users and the infrastructure can be leveraged to exchange behavioral profiles and match users. In this paper, however, we consider decentralized infrastructure-less networks, and focus on how stable behavioral profiles are used for better message dissemination. We name this scheme CSI, since it is a Communication scheme based on the Stable, Implicit structure in human networks.

VI. PROTOCOL DESIGN

In this section, we first present our premises and design requirements for the CSI schemes. We then discuss the design of the CSI schemes based on an in-depth understanding of the relationship between similar behavioral profiles and encounter events.

A. Assumptions and Design Requirements

We assume that each node profiles its own behavioral pattern by keeping track of the visiting durations of different locations and summarizing the behavioral profile using the technique discussed in section III-A. This is an individual effort by each node involving no inter-node interactions. This can be done by the node overhearing the beacon signals from the fixed access points in the environment to find out its current location. Note that the use of these beacon signals is only for the node to profile its own behavior; they are not used to help the communication in our protocols (we will revisit detailed points of this assumption in section VIII). Also, for ease of understanding, we assume in this section that nodes are willing to send their behavioral profiles to other nodes when needed. A privacy-preserving option that eliminates this operation is also discussed in section VIII.
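The local self-profiling step might look like the sketch below, in which a node accumulates its visiting durations per day and location and normalizes each day into one row of its association matrix. The class name `MobilityLog`, its methods, and the use of minutes as the duration unit are assumptions made for illustration; the text does not specify how nodes store these logs.

```python
from collections import defaultdict

class MobilityLog:
    """Per-node log of time spent at each known location on each day."""

    def __init__(self, locations):
        self.locations = list(locations)  # fixed column order of the matrix
        self._durations = defaultdict(lambda: defaultdict(float))

    def record(self, day, location, minutes):
        """Add online time observed at `location` on `day` (e.g., from beacons)."""
        self._durations[day][location] += minutes

    def daily_vector(self, day):
        """Fraction of that day's online time spent at each location."""
        spent = self._durations.get(day, {})
        total = sum(spent.values())
        if total == 0:
            return [0.0] * len(self.locations)
        return [spent.get(loc, 0.0) / total for loc in self.locations]

    def association_matrix(self, days):
        """One row per day, one column per location."""
        return [self.daily_vector(day) for day in days]

log = MobilityLog(["gym", "library"])
log.record(0, "gym", 30)
log.record(0, "library", 90)
log.record(1, "library", 45)
print(log.association_matrix([0, 1]))
```

No inter-node communication is involved: everything in the sketch runs on the node's own observations, in line with the assumption stated above.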

The goal of our CSI scheme is to reach a group of nodes matching the target profile specified by the sender, under the following performance requirements: (1) The protocol should be scalable, in particular not dependent on a centralized directory to map target profiles to user identities. (2) It should work in an efficient manner and avoid transmission and storage overhead when possible. Also, it should avoid control message exchanges in the absence of data traffic. (3) The syntax of the target profile should be flexible, allowing the target profile to be in a different context from the behavioral profiles we use to represent the users. Also, the operation of the protocol should be flexible to allow tradeoffs between various performance metrics. And finally, (4) the design should be robust and help in protecting user privacy.

We design two modes of operation for the CSI scheme under the above requirements. When the target profile is in the same context as the behavioral profile (in our example, since the behavioral profile is a summary of user mobility, this corresponds to the scenario in which the target profile describes users that move in a particular way), the CSI:Target mode (CSI:T) should be used. When the target profile is irrelevant to the behavioral profile (e.g., when a sender wants to reach everyone interested in movies on campus), the CSI:D mode should be used instead. Although it seems that the applicability of CSI:T is limited, we note that the behavioral profile (in terms of mobility) can sometimes be used to infer other social aspects of the users, such as affiliations or even interests (e.g., people who visit the gym often should like sports in general). Such inferences expand the scenarios in which CSI:T can be used. When this is not possible, the CSI:Dissemination mode (CSI:D) provides a more generic option.

The major challenge involved in the design process is that each node is only aware of its own behavioral profile. Furthermore, we require no persistent control message exchanges for the nodes to “learn” the structure of the network proactively when they have no message to send. Nodes only compare their behavioral profiles when they are involved in message dissemination. Based on this very limited knowledge about the behavioral space, a node must predict how useful a given encounter opportunity is in terms of achieving the aforementioned requirements. Since encounter events may occur sporadically in sparse, opportunistic networks, the nodes must make this decision for each encounter event independently of other encounter events (that may occur long before or after the one under consideration). Such a heuristic must rely on an understanding of the relationship between nodal behavioral profiles and encounters, which we discuss next.

B. Relationship between Behavioral Profiles and Encounters

We now analyze the relationship between user behavioral profiles and a key event for user-to-user communication in an infrastructure-less network: encounters. Encounters in mobile networks refer to events when users are within the radio range of each other and direct communication between the involved devices is possible. In this paper, based on the WLAN traces, we assume that when two users visit the same location during overlapping time intervals, they encounter each other.
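The encounter assumption stated above (same location, overlapping time intervals) can be sketched as a small trace-processing routine. The session tuple layout `(user, location, start, end)` and the function name `find_encounters` are hypothetical; real WLAN traces would first need their association and disassociation events converted into such sessions.

```python
from collections import defaultdict
from itertools import combinations

def find_encounters(sessions):
    """Return {frozenset({u, v}): total overlap} for all encountering user pairs.

    sessions: iterable of (user, location, start, end) with start < end,
    in any consistent time unit (minutes in the example below).
    """
    by_location = defaultdict(list)
    for user, location, start, end in sessions:
        by_location[location].append((user, start, end))

    total_overlap = defaultdict(float)
    for visits in by_location.values():
        for (u, s1, e1), (v, s2, e2) in combinations(visits, 2):
            if u == v:
                continue  # two sessions of the same user are not an encounter
            overlap = min(e1, e2) - max(s1, s2)
            if overlap > 0:
                total_overlap[frozenset((u, v))] += overlap
    return dict(total_overlap)

sessions = [
    ("a", "gym", 0, 60), ("b", "gym", 30, 90),
    ("c", "library", 0, 60), ("a", "library", 100, 120), ("c", "library", 110, 130),
]
print(find_encounters(sessions))
```

The pairwise comparison within each location is quadratic in the number of sessions at that location, which is acceptable for a sketch but would likely need an interval sweep for large traces.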

[Figure 5: three panels, each plotted against mobility similarity (x-axis) with curves for USC and Dartmouth. (a) Total encounter duration (minutes). (b) Probability of encounter. (c) Similarity of encountered node sets.]

Fig. 5. Relationship between the similarity in behavioral pattern and other quantities.

While it seems intuitive that users visiting similar locations should encounter each other with higher probability, this is not obvious on university campuses. Students and faculty have their own schedules, and they may rarely encounter each other due to the difference in their schedules, although they might be in the same building at different times. Hence we investigate the relationship between behavioral profiles and encounter events, first as a sanity check of our intuition, and more importantly, to understand the relationship between the behavioral patterns and various aspects of the encounter events (e.g., the encounter probabilities, encounter durations, etc.). This helps reveal the implicit structure existing in mobile human networks, which is the key to the design of the CSI schemes in the following sections.

We classify all node pairs into different bins of the behavioral similarity metric (as defined in Eq. (2)), and obtain various characteristics of encounter events as a function of the pairwise behavioral similarity. In Fig. 5 (a), we show the aggregate encounter time duration between an average pair of nodes given the behavioral similarity. In Fig. 5 (b), we show the probability for a given node pair to encounter each other, given their similarity. Combining these two graphs, we see that if two users are similar in behavioral profiles, they are much more likely to encounter each other, and the total time they encounter each other is much longer, an indication that nodes with similar behavioral profiles indeed are more likely to have better opportunities to communicate. When two users are similar enough (with behavioral similarity larger than 0.3), they are almost guaranteed to encounter each other at some point (with probability above 0.9). However, we note that some “random” encounter events happen between dissimilar users. For users with very low (almost zero) similarity, the probability for them to encounter each other is not zero, although such encounter events are much less reliable (i.e., they occur with much shorter durations, see Fig. 5 (a)).
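The binning step behind Fig. 5 (b) can be sketched as below. The function name `encounter_probability_by_bin`, the choice of ten equal-width bins, and the input layout (one `(similarity, encountered)` tuple per node pair) are assumptions; the text does not state the bin width that was used.

```python
from collections import defaultdict

def encounter_probability_by_bin(pairs, num_bins=10):
    """Return {bin_index: fraction of node pairs in that bin that encountered}.

    pairs: iterable of (similarity, encountered) with similarity in [0, 1].
    Bin k covers similarities in [k / num_bins, (k + 1) / num_bins).
    """
    seen = defaultdict(int)
    met = defaultdict(int)
    for sim, encountered in pairs:
        k = min(int(sim * num_bins), num_bins - 1)  # put sim == 1.0 in the last bin
        seen[k] += 1
        if encountered:
            met[k] += 1
    return {k: met[k] / seen[k] for k in sorted(seen)}

toy_pairs = [(0.05, False), (0.07, True), (0.35, True), (0.95, True)]
print(encounter_probability_by_bin(toy_pairs))
```

The same grouping could be reused for Fig. 5 (a) by averaging the total encounter duration per bin instead of counting encounters.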

In Fig. 5 (c) we further compare the behavioral similarity of nodes A and B against the similarity of the sets of nodes that A and B encounter. We denote the set of nodes A encounters as E(A). The similarity of the two sets is quantified by |E(A) ∩ E(B)| / |E(A) ∪ E(B)|, where | · | is the cardinality of the set. The graph shows that as two nodes become increasingly similar, the sets of nodes they encounter overlap more. Conversely, when an unlikely encounter between dissimilar nodes occurs, it gives both nodes access to a very different set of nodes, which they are unlikely to encounter directly.
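As a minimal sketch of the encounter-set overlap measure |E(A) ∩ E(B)| / |E(A) ∪ E(B)| used for Fig. 5 (c), the following Python function computes it for two sets of node identifiers. The function name, the node labels, and the convention of returning 0 for two empty sets are assumptions made for illustration only.

```python
def encounter_overlap(enc_a, enc_b):
    """Return |E(A) ∩ E(B)| / |E(A) ∪ E(B)| for two encounter sets."""
    a, b = set(enc_a), set(enc_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty encounter sets have zero overlap
    return len(a & b) / len(union)


# Hypothetical encounter sets for two nodes A and B.
E_A = {"n1", "n2", "n3", "n4"}
E_B = {"n3", "n4", "n5"}
overlap = encounter_overlap(E_A, E_B)  # 2 shared nodes out of 5 distinct nodes
print(overlap)
```

A value near 1 means the two nodes meet almost the same people, while a value near 0 means an encounter between them bridges two largely disjoint groups.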

The above findings relate to the SmallWorld encounter patterns between mobile users [14]. The key features of SmallWorld networks [7] are a high clustering coefficient and a low average path length. In the human networks we analyze in this section, people with similar behavior form “cliques”. The “random” encounter events between dissimilar nodes build short-cuts between these cliques and shorten the distance between any two nodes. We leverage these properties in the protocol design.

C. CSI:Target Mode

In the CSI:Target mode (CSI:T), the sender specifies the target profile (TP) for the recipients, which must have the same format and semantics as the user behavioral profile; in our case, the TP is a summarized mobility preference vector (i.e., the percentage of time the target node(s) spend at various locations). For example, we could reach people who like sports by sending messages to those who visit the gym regularly. This criterion could be set up by specifying the TP as a vector with a single 1 in the entry corresponding to the gym location (hence only time spent at this location is considered). If a given user A has Sim(BP(A), TP) > th_sim, i.e., its behavioral profile, BP(A), is more similar to the TP than a sender-specified threshold, we say node A belongs to the group of intended receivers. This threshold is set by the sender according to the desired degree of similarity to the TP. The TP and the threshold, th_sim, are included in the message header to describe the intended receivers of the message.
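The membership test for intended receivers can be illustrated with a short Python sketch. The similarity metric of Eq. (2) is not reproduced in this section, so the code below substitutes plain cosine similarity as a stand-in; the location names, the example profiles, and the function names are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Stand-in for Sim(., .); the paper's Eq. (2) may differ from this."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_intended_receiver(bp, tp, th_sim):
    """True if Sim(BP, TP) > th_sim, i.e., the node is an intended receiver."""
    return cosine_similarity(bp, tp) > th_sim


# Hypothetical location order: library, gym, computer lab.
TP_GYM = [0.0, 1.0, 0.0]        # target profile with a single 1 at the gym
bp_regular = [0.1, 0.9, 0.0]    # spends most of the time at the gym
bp_rare = [0.8, 0.1, 0.1]       # rarely visits the gym

print(is_intended_receiver(bp_regular, TP_GYM, th_sim=0.8))
print(is_intended_receiver(bp_rare, TP_GYM, th_sim=0.8))
```

Raising th_sim narrows the group of intended receivers; lowering it widens the group.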

We first discuss the intuition behind the design of the CSI:T mode, using Fig. 6 as an illustration. As per section VI-B, to deliver messages to receivers defined by a given TP, one way is to gradually move the message towards nodes with increasing similarity to the TP via encounters, in the hope that such transmissions improve the probability of encountering the intended receivers. Finally, when the message reaches a node close to the TP (in the behavioral space), most nodes that encounter this node frequently are also similar to the TP. Hence, the message should be spread to other nodes in the neighborhood (in the behavioral space) of the node.

Consider the pseudo-code in Algorithm 1. There are two phases in the operation, the gradient ascend phase and the group spread phase. (1) Starting from the sender, if node A currently holding the message is not an intended receiver (i.e., Sim(BP(A), TP) < th_sim), it works in the gradient ascend phase; otherwise it works in the group spread phase. (2) In the gradient ascend phase, for each encountered node, the current message holder asks for the behavioral profile of the other node, and if the other node is more similar to the TP in the behavioral space, the responsibility of forwarding the message is passed to this node. One can imagine that these similarities form an inherent gradient for the message to follow and reach the close neighborhood of the TP in the behavioral space, hence the name gradient ascend phase. Note that, up to this point, there is only one copy of the message in the network: the intermediate nodes that are not similar to the TP forward the message only once. (3) When the message reaches a node with similarity to the TP larger than th_sim, the group spread phase starts. This intended receiver holds on to the message and requests the behavioral profiles of the nodes it encounters. If they are also intended receivers, copies of the message are delivered to them. All intended receivers, after getting the message, continue to work in the group spread phase. Although multiple copies of the message are generated in the group spread phase, it is triggered only when the message is close to the TP, so most of the encounter events and inquiries occur among the intended receivers, reducing unnecessary overhead.

[Figure 6. Annotations in the figure: (1) Gradient ascend: a message is sent to nodes with increasing similarities to TP. (2) Group spread: starting from the first node with similarity > th_sim, all nodes within the th_sim neighborhood receive copies of the message. Nodes S, A, B and C are labeled with Sim(BP(S), TP), Sim(BP(A), TP), Sim(BP(B), TP) and Sim(BP(C), TP).]

Fig. 6. Illustration of the CSI:T scheme in the high-dimensional behavioral space. One copy of the message follows the increasing similarity gradient to reach the neighborhood of the target profile, then triggers group spread.

/* BP(A): Behavioral profile of node A */
if node A has the message then
    if Sim(BP(A), TP) > th_sim then
        Initiate Group_spread();
    else
        Initiate Gradient_ascend();

Gradient_ascend() {
    while the message is not sent do
        foreach node E encountered do
            Get BP(E) from E;
            if Sim(BP(E), TP) > Sim(BP(A), TP) then
                Send message to E;
}

Group_spread() {
    foreach node E encountered do
        Get BP(E) from E;
        if Sim(BP(E), TP) > th_sim then
            Send message to E;
}

Algorithm 1: Algorithm for the CSI:T mode
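To make the two phases concrete, the following Python sketch replays a time-ordered list of pairwise encounters and applies the rules of Algorithm 1. It is an illustration under stated assumptions, not the simulator used in this paper: the input format (a list of node pairs), the precomputed dictionary of similarities to the TP, and all node names and values are hypothetical.

```python
def csi_target(encounters, sim_to_tp, sender, th_sim):
    """Replay time-ordered encounters; return (receivers, transmissions)."""
    receivers = set()   # intended receivers holding a copy (group spread phase)
    carrier = None      # holder of the single copy during gradient ascend
    transmissions = 0
    if sim_to_tp[sender] > th_sim:
        receivers.add(sender)
    else:
        carrier = sender
    for a, b in encounters:
        # Consider both directions of each encounter.
        for holder, other in ((a, b), (b, a)):
            if holder == carrier and sim_to_tp[other] > sim_to_tp[holder]:
                # Gradient ascend: hand the single copy to the more similar node.
                transmissions += 1
                if sim_to_tp[other] > th_sim:
                    receivers.add(other)   # first intended receiver reached
                    carrier = None         # group spread takes over
                else:
                    carrier = other
            elif (holder in receivers and other not in receivers
                  and sim_to_tp[other] > th_sim):
                # Group spread: copy the message to another intended receiver.
                receivers.add(other)
                transmissions += 1
    return receivers, transmissions


# Hypothetical similarities Sim(BP(node), TP) and encounter sequence.
sim = {"S": 0.1, "A": 0.4, "B": 0.6, "C": 0.9, "D": 0.85, "X": 0.05}
encounters = [("S", "X"), ("S", "A"), ("A", "B"),
              ("B", "C"), ("C", "D"), ("C", "X")]
receivers, transmissions = csi_target(encounters, sim, "S", th_sim=0.8)
print(sorted(receivers), transmissions)
```

In this toy run the single copy climbs from S through A and B to C, after which C copies it to D; nodes outside the th_sim neighborhood never keep a copy.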

[Figure 7. Panel titles: The “interest space” (left); The “behavioral space” (right).]

Fig. 7. Illustrations of the CSI:D scheme. Left chart: The goal is to send a message to a group of nodes with a similar characteristic in the interest space (white nodes in the circle). Right chart: However, they may not be similar to each other in the behavioral space (nodes with the same legend represent similar nodes in the behavioral space).

D. CSI:Dissemination Mode

In the CSI:Dissemination mode (CSI:D), there is no direct relationship between the target profiles of the recipients and their measured behavioral profiles. One particular example is to reach people who like movies on campus. If there are no movie theaters on campus, the measured behavioral profiles (i.e., mobility preferences) cannot be used to infer such an interest. This situation is illustrated in Fig. 7. The similarities between nodal behavioral profiles appear to provide little insight to guide message propagation, as the intended receivers in this case may be scattered in the behavioral space, and the relationship between the target profile and the behavioral profile cannot be quantified. Although it is always possible to reach most users through epidemic routing, this leads to high overhead and requires all nodes in the network to keep a copy of the message. The objective of the CSI:D mode is to reduce the number of message copies transmitted and stored in the network, yet make it possible for most nodes to get a copy quickly if they belong to the intended receivers.

We again first discuss the intuition behind the design of the CSI:D mode, using Fig. 8 as an illustration. From section VI-B, since nodes with high similarity in their behavioral profiles are almost guaranteed to encounter, there is really no need for each of them to keep a copy and disseminate the message. Electing a few message holders within a single group of similar nodes would suffice. This intuition leads to the construction of our message dissemination strategy for CSI:D. We aim to have only one message holder among the nodes that are similar in their behavioral profiles (or equivalently, to pick only one message holder within a neighborhood in the behavioral space; in Fig. 7, this corresponds to having only one message holder from each group of nodes with the same legend). We add message holders carefully to avoid overlaps in the encountered nodes among message holders. As suggested by Fig. 5 (c), we should select nodes that are very dissimilar in their behavioral profiles to achieve low overlaps. Recall that dissimilar node pairs still encounter with non-zero probability. Our design philosophy is to leverage these “random” encounter events as short-cuts to navigate through the behavioral space efficiently, hopping across the space to reach dissimilar nodes with relatively few message transmissions. This design philosophy is also related to the SmallWorld human network structure: a message will be received by an intended receiver shortly after it has reached someone in the receiver's “clique”.

[Figure 8. Annotations in the figure: (1) Each message holder sends to nodes dissimilar to all known holders (with similarity lower than th_fwd). (2) Each message holder prevents other nodes within th_nbr similarity from becoming another holder. Nodes S, A and B are labeled with the pairwise similarities Sim(BP(S), BP(A)), Sim(BP(A), BP(B)) and Sim(BP(S), BP(B)), each compared against th_fwd.]

Fig. 8. Illustration of the CSI:D scheme. The idea is to select the message holders in a non-overlapping fashion to cover the entire behavioral space.

Consider the pseudo-code in Algorithm 2. (1) The sender itself starts as the first message holder in the network. (2) Each message holder tries to strategically add additional message holders in the network. When it encounters another node, it asks for that node's behavioral profile in order to consider it as a potential additional message holder. Each message holder keeps a list of the behavioral profiles of all known message holders³, and the new node has to be dissimilar (with the similarity metric lower than a threshold, th_fwd) to all known holders to be added as a new message holder and keep another full copy of the message. (3) If, on the other hand, this node is similar to the message holder (i.e., within the similarity threshold th_nbr), it uses a single bit to remember that there is a message holder in its neighborhood and propagates this information to similar nodes. This bit is used to prevent excessive message holders in the same neighborhood, even if some nodes have not encountered the message holders directly. (4) When holders encounter, they update each other with the behavioral profiles in their known holder lists, to gain a better view of the state of message spreading. (5) If two similar holders encounter, one of them should cease to be a holder to reduce duplicated effort.

Each message holder is responsible for disseminating the actual message to the intended receivers. The message holder sends the TP specified by the sender in the message to the encountered nodes. If the encountered node is an intended receiver, the full message is transferred.

VII. SIMULATION RESULTS

In this section, we perform extensive simulations with the CSI schemes, based on the encounters between users derived from the two empirical traces. We compare the performance of our proposal to oracle-based forwarding decisions to show that our performance is close to the optimum (in terms of the delivery success rate and the overhead) and does not fall much behind in delay. We also compare CSI to epidemic routing [5] and variants of random walk⁴. In all the simulation cases, we split the traces into two halves, use the first half to obtain the behavioral profiles for all users, and then use the second half of the trace to evaluate the success of our proposed schemes.

³ Note this list does not necessarily contain all holders in the network. Message holders that are added by a particular message holder are not known to other holders until they meet and sync the lists.

⁴ CSI could not be directly compared with existing routing schemes (e.g., [17], [3], [6], [10]) in DTN, as most of them have a different routing objective: reaching a particular network ID.

/* BP(A): Behavioral profile of node A */
/* H_i(A): The i-th known holder of node A */
/* holder_in_group(A): If A knows there is a message holder in its neighborhood */
if node A is a message holder then
    foreach node E encountered do
        Get BP(E);
        if E is not a holder then
            if Sim(BP(E), BP(H_i(A))) < th_fwd ∀i and holder_in_group(E) = false then
                Elect E as a holder;
                Add BP(E) to holder list;
                Send the message;
                Send BP(H_i(A)), ∀i;
            else if Sim(BP(E), BP(H_i(A))) > th_nbr for any i then
                Let E set holder_in_group(E) = true;
        else
            if Sim(BP(E), BP(A)) > th_nbr then
                A ceases to be a holder;
            else
                Sync holder lists between node A and E;
else if holder_in_group(A) = true then
    foreach node E encountered do
        Get BP(E);
        if Sim(BP(A), BP(E)) > th_nbr then
            Let E set holder_in_group(E) = true;

Algorithm 2: Algorithm for CSI:D mode.
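The holder election rules of Algorithm 2 can be sketched in Python as below. This is a hedged illustration only: cosine similarity again stands in for Eq. (2), the Node class and function names are hypothetical, and two details are assumptions not stated in the pseudo-code (a holder includes its own profile in its known holder list, and a holder that steps down sets its own neighborhood bit).

```python
import math


def sim(u, v):
    """Stand-in similarity (cosine); Eq. (2) of the paper is not reproduced here."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


class Node:
    def __init__(self, name, bp):
        self.name = name
        self.bp = tuple(bp)            # behavioral profile BP(node)
        self.is_holder = False
        self.known = set()             # profiles of known holders, H_i(node)
        self.holder_in_group = False   # single bit: a holder exists nearby


def make_holder(node, known=()):
    node.is_holder = True
    node.known = set(known) | {node.bp}   # assumption: a holder lists itself


def on_encounter(a, e, th_fwd, th_nbr):
    """Apply Algorithm 2 from the point of view of node a meeting node e."""
    if a.is_holder:
        if not e.is_holder:
            sims = [sim(e.bp, h) for h in a.known]
            if all(s < th_fwd for s in sims) and not e.holder_in_group:
                a.known.add(e.bp)
                make_holder(e, a.known)    # message and holder list are sent
            elif any(s > th_nbr for s in sims):
                e.holder_in_group = True
        elif sim(e.bp, a.bp) > th_nbr:
            a.is_holder = False            # two similar holders: a steps down
            a.holder_in_group = True       # assumption: a remembers holder e
        else:
            merged = a.known | e.known     # sync holder lists
            a.known, e.known = set(merged), set(merged)
    elif a.holder_in_group and sim(a.bp, e.bp) > th_nbr:
        e.holder_in_group = True           # propagate the bit to similar nodes


# Hypothetical profiles over three locations.
S = Node("S", (1.0, 0.0, 0.0))
A = Node("A", (0.0, 1.0, 0.0))      # dissimilar to S
B = Node("B", (0.9, 0.1, 0.0))      # similar to S
C = Node("C", (0.95, 0.05, 0.0))    # similar to S and B
make_holder(S)                      # the sender is the first holder
for x, y in [(S, A), (S, B), (B, C), (A, C)]:
    on_encounter(x, y, th_fwd=0.3, th_nbr=0.7)
print([n.name for n in (S, A, B, C) if n.is_holder])
```

In this toy run only S and A end up as holders; B learns of the nearby holder directly from S, and C learns of it indirectly from B, so neither becomes a redundant holder.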

A. CSI:Target Mode

1) Simulation Setup: In the scenario of the CSI:T mode, the sender specifies the TP and a similarity threshold th_sim. If a node shows a similarity metric to the TP higher than th_sim, it is an intended receiver. In our evaluation, we use the top-10 dominant behavioral profiles⁵ (i.e., the behavioral profiles followed by the largest numbers of people, typically on the order of hundreds) in our traces as the TPs, and for each TP we randomly pick 100 users as the senders generating messages targeted at the TP. We use the threshold th_sim = 0.8 as the transition point between the gradient ascend phase and the group spread phase.

We compare our CSI:T scheme with several other protocols discussed below. Epidemic routing [5] is a message dissemination scheme with simplistic decision rules: all nodes in the network send copies of messages to all the encountered nodes that have not received the message yet. The random walk (RW) protocol generates several copies of the message from the sender, and each copy is transferred among the nodes in a random fashion until the hop count reaches a pre-set TTL value. Group spread only is a simplified version of our protocol. It uses only the group spread phase, i.e., the original sender holds on to the message until it encounters someone who is more similar than th_sim to the TP and starts the group spread phase directly from there.

⁵ We have also experimented with other target profiles, such as rarely visited locations on campuses or profiles that contain a combination of several locations, and the results are similar to those presented in this section.

We also consider two protocols that require global knowledge of the future. The optimal protocol sends copies of the message only to the nodes that lead to the fastest delivery to the targeted receivers, and to no one else. This is the oracle-based optimal protocol achievable if one has perfect knowledge of the future, and it serves as the upper bound for performance. The optimal single-forwarding-path protocol is the oracle-based protocol that finds the fastest path to deliver the message to the neighborhood of the TP: using the knowledge of the future, it identifies the path that leads to the earliest message delivery to one of the intended receivers. Once a copy of the message is delivered to the th_sim-neighborhood of the TP, it follows the same group spread phase as in CSI:T. This is the optimal performance (upper bound) for the family of protocols that deliver one copy of the message to the neighborhood of the target profile along a good (shortest-delay) path. Note that this shortest-delay path may not always follow an increasing gradient of similarities to the TP.
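The two non-oracle baselines described above, epidemic routing and the random walk, can be sketched over the same kind of time-ordered encounter list. The sketch below is illustrative only: the encounter format and node names are hypothetical, only a single random walk copy is modeled, and the forwarding probability of 1/2 per encounter is an assumption, since the text only states that copies move “in a random fashion”.

```python
import random


def epidemic(encounters, sender):
    """Every node holding the message copies it to any encountered node lacking it."""
    has_msg = {sender}
    transmissions = 0
    for a, b in encounters:
        if (a in has_msg) != (b in has_msg):   # exactly one side has the message
            has_msg.update((a, b))
            transmissions += 1
    return has_msg, transmissions


def random_walk(encounters, sender, ttl, rng):
    """One copy hops between encountered nodes until the hop count reaches ttl."""
    path, carrier = [sender], sender
    for a, b in encounters:
        if len(path) - 1 >= ttl:
            break
        # Assumption: the carrier forwards with probability 1/2 per encounter.
        if carrier in (a, b) and rng.random() < 0.5:
            carrier = b if carrier == a else a
            path.append(carrier)
    return path


# Hypothetical time-ordered encounters.
encounters = [("S", "A"), ("B", "C"), ("A", "B"), ("B", "C")]
reached, sent = epidemic(encounters, "S")
walk = random_walk(encounters, "S", ttl=2, rng=random.Random(0))
print(sorted(reached), sent, walk)
```

Epidemic routing reaches every node connected to the sender through the encounter sequence, at the cost of one transmission per newly reached node; the random walk uses at most ttl transmissions per copy but has no guidance toward the TP.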

We compare these message dissemination schemes with respect to three important performance metrics: delivery ratio, average delay, and transmission overhead. The delivery ratio is defined as the percentage of the intended receivers (those with similarity to the TP greater than th_sim) that actually received the message. We account for the transmission overhead as the total number of messages sent in the process of delivery. See section VIII-A for further discussion of the additional overhead of exchanging the behavioral profiles.
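As a small worked example of these metrics, the Python sketch below computes them from the outcome of one simulation case. The input format and names are hypothetical, and averaging the delay over the receivers that were actually reached is an assumption, since the text defines only the delivery ratio and the transmission overhead explicitly.

```python
def delivery_metrics(intended, delivery_time, transmissions):
    """intended: set of intended receivers.
    delivery_time: dict mapping node -> time (minutes) at which it got the message.
    transmissions: total number of messages sent during the delivery process."""
    delivered = [n for n in intended if n in delivery_time]
    ratio = len(delivered) / len(intended) if intended else 0.0
    # Assumption: average delay is taken over delivered intended receivers only.
    avg_delay = (sum(delivery_time[n] for n in delivered) / len(delivered)
                 if delivered else None)
    return {"delivery_ratio": ratio,
            "average_delay": avg_delay,
            "transmission_overhead": transmissions}


# Hypothetical outcome: B is a relay, not an intended receiver; F is never reached.
intended = {"C", "D", "E", "F"}
delivery_time = {"B": 10.0, "C": 30.0, "D": 50.0, "E": 70.0}
metrics = delivery_metrics(intended, delivery_time, transmissions=6)
print(metrics)
```

Dividing each metric by the corresponding value obtained for epidemic routing on the same case gives a normalized value of the kind reported relative to epidemic routing.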

2) Simulation Results: We show the performance metrics normalized with respect to those of epidemic routing (i.e., the relative performance of each protocol, taking epidemic routing as 1.0) and their 95% confidence intervals in Fig. 9. We observe that epidemic routing leads to the highest overhead, while its aggressiveness also results in the highest possible delivery ratio and the lowest possible delay. The random walks do not work well regardless of the number of copies and the value of TTL, as they use no information to guide the propagation of the message in the right direction. Our CSI:T protocol leads to a success rate close to that of epidemic routing (0.96 for USC, 0.94 for Dartmouth) with very small overhead (0.02 for USC, 0.018 for Dartmouth). For the simplified version, group spread only, the delay is longer and the success rate is lower than with our protocol. We investigate this phenomenon further below.

When comparing CSI:T with the protocols with future knowledge, we see that there is really not much room for improvement in terms of the success rate and the overhead. Our gradient ascend approach in CSI:T is close to what is achievable even if one has knowledge of the future in these two aspects. Specifically, CSI:T achieves a delivery rate above 94% and uses less than 84% of the overhead of the optimal strategy. The delay, on the other hand, has some room for improvement. Our gradient ascend phase generates only one copy of the message from the sender, and it moves towards the TP following strictly ascending similarity. Compared with the best (fastest) path to the TP used in the optimal single-forwarding-path protocol, our CSI:T has 1.40 and 1.47 times more delay for USC and Dartmouth, respectively. If we compare with the optimal strategy, where multiple copies are generated whenever it helps to improve the delay, the difference is even larger. This calls for a further investigation of selecting good path(s) from the sender to the TP, which we leave for future work.

[Figure 9. Panels: (a) USC; (b) Dartmouth. Protocols shown: Epidemic routing, CSI:T, Group spread only, Optimal, Optimal single-forwarding-path, and RW with several TTL and copy-count settings. Metrics shown: Delivery ratio, Delay, Overhead.]

Fig. 9. Performance comparison of CSI:T to other protocols.

We take a closer look at the performance metrics by splitting the simulation cases into categories, depending on the original similarity between the sender's behavioral profile and the TP, Sim(BP(S), TP). The split statistics shown in Fig. 10 reveal why the gradient ascend phase is needed to improve the success rate and reduce the delay. When we use only the group spread phase and the sender is dissimilar from the TP, it takes longer before any encounter event happens directly between the sender and anyone in the neighborhood of the TP, if it happens at all; hence the delay is longer and the success rate is lower.

Comparing the two versions of random walks, few long threads and many short threads, reveals an interesting difference. The concept that leads to the difference is illustrated in Fig. 11. Many short threads are better if the sender is close to the TP, in terms of both delivery ratio and delay, as the sender generates a lot of threads to “occupy” the neighborhood: since the threads are short, and similar users encounter more frequently, they are likely to stay in the neighborhood. In contrast, if the sender is far away from the TP, long random walk threads provide a legitimate chance of moving close to the TP, while short threads provide less hope.

B. CSI:Dissemination Mode

1) Simulation Setup: In the scenario of the CSI:D mode, the target profile specified by the sender cannot help to determine where the message should be sent in the behavioral space. Hence, the strategy seeks to keep one copy in every neighborhood in the behavioral space. In our evaluation, we start from 1000 randomly selected users as the senders. Since the target profile of the intended receivers can be orthogonal to the behavioral profile, we create the scenario for evaluation by randomly selecting 500 nodes as the intended receivers for each sender, and consider the average performance. We vary the two thresholds, th_fwd and th_nbr, in our CSI:D scheme proposed in section VI-D to adjust the aggressiveness of the forwarding scheme. Setting low values for both thresholds leads to less aggressive operation and inferior performance. At the same time, it also leads to lower overhead, as the messages are copied to fewer message holders, and the existence of a message holder prevents nodes in a larger neighborhood from becoming another message holder.

[Figure 10. Panels: (a) Delivery ratio; (b) Average delay (minutes). Protocols shown: Epidemic routing, Group spread only, CSI:T, Few long RW, Many short RW. The legend distinguishes five ranges of similarity between the sender and the TP.]

Fig. 10. Split performance metrics by the similarity between the sender and the target profile (USC).

[Figure 11. Panels: “Single long RW” and “Multiple short RW”, each shown for the cases “Sender is similar to TP” and “Sender is dissimilar from TP”.]

Fig. 11. Illustrations for the comparison between one long random walk and many short random walks.

We compare various parameter settings of our CSI:D mode with two baseline protocols, epidemic routing and the random walk. Epidemic routing works the same way as before, serving as the baseline for comparison. In the random walks, the nodes visited along the walks become message holders, and they later disseminate the message further when they encounter intended receivers. The optimal protocol again assumes a global view of the network and knowledge of the future. Every node in the network knows who the intended receivers are, and sends the message to other nodes only if they lead to the fastest delivery of the message to one of the receivers.

The performance metrics we consider are delivery ratio, average delay, transmission overhead and, in addition, storage overhead. Here the transmission overhead refers to the total number of transmissions needed to reach the message holders and the intended receivers. The storage overhead is the number of eventual message holders that remain in the network after our scheme has stabilized (recall that in CSI:D some message holders may cease performing the task if another message holder with a similar behavioral pattern is found). This is the overall amount of storage space invested collectively by the nodes to deliver the message⁶. In epidemic routing and the optimal protocol, all nodes that receive the message hold on to it for future transmissions (there is no distinction between a message holder and a regular node), hence the transmission overhead and the storage overhead are the same.

2) Simulation Results: In Fig. 12 we show the average results of the 1000 simulation cases with 95% confidence intervals. We use the legend CSI:D-th_fwd-th_nbr for our CSI:D scheme. Compared with epidemic routing, our protocol saves a lot of transmission and storage overhead. It is possible to use only about 7.2% of the nodes, strategically chosen, as message holders and reach the intended receivers with little extra delay (about 32% more) when th_fwd = 0.3 and th_nbr = 0.7. Notice that the storage overhead of the CSI:D scheme is even lower than that of the optimal protocol (less than 60%), whose objective is to minimize the delay. If one desires further reduction in the overhead, setting lower threshold values provides a way to trade performance for overhead; e.g., setting th_fwd = 0.1 and th_nbr = 0.6 cuts the storage overhead to about 3% of that of epidemic routing. The delay of CSI:D is not much more than that of epidemic routing or the optimal protocol, at around 27% to 32% more when th_fwd = 0.3 and th_nbr = 0.7.

For the random walks, we have configured the TTL values so that they have overhead similar to that of CSI:D (i.e., compare RW TTL=350 with CSI:D-0.7-0.3 and RW TTL=150 with CSI:D-0.6-0.1). We notice that although the delivery rate of the random walk is also pretty good (1.5% to 10% inferior to the corresponding CSI:D), thanks to the non-zero encounter probability between dissimilar nodes, its delay is much longer than that of the corresponding CSI:D (between 50% and 108% more). This is because the random walk does not leverage the implicit structure of the human network to select the message holders wisely, as CSI:D does. The random walk leaves copies within the neighborhood of the original sender with higher probability, as similar nodes are more likely to encounter (i.e., the random walk will not “leave the neighborhood” in a small number of hops). Hence, there is significant overlap between the nodes encountered by the selected message holders, and the other nodes that are dissimilar to these holders have to wait a long time for some “random” encounter event to occur before they receive the message, resulting in the longer delay.

⁶ Typically, only about a couple dozen message holders drop the message in the simulation cases. Even if we account for the temporarily invested storage, it adds less than 1% additional storage overhead.

[Figure 12. Panels: (a) USC; (b) Dartmouth. Protocols shown: Epidemic routing, three CSI:D settings, Optimal, and two RW TTL settings. Metrics shown: Delivery ratio, Delay, Storage overhead, Tx overhead.]

Fig. 12. Performance comparison of CSI:D to other protocols.

VIII. DISCUSSIONS

A. Additional Overhead

In addition to the message transmission and storage, our proposed CSI schemes incur some additional overhead due to the need for exchanging and maintaining the behavioral profiles. We discuss it in detail in this section.

Overhead for exchanging the behavioral profiles. We identify some additional components beyond the actual message transmissions when the encounter events between mobile nodes are leveraged for message dissemination. Some of the components are common to any message dissemination scheme, and the others are unique to our CSI schemes.

• The common overhead for all the DTN message dissemination schemes considered includes the beacon signals for nodes to discover each other when they encounter, and the exchange of a list of “messages I have seen” to avoid a given node receiving duplicate messages from different nodes. This type of overhead is a function of the encounter patterns themselves and is independent of the actual protocol used. We ignore these common factors in our analysis.

• Exchanging the behavioral profiles for the evaluation of mutual similarity is an additional component that exists only in our behavior-aware protocol. These profiles are a handful of vectors with their associated weights. For most of the users, empirically, five to seven eigen-behavior vectors capture more than 90% of the power in their association matrices [4]. This is a small constant overhead we pay for each encounter when one of the nodes has some message to send. If the message size is much larger than the overhead, which is usually the case as messages are transferred in a bigger unit (i.e., a “bundle”) in DTNs, it is worthwhile to pay this overhead to gain the reduction in transmission counts that we see in section VII. Furthermore, with CSI, if there is no message to send, there is no need to exchange the behavioral profile. Thus, compared with protocols that require proactive, persistent exchanges of control messages when nodes encounter (e.g., ProPHET [17] requires the exchange of encounter probability vectors), the CSI schemes qualitatively have lower overhead, especially when the volume of traffic in the network is low.

• The actual message size has to be augmented with the TP as well. This is a constant overhead, and it can be reduced if the target vector is “sparse”. For example, if the TP considers only visits to the gym, there is only one 1 in the vector; instead of adding a vector (0, ..., 0, 1, 0, ...) to the header, the vector can be encoded compactly (e.g., by specifying (gym, 1)) to save space.

• In the CSI:D mode, the message holders have to exchange the list of behavioral profiles of known holders. This happens only between a small subset (less than 8%) of the nodes, and the exchange is necessary only when there is a difference in the lists. To further alleviate this, the two nodes can compare their known holder lists using a hash value, and exchange only the difference.
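The hash-based comparison of known holder lists mentioned in the last item can be sketched as follows. This is one possible realization under stated assumptions, not a mechanism specified in this paper: profiles are represented as tuples of floats, SHA-256 from the Python standard library is used as the hash, and short per-profile identifiers are exchanged only when the list digests differ.

```python
import hashlib


def profile_id(profile):
    """Short identifier for one behavioral profile (hypothetical encoding)."""
    return hashlib.sha256(repr(profile).encode("utf-8")).hexdigest()[:16]


def list_digest(profiles):
    """Order-independent digest of a whole known holder list."""
    ids = sorted(profile_id(p) for p in profiles)
    return hashlib.sha256("".join(ids).encode("utf-8")).hexdigest()


def sync_holder_lists(known_a, known_b):
    """Make both sets equal; return the number of full profiles transferred."""
    if list_digest(known_a) == list_digest(known_b):
        return 0                              # lists match: nothing to send
    ids_a = {profile_id(p): p for p in known_a}
    ids_b = {profile_id(p): p for p in known_b}
    to_b = [p for i, p in ids_a.items() if i not in ids_b]
    to_a = [p for i, p in ids_b.items() if i not in ids_a]
    known_a.update(to_a)                      # only the difference is exchanged
    known_b.update(to_b)
    return len(to_a) + len(to_b)


# Hypothetical holder lists of two message holders.
list_a = {(1.0, 0.0), (0.0, 1.0)}
list_b = {(1.0, 0.0), (0.5, 0.5)}
first = sync_holder_lists(list_a, list_b)     # one profile sent in each direction
second = sync_holder_lists(list_a, list_b)    # digests now match
print(first, second, len(list_a))
```

When the two lists already agree, only one digest per side crosses the link, which keeps the common case cheap.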

Overhead for maintaining the behavioral profiles. In order to maintain the behavioral profile, a node has to keep track of its visiting time at various locations. Note that this does not require a node to be aware of all possible locations in the environment; it has to keep track of only the ones it has been to. When two nodes exchange behavioral profiles, each entry in the behavioral profile contains only a subset of locations with annotations for these locations (e.g., node A specifies (library, gym) = (0.8, 0.2) while node B specifies (library, computer lab) = (0.4, 0.6)). The nodes take the union of the location sets when comparing their similarities (e.g., in the previous example, when node A sends its behavioral profile to B, B converts the profiles to BP(A): (library, gym, computer lab) = (0.8, 0.2, 0) and BP(B): (library, gym, computer lab) = (0.4, 0, 0.6) before comparing). The required storage on each node is minimal, as we show in section IV that about three to five days of summarized mobility preference is sufficient to establish a stable behavioral profile for the user.
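The union-of-locations conversion in the example above can be written as a short Python sketch. Representing each sparse profile as a dictionary from location name to weight is an assumption made for illustration; the same representation also covers a sparse TP such as (gym, 1).

```python
def align(profile_a, profile_b):
    """Expand two sparse profiles over the union of their location sets."""
    locations = sorted(set(profile_a) | set(profile_b))
    vec_a = [profile_a.get(loc, 0.0) for loc in locations]   # missing entries are 0
    vec_b = [profile_b.get(loc, 0.0) for loc in locations]
    return locations, vec_a, vec_b


# The example from the text: each node lists only the locations it has visited.
bp_a = {"library": 0.8, "gym": 0.2}
bp_b = {"library": 0.4, "computer lab": 0.6}
locations, vec_a, vec_b = align(bp_a, bp_b)
print(locations, vec_a, vec_b)

# A sparse target profile that considers only visits to the gym.
tp_gym = {"gym": 1.0}
_, vec_tp, vec_a_vs_tp = align(tp_gym, bp_a)
```

After alignment the two vectors have the same length and location order, so any similarity metric defined on dense vectors can be applied to them directly.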

In addition, if the beacon signals from locations are not available, it is possible to use the mutual encounter vectors as the behavioral descriptors for the nodes, since nodes that move similarly should have similar encounter sets. In this sense, we could replace the representation with one that is totally independent of the infrastructure.

B. Privacy Issues

While the behavior-aware message dissemination schemes achieve good performance with significant overhead reduction, they also raise user privacy concerns. In some cases, individuals may not want to reveal their own behavior. We discuss privacy-preserving options for our CSI schemes below.

First, we emphasize that the original design of CSI presented in section VI inherently possesses a privacy-preserving feature: we use only a small subset of user behavior (specifically, the mobility preference) in the behavioral profile, and with the singular value decomposition, we reveal only the summarized trend, not the detailed location visiting events of the user. In addition, the behavioral profiles are exchanged only between nodes, not stored in any public directory, and the exchange is limited to when a given node is involved in message dissemination.

We can further reduce the behavioral profile exchanges in the CSI scheme, and hence help to preserve privacy, as follows. For the CSI:T mode, when nodes encounter, instead of exchanging their behavioral profiles, the node with a message to send first sends the other node the TP of the message and its own similarity score to the TP. The other node silently calculates its similarity to the TP and decides whether to request the actual message. This completely removes the need for behavioral profile exchanges in the CSI:T mode.
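The privacy-preserving handshake for the CSI:T mode can be sketched as below. The decision rule applied by the receiving node is not spelled out in the text; the rule used here is an assumption derived from Algorithm 1 (request the message if the holder is still in gradient ascend and the local similarity is higher, or if the holder is in group spread and the local similarity exceeds th_sim). Cosine similarity again stands in for Eq. (2), and all names and values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Stand-in for Sim(., .); the paper's Eq. (2) may differ from this."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def make_offer(tp, holder_bp, th_sim):
    """What the message holder reveals: the TP, its own score, and th_sim."""
    return {"tp": tp,
            "holder_sim": cosine_similarity(holder_bp, tp),
            "th_sim": th_sim}


def wants_message(own_bp, offer):
    """Decide locally whether to request the message; own_bp is never sent."""
    own_sim = cosine_similarity(own_bp, offer["tp"])
    if offer["holder_sim"] > offer["th_sim"]:
        return own_sim > offer["th_sim"]     # holder is in group spread
    return own_sim > offer["holder_sim"]     # holder is in gradient ascend


tp = (0.0, 1.0, 0.0)
ascend_offer = make_offer(tp, holder_bp=(0.9, 0.3, 0.0), th_sim=0.8)
spread_offer = make_offer(tp, holder_bp=(0.1, 0.9, 0.0), th_sim=0.8)
print(wants_message((0.5, 0.5, 0.0), ascend_offer),   # closer to TP than holder
      wants_message((0.5, 0.5, 0.0), spread_offer))   # not an intended receiver
```

The encountered node reveals only a yes/no request, so its full behavioral profile stays on the device.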

For the CSI:D mode, when a message holder looks for potential new holders, instead of asking other nodes to send their behavioral profiles, the message holder sends the list of known holders' behavioral profiles to the other node. Since this list contains only the behavioral profiles of the known holders, not their identities, dissemination of such lists in the network does not pose a threat to the privacy of the message holders. Furthermore, when there are multiple holders in the list, the other node is not able to tell which behavioral profile corresponds to the holder who sent the list. If the other node decides to become a message holder, its behavioral profile has to be added to the list of known holders. Instead of immediately sending the behavioral profile of the new holder to the old holder, which would give the old holder an opportunity to link the identity and the behavioral profile of the new holder, the new holder adds its behavioral profile only to its own known holder list, and delays the dissemination until a later holder list exchange.

Finally, as a last resort, privacy-minded individuals can always opt out of the service. We expect this would not impact the performance severely, as it has been shown that the encounter pattern between nodes in mobile networks is rich enough to sustain up to 40% of nodes opting out before a performance degradation is observed [14].

IX. CONCLUSION AND FUTURE WORK

In this paper, we propose a paradigm to represent, summarize and manipulate behavioral profiles and use such profiles as targets for communication. We have presented a novel service of message dissemination in infrastructure-less mobile human networks based on the behavioral profiles of the users. The CSI schemes meet the design goals outlined in section VI-A with respect to efficiency, flexibility and privacy-preserving properties. The CSI schemes perform closely to the delay-optimal protocols (with 94% or more success rate, less than 83% of overhead, and the delay is inferior by 40% or less). In addition, we observe that human behavior in the large-scale empirical traces is quite robust: surprisingly, only a few days' worth of data is adequate to summarize and leverage for message dissemination.

We are working toward an implementation of the CSI schemes on mobile devices and are considering a real-world evaluation. One key issue is to adapt our algorithm in a more privacy-preserving fashion that is also resistant to spam (e.g., by including a reputation system). We are also considering different applications of behavioral profiles, including targeted advertising via our CSI schemes.

REFERENCES

[1] MobiLib: Community-wide Library of Mobility and Wireless Networks Measurements. http://nile.cise.ufl.edu/MobiLib/

[2] CRAWDAD: A Community Resource for Archiving Wireless Data At Dartmouth. http://crawdad.cs.dartmouth.edu

[3] J. Leguay, T. Friedman, and V. Conan, "Evaluating Mobility Pattern Space Routing for DTNs," in Proceedings of IEEE INFOCOM, April 2006.

[4] W. Hsu, D. Dutta, and A. Helmy, "Extended Abstract: Mining Behavioral Groups in Large Wireless LANs," in Proceedings of ACM MOBICOM, Sep. 2007.

[5] A. Vahdat and D. Becker, "Epidemic Routing for Partially Connected Ad Hoc Networks," Technical Report CS-200006, Duke University, April 2000.

[6] E. Daly and M. Haahr, "Social Network Analysis for Routing in Disconnected Delay-Tolerant MANETs," in Proceedings of ACM MOBIHOC, Sep. 2007.

[7] D. J. Watts and S. H. Strogatz, "Collective Dynamics of 'Small-World' Networks," Nature, vol. 393, pp. 440-442, 1998.

[8] P. Costa, C. Mascolo, M. Musolesi, and G. Picco, "Socially-aware Routing for Publish-Subscribe in Delay-tolerant Mobile Ad Hoc Networks," to appear in IEEE Journal on Selected Area of Communications.

[9] A. Miklas, K. Gollu, K. Chan, S. Saroiu, K. Gummadi, and E. Lara, "Exploiting Social Interactions in Mobile Systems," in Proceedings of 9th International Conference on Ubiquitous Computing, Sep. 2007.

[10] M. Thomas, A. Gupta, and S. Keshav, "Group Based Routing in Disconnected Ad Hoc Networks," in Proceedings of 13th Annual IEEE International Conference on High Performance Computing, Dec. 2006.

[11] W. Zhao, M. Ammar, and E. Zegura, "A Message Ferrying Approach for Data Delivery in Sparse Mobile Ad Hoc Networks," in Proceedings of ACM Mobihoc 2004, May 2004.

[12] W. Hsu and A. Helmy, MobiLib USC WLAN trace data set. Downloaded from http://nile.cise.ufl.edu/MobiLib/USC_trace/

[13] D. Kotz, T. Henderson and I. Abyzov, CRAWDAD data set dartmouth/campus/movement/01_04 (v. 2005-03-08). Downloaded from http://crawdad.cs.dartmouth.edu/dartmouth/campus/movement/01_04

[14] W. Hsu and A. Helmy, "On Nodal Encounter Patterns in Wireless LAN Traces," the Second International Workshop On Wireless Network Measurement (WiNMee 2006), April 2006.

[15] W. Hsu, D. Dutta, and A. Helmy, "Profile-Cast: Behavior-Aware Mobile Networking," in Proceedings of IEEE WCNC, Las Vegas, NV, Mar. 2008.

[16] N. Eagle and A. Pentland, "Reality mining: sensing complex social systems," in Journal of Personal and Ubiquitous Computing, vol. 10, no. 4, May 2006.

[17] A. Lindgren, A. Doria, and O. Schelen, "Probabilistic Routing in Intermittently Connected Networks," Lecture Notes in Computer Science, vol. 3126, pp. 239-254, Sep. 2004.

[18] M. Motani, V. Srinivasan, and P. Nuggehalli, "PeopleNet: Engineering A Wireless Virtual Social Network," in Proceedings of MOBICOM 2005, Sep. 2005.

[19] M. Kim and D. Kotz, "Periodic properties of user mobility and access-point popularity," Journal of Personal and Ubiquitous Computing, 11(6), Aug. 2007.

[20] J. Ghosh, M. J. Beal, H. Q. Ngo, and C. Qiao, "On Profiling Mobility and Predicting Locations of Wireless Users," in Proceedings of ACM REALMAN, May 2006.

[21] R. Horn and C. Johnson, Matrix Analysis, Cambridge University Press, published 1990.

