[ieee 2006 securecomm and workshops - baltimore, md, usa (2006.08.28-2006.09.1)] 2006 securecomm and...

10
Secure Real-time User Preference Collection for Broadcast Scheduling Xuhua Ding, Shuhong Wang,Baihua Zheng, School of Information Systems Singapore Management University Email: {xhdi'ng, shwang, bhzheng} @smu. edu. sg Abstract cast. Different from point-to-point connection which al- locates an exclusive channel for one client, broadcast Effcient broadcast scheduling is essential to the per- approach publishes data on a public channel shared formance of wireless data broadcast systems. Existing al- by all the clients. As a result, a simultaneous access gorithmsfor broadcast scheduling are mostly based on the is enabled. Due to its constant cost and high scalabil- knowledge of users' data access pattern. Unfortunately, ity, broadcast approach is the ideal dissemination ap- the requirement of exposing individual preference profile proach for information services with a huge number becomes a serious threat to user privacy. In this paper, of potential users. One typical application of wireless we investigate the issue of securely collecting user access data broadcast is the real-time stock quote update. A patterns in real-time for broadcast scheduling. We pro- broadcast server periodically broadcasts the up-to-date pose a novel secure user profile collection protocol which quotes of stocks via the wireless channel to guarantee protects the privacy of individual users yet facilitates ef- that each user can retrieve his desired information. ficient wireless data broadcast scheduling. To address the To retrieve data items, a user monitors (i.e., receive crucial tssue of power conservation tn mobtle devtces, and check) the broadcast channel until his desired data our scheme does not rely on expensive public key cryp- items arrive. The response time, i.e., the duration be- tography. Light computation and communication at the tween a user starting to listen to the channel and receiv- user end makes the scheme feasible for mobile devices all his intere 'ng al hns rnteuhsted data items, iS a common metric to with limited resource. Our theoretical security ahalysis evaluate the performance of a broadcast system. Broad- shows that the proposed protocol preserves user privacy cast performance is of paramount importance due to against eavesdroppers and malicious broadcast servers. the fact that most users are only interested in a tiny Moreover, our extensive performance evaluation experi- portion of a large data set. Seemingly, a flat broadcast ments show that the proposed scheme has low computa- approach whereby all the objects share the same prior- ti'oh ahd commrnuhi'cati'oh cost. ity in the wireless channel can produce the best aver- age performance for all users. Nonetheless, data items are not uniformly accessed in most, if not all, real ap- 1. Introduction plications. As pointed out by the Pareto's Principle, 20% of the data items can satisfy 80% of the require- The landscape for mobile computing is rapidly ments from the clients. changing. As wireless access becomes faster, cheaper, Therefore, scheduling algorithms are motivated to more reliable and ubiquitous, the wireless cover- achieve the near optimal response time by adjusting age is increasing rapidly. Information service providers the broadcast program according to the collective in- (ISPs) are vying for customers by delivering or en- terests of current clients. Various approaches have been hancing their data services wirelessly. In order to save proposed in the literature [2, 3, 13, 14, 15]. To the best operational cost and provide better services, most of our knowledge, all existing scheduling algorithms ISPs outsource their information services to a wire- are based on users' access preferences on data items. less telecommunication carrier. They either assume that the access patterns of users The specific wireless technologies in use vary from are known to the server, or require the users to explic- application to application. However, the underlying ap- itly upload their preference profiles. Both approaches proaches to delivering information in wireless appli- are at the cost of user privacy. On the other hand, cations are either point-to-point connection or broad- due to the widespread public awareness on privacy, the 1 -4244-0423-1/06/$20 .00 ©C2006 I EE E

Upload: baihua

Post on 25-Feb-2017

212 views

Category:

Documents


0 download

TRANSCRIPT

Secure Real-time User Preference Collection for Broadcast Scheduling

Xuhua Ding, Shuhong Wang,Baihua Zheng,School of Information Systems

Singapore Management UniversityEmail: {xhdi'ng, shwang, bhzheng} @smu. edu. sg

Abstract cast. Different from point-to-point connection which al-locates an exclusive channel for one client, broadcast

Effcient broadcast scheduling is essential to the per- approach publishes data on a public channel sharedformance of wireless data broadcast systems. Existing al- by all the clients. As a result, a simultaneous accessgorithmsfor broadcast scheduling are mostly based on the is enabled. Due to its constant cost and high scalabil-knowledge of users' data access pattern. Unfortunately, ity, broadcast approach is the ideal dissemination ap-the requirement of exposing individual preference profile proach for information services with a huge numberbecomes a serious threat to user privacy. In this paper, of potential users. One typical application of wirelesswe investigate the issue of securely collecting user access data broadcast is the real-time stock quote update. Apatterns in real-time for broadcast scheduling. We pro- broadcast server periodically broadcasts the up-to-datepose a novel secure user profile collection protocol which quotes of stocks via the wireless channel to guaranteeprotects the privacy of individual users yet facilitates ef- that each user can retrieve his desired information.ficient wireless data broadcast scheduling. To address the To retrieve data items, a user monitors (i.e., receivecrucial tssue of power conservation tn mobtle devtces, and check) the broadcast channel until his desired dataour scheme does not rely on expensive public key cryp- items arrive. The response time, i.e., the duration be-tography. Light computation and communication at the tween a user starting to listen to the channel and receiv-user end makes the scheme feasible for mobile devices all his intere'ng al hns rnteuhsted data items, iS a common metric towith limited resource. Our theoretical security ahalysis evaluate the performance of a broadcast system. Broad-shows that the proposed protocol preserves user privacy cast performance is of paramount importance due toagainst eavesdroppers and malicious broadcast servers. the fact that most users are only interested in a tinyMoreover, our extensive performance evaluation experi- portion of a large data set. Seemingly, a flat broadcastments show that the proposed scheme has low computa- approach whereby all the objects share the same prior-ti'oh ahd commrnuhi'cati'oh cost. ity in the wireless channel can produce the best aver-

age performance for all users. Nonetheless, data itemsare not uniformly accessed in most, if not all, real ap-

1. Introduction plications. As pointed out by the Pareto's Principle,20% of the data items can satisfy 80% of the require-

The landscape for mobile computing is rapidly ments from the clients.changing. As wireless access becomes faster, cheaper, Therefore, scheduling algorithms are motivated tomore reliable and ubiquitous, the wireless cover- achieve the near optimal response time by adjustingage is increasing rapidly. Information service providers the broadcast program according to the collective in-(ISPs) are vying for customers by delivering or en- terests of current clients. Various approaches have beenhancing their data services wirelessly. In order to save proposed in the literature [2, 3, 13, 14, 15]. To the bestoperational cost and provide better services, most of our knowledge, all existing scheduling algorithmsISPs outsource their information services to a wire- are based on users' access preferences on data items.less telecommunication carrier. They either assume that the access patterns of users

The specific wireless technologies in use vary from are known to the server, or require the users to explic-application to application. However, the underlying ap- itly upload their preference profiles. Both approachesproaches to delivering information in wireless appli- are at the cost of user privacy. On the other hand,cations are either point-to-point connection or broad- due to the widespread public awareness on privacy, the

1-4244-0423-1/06/$20.00 ©C2006 IEEE

concerns on personal privacy are growing. In our stock Related Workquote broadcast example, the subscribers may not wantto share with the broadcast server what stocks theyare interested in, especially when the server is man- Theoretically, the problem of secure user profile col-aged by an untrusted network carrier. The privacy con- lection can be solved by a secure e-voting scheme,cern dampens users's willingness to subscribe services whereby each user casts his votes on every data itemfrom ISPs. and broadcast server tallies all votes. However, we ar-We observe that an efficient wireless information ser- gue that it is infeasible in practice. Many electronic

vice with user privacy preservation is highly desirable, voting schemes have been proposed, e.g. [4, 6, 7]. Ba-as privacy usually tops customers' concern list when sically, there are two approaches. One approach is tothey sign up for services. Unfortunately, how to protect use mix-net, as in [1, 16, 19]. Nonetheless, it requires auser preference privacy in wireless broadcast schedul- number of additional mix servers between subscribersing has not been addressed in the literature. A naive and the broadcast server so that a significant changesolution is to introduce an online trusted third party has to be made on the underlying communication in-(TTP). Instead of informing the broadcast server ac- frastructure. The other approach, as used in [4, 6, 18],cess preferences, users send their information to TTP, is to employ homomorphic encryptions [10, 17] andwhich gathers all the preferences and sends the accu- threshold decryption [11]. Unfortunately, these schemesmulated result to the broadcast server. However, this incur a heavy computation load on the voters and re-approach has two fatal drawbacks. First, it is imprac- quire the collaboration among multiple servers, whichtical to require a large amount of heterogenous mobile is unrealistic for a broadcast system disseminating hun-users to trust a common entity. Second, as in all other dreds of data items to mobile devices.applications with online TTPs, the TTP naturally be-comes a single point of failure in terms of both perfor- The computation paradigm of our problem has sim-mance and security. A TTP is usually chosen as the tar- ilarity to data aggregation in wireless sensor networks,get of attacks by adversaries. Therefore, relying on an where the data collected by sensors are encrypted, ag-online TTP is not an appropriate solution. Another in- gregated and sent to a sink node. However, the exist-tuitive approach is to make use of so-called "anonymiz- ing schemes, e.g. [8, 22], protect data confidentialityers", like Tarzan[12] or Mix[9]. This approach requires only against eavesdroppers. The sink node has the col-additional servers or interactions among peer nodes. lection of keys and is able to decrypt individual cipher-Considering the broadcast servers are typically base text, which is undesirable in our scenario.stations of a cellular network, adding new servers will Recent work [231 by Yang et. al is very close to ours.significantly change the communication infrastructure. Their scheme allows a data miner to anonymously col-Therefore, an anonymous network is not a satisfying lect data from a group of users. The basic idea is toapproach either.

use re-encryption, which has been widely used in mixContribution In this paper, we propose a novel user networks. We argue that this approach is not a solu-preference collection scheme which allows a broad- tion to our problem since it requires t rounds of com-cast server to aggregate the information needed by a munication between the miner and a set of t users.scheduling algorithm, while an individual user's pref- The incurred communication cost is too high for real-erences are not exposed. Specifically, the main contri- time broadcast scheduling. Moreover, re-encryption in-bution of this paper is three-fold. curs multiple modular exponentiations. Considering

the number of data items involved and the limited com-

* Our study is the first addressing user preference putation resource in mobile devices, the computationprivacy in the wireless scheduling settings. cost is prohibitively high.

* A secure and practical user preference collectionscheme is proposed with low computation and Organization The rest of the paper is organized as fol-communication cost. Neither additional server nor lows. In Section 2, the problem is formally defined withpeer interactions are required in our solution. all the challenges we are facing. The detailed secure

user profile collection approach is described in Sec-* A detailed analysis on the security, communication tion 3. In Section 4 and Section 5, we analyze the se-

cost and computation cost of our scheme is pro- curity and performance of the proposed approach re-vided. The analysis is verified and complemented spectively. Finally, we conclude this paper in Section 6by our experiment results. and point out the future work.

2. Problem Formulation is able to sniff all traffic. In addition, BS is able to ob-tain the subscriber's information, e.g their identities,

In this section, we provide a detailed description of locations and SIM card numbers. Therefore, no com-the problem settings. munication source anonymity is provided in existing

System Model Throughout this paper, we consider a networking infrastructure.wireless data broadcast system consisting of three types Objectives A secure user profile collection has two ob-of entities: jectives. First, BS is able to compute the sum of all

* Service Registration Server(SRS): SRS locates at subscribers' preferences on every data item, which isthe ISP's site and takes the charge of system ini- required by the scheduling algorithm. Second, the pro-tialization and service subscription. For initializing tocol should preserve every subscriber's preference pri-a system, SRS determines the set of data items to vacy. In other words, a user's irndividual preferencesbroadcast, based on the specific application. The should not be exposed to either BS or other end-users.size of the data set is application-dependent, vary- It implies that an adversary is unable to determine, ex-ing from a few hundred to a few thousand. It is cept by random guessing, whether a data item is pre-also responsible for the selection of the schedul- ferred by a user.ing algorithm, used by BS in the broadcast pro- CAVEAT User preference privacy is different from usergram, and an integer range for preference specifi- data privacy. The latter prevents the adversary fromcation, e.g. [0,1,. ... 10]. A larger integer indicates a knowing what data a user obtains from the communi-stronger interest. SRS passes the system parame- cation channel. We remark that the nature of broad-ters to a wireless carrier, whose base stations deal cast obviously offers user data privacy.with the actual data delivery. For service subscrip-tion, it acknowledges the request from a new user Requiremets We list below the requirements of a pro-via passing the user a set of parameters. file collection protocol.

* Broadcast Server(BS): A BS is a base station op- * computation cost. Most mobile devices, e.g. cel-erated by a wireless carrier. It periodically delivers lular phones, are powered by batteries and havethe data set to subscribers in its domain. Before very limited computation capacity. A secure pref-scheduling a broadcast, BS requests all its receivers erence aggregation scheme should not levy the de-to upload their preferences. With subscribers' col- vices's resource by imposing heavy computations.lective preferences, BS sorts out the data set and This requirement implies that many cryptographicassigns priorities to each item. Then, it broadcasts techniques based on large number operations arethe data set accordingly. The issues of scheduling not suitable for common mobile devices.algorithm and data delivery are out of the scopeof thsapr * communication cost. Note that communica-

tions also consume CPU cycles and battery power.* Subscribers: A subscriber is an end-user subscrib- Moreover, in many wireless networks, a device's

ing the broadcast service1. Registration at the SRS upload channel has narrower bandwidth comparedis required for a subscriber to receive data broad- to its download channel. The volume of trafficcast from BS. Each subscriber independently de- and the number of rounds of communications aretermines his own user interest profile, i.e., a list tightly constrained. Out of this concern, interac-of desired items with corresponding preferences in tions among subscribers should be avoided. Fur-the range defined by SRS. All preferences are in- thermore, it is undesirable to introduce extra en-dependent of each other. tities due to both cost and policy reasons. A prac-

Trust Model We assume that SRS is fully trusted tical preference collection scheme should be builtwhereas BS is not. BS may take advantage of its par- on top of the existing network infrastructure.ticipation in the scheme and attempt to compromise * accuracy. The preference aggregation protocolend-users' privacy. The end-users are assumed to be should not degrade the performance of the exist-honest. They do not collude with malicious BS. We as- ing broadcast schedule algorithm. A broadcast ser-sume the channel between a user and BS is authen- vice may cover hundreds of data items. Ideally, thetic and reliable. However, we do not assume the con- protocol is able to count every user's preference onfidentiality of the broadcast channel. An eavesdropper every data item.

1 Without causing ambiguity, we use subscribers and users al- Table 1 summarizes the notations used in the fol-ternatively in the rest of this paper. lowing description.

an option, a user may download y, Y2, ..., yN-l fromUsjDescrwipthioentityxSRS in order to save computation cost in future proto-

Ux ~ User with identity x C Z+ col executions.wx ~ User Ux's secret keyD the set of data items to broadcastN the cardinality of D, i.e. the number of 3.2. The Protocol

items to broadcastDi the i-th data item, i C [1, N] The preference aggregation protocol consists of threeN. the number of users in BS's domain steps.[0, 1,... s] the integer range of a preference on a data____,__,_ item 1. Advertising: BS divides all users in its domain into

t the threshold group size, i.e. the number of groups of size t. It broadcasts to all users theirusers in each group membership together with the next time slot for

Gi the i-th threshold group of t users preference uploading.ml the bit length of an integer m1KIl the

a one-wa collision-egresis keyed h 2. Reporting: Upon receiving the advertisement fromfnctone with teseretke K h

BS, all users report their encrypted preferences tofunction with the secret Key KBScnuretyBS concurrently.

Table 1. Notations3. Aggregation: On receiving all t encrypted prefer-

ences from the same group, BS aggregates them3. Secure Preference Aggregation Pro- and computes the sum of preferences for each data

tocol item.

The basic idea of our approach is similar to the jig- Advertising Algorithm 1 describes the details of ad-saw puzzle. Each individual piece of a jigsaw puzzle vertising, which is executed by BS when re-schedulingseems random, whereas the outcome from a correct as- of the broadcast program is demanded2. Let N,, de-sembly of them is expressive. In our approach, a group note the number of users in BS's domain. They areof users share a public data using a threshold secret partitioned into groups of size t. For user Ux, BS as-sharing scheme. Each user's key share is kept secret by signs cx as Ux's membership token, which enables theherself and serves as the encryption key to encrypt his aggregation of preferences from the same group. Fi-preferences. The server collects users' preferences and nally, it broadcasts the results to all subscribers.aggregates user preferences on all data items to broad-cast. Algorithm 1 Advertising (By BS)

Input: threshold group size t, all users' identities;3.1. System Setup and User Registration Output: all users'group membership;

Procedure: (Executed by BS)SRS is responsible for system setup and user reg- 1: Divide all users into groups of size t. The following steps

istration. For an information service, SRS initializes are with respect to every group.D {D1, D2, ... , DN}, the set of data items to broad- 2: Pick a new integer G C E+ as the identity of thiscast. SRS also selects [0,... , s], the range of the user group. Let the group members' identities be denoted bypreference on a data item. It picks a security para- XI, X2, ,Xt.meter t E 3+ as the threshold group size, and se- 3: fori = ;i < t;i++dolects a system parameter Y E Z+ satisfying Y > 4: calculate cxi:t. s. Then, it selects the smallest prime p satisfying Xkp > yN+l lTo use a t-out-of-t threshold secret shar- J7J

Xk moi (1)1<k<t;k#liXkXing scheme [21], it initializes a (t-1)-degree polynomial 5: output < Xi, cxi, G >f(x) = jI_ bizx mod p, where bl, b2, ,bt-1 CR Z 6: end forand bo = 0. A collision-resistant keyed hash function [5] 7: Broadcast (xi, cxi, G), for all 1 < i < N. and T to all'HK () is selected, where K denotes the hash key. SRS users, where T is the expected data broadcast time.keeps b.,. . ., bt-1 and K secret and all others are pub-lic.

Once a user subscribes to an information service at 2 When the re-scheduling process is triggered is out of the scopeSRS, it receives a unique random x CR z as its iden- of this paper. Generally, a base station can re-schedule thetity, a secret key wx f (x), and the hash key K. The broadcast program periodically or based on the real applica-user keeps wx and K secret and lets x be public. As tion requirements.

Reporting The Reporting algorithm shown in Algo- Algorithm 3 Aggregation (by BS)rithm 2 below is executed independently by every user. Input: threshold groups Gi, parameters cx for all usersConsider a user U,, in a threshold group G. Let aj de- Ux C Gi,All z, reported by all users;note Uc 's preference on data item Dj, for 1 < j < N. Output: R1, R2,. ,RN as the collective preferences on

U,, first cumulates a1,.. ., aN into vx, in order to save D1, D2, .., DN respectively;communication cost. Note that a large portion of the Procedure:preferences are zeroes. Therefore, the computation of 1: LetRI....R,RN be all 0.

vxi is not expensive. vxi is then padded and encrypted. 2: for each threshold group Gi, 1 < i < LNI/tj. Let theirThe ciphertext zxi is sent to BS. members be denoted by UXI, UX2,. Ux, do

3: BS computes

Algorithm 2 Reporting (by user Ux) F =Zzxk modp (4)Input: Uxi 's preference set: {ai, a2, ... aN}, membership k=I

token cxi, group identity G, schedule time T, secret key wxi, 4: for j = 1; j < N; j ++ dohash key K. 5:Output: zxi as Uxi 's reply to BSProcedure: r =rmodY; (5)1: Compute: N F = (F - rj)/Y; (6)

vxi = ,7ajYj_l (2) Rj = Rj + rj (7)j=1 6: end for (8)

zxi =vxi + C-IWxi71K(T IG) mod p (3) 7: endfor

2: Send (xi,zxi) toBS. 8: Output R1, R2, RN as the cell's collective prefer-ences.

Aggregation When BS receives the reports from all 4. Security Analysisusers, it starts Aggregation step as depicted in Algo- In this section, we conduct both a theoretical dis-rithm 3. BS first evaluates Equation 4, which returns cussion and an experimental simulation to analyze thethe aggregated preferences for all users from the same information security of the proposed preference aggre-threshold group. Essentially, F N

1 Y where gation approach. Without loss of generality, the adver-Ir iS exactly the sum of all group members' preferencesis exactly the sum of all group members' preferences sary fixes its target: a victim U1 's preference on certainon data item Dj. Thereafter, by evaluating Equation 5 data item3 Dj. Let U1 and U2,... Ut form a thresh-and Equation 6, the sums of preferences for all data old group. We use a1, a2,. at to denote U1,. Ut'sitems are recovered. preferences on Dj respectively. Since a user is only in-We show the correctness of the protocol. Since terested in a small subset of D, the adversary will claim

UX1I UX2,... I UX are in the same threshold group Gi, his success if he correctly determines whether U1 has in-we have Et XW = O mod p by virtue of the (t, t)k=1 CXkWXk , terests in Dj. Namely, the adversary guesses whetherthreshold secret sharing scheme which distributes the

avalue 0.Therefore,' a1 0 or a1 74 0. Note that this type of attack iS muchvalue0. Therefore, stronger than those attempting to determine the ex-

t act value of a1. In the following, our discussion focusesF Zxk mod p on the privacy with respect to a1. The conclusion is ap-

k=1 plicable to any other user preferences.t t Notion of Privacy The notion of privacy is defined asS VXk Mod p + KTJI Gi) 5CXkWXk Mod p the amount of related information revealed by the sys-k=1 k=1 tem to the adversary. Formally, let A be the discrete

= S VXk mod p random variable for Ul's preference on the targetedk=1 data item. A = O iff a1 = O and A = I iff a, 74 0.(rl + r2Y + r3Yy2... + rNYN-1) mod p Let H(A) be the entropy of A. Let H'(A) be the en-

p > yN+land Y > s.t and rj < s.t, Vj E [1, N] tropy of A based on the adversary's observation after

IF rl + r2Y + r3y2... + rN YN-l mounting attacks. Note that H'(A) captures the infor-mation obtained by an adversary regarding to A. Let

where rj is exactly the sum of Ux1, , Ux,'s prefer- p denote the security strength of our scheme with re-ences on data item Dj. The above computation jus-tifies the requirements for selecting p and Y in Sec- 3 Note that the preferences on different data items are com-tion 3.1. pletely independent from each other.

spect to a1. It is defined as serve that f'(O) 0 since f'(0) >Jiwcih-1 Ei=1 (z/ - vl) = Zh-( zi=l Zi vi) = 0.

p= H(A))-H(A) Thus, set wl = f'(xi) for all iC [t + 1, 2t]. Then, the

The semantic of p is the magnitude of information leak- adversary selects a random h CRIp, satisfying z -age regarding to whether a1 = 0. It shows that approx- hwlcxi mod p < yN+l, for all i E [t + 1, 2t]. Computeimately p-bit information is exposed to the adversary vi z - hwlcxi mod p, for all i C [t + 1, 2t]. Thus,by the system. 2t 2t 2t

Adversary In our adversary model, we consider a mali- E v Zi-h E wcxi mod pcious BS, since it is more powerful than eavesdroppers. i=t+l i=t+l i=t+lBS may exploit its participation in protocol execution. Sinw'c - fl(o) o 2t V/It is clear that from Algorithm 3, BS only obtains two ce i Xi f 0 : v+pieces of information relevant to a1: the user's report Z2-t+1 Zi =2. Therefore, the Vf= Vf.Z1, which is directly sent by U1; and the sum of prefer- If7 {Wj, w't, h, h}, which is indistinguishableences r = 7i=l ai which is computed by BS in Equa- from f7 when the hash function ]HK(-) is modelled astion 5. a pseudo-random number generator.We first prove in Lemma 4.1 that the adversary D

does not obtain additional information about users' The Lemma above shows that the protocol exe-preferences, except the sum. For the purpose of clar- cution does not reveal additional information aboutity, suppose that there exist 2t users in BS's domain, {v1, , V}2t except the view Vf, which exposes thewho are divided into two threshold groups G1 and G2. sum of vi-s. Therefore, BS obtains the sum of t user'sLemma 4.1 and its proof also apply to multiple thresh- preferences on all N data items.old groups. We regard the protocol as a batch encryp- Hereafter, we proceed to analyze the informationtion scheme, which takes a set of users' preferences leakage from knowing the sum of preferences for Dj. In{V,'... V2} as input and outputs {Z,'... Z Its se- other words, we analyze the inference of r = Et acret key is in fact the (t-1)-degree polynomial f(x) wt re t

over Z ~~~~~~~~~~~withrespect to a,. For example, when r = O, a, mustover ?LP. be zero since all preferences are non-negative. To facil-

Let {x1,. ,tX} be the users' identities re-spetiely LetT deothe ther itr ipt of itate the discussion, we assume that in average everyspectively. Let .f denuser is interested in n data items, and n << N. There-the protocol execution using f(x). In specific, fore, A follows the distribution below:If {Wl, ,W2t,HK(T IGj),H(K(T IG2)}. Let

n nVf denote the adversary's view of the protocol ex- Pr(A= 0) = 1-N and Pr(A= 1)

NN

ecution. In specific, Vf {zl,.. , Z2t, CI, U2},where =i Zi mod p vi and Consequently,(J2 = zt+i mod p Vt+i. H(A) n- logn - (1 n ) log(1 )

Lemma 4.1 Given a transcript f7 and a view Vf withrespect to a polynomial f(x) and a set of preference in- Let random variable R represent Ei=1 ai, which takesputs Vl, * * * , V2tj, the adversary is able to construct an- on the integer set {0, 1, , st}. Because the adversaryother (t-1)-degreepolynomialf'(x) over 2p and another obtains the information R = r during every protocolset ofiriputs {vj, vl tv, such that (1) Vf/ Vf; (2) Tf execution, H'(A) is computed as the conditional en-

arid, are inidistinguishable. tropy of A given R. Specifically,St

Proof: For the threshold group G1, the adversary H'(A)-H(A R) Pr(R r)H(AR r)randomly chooses the inputs {vj,--- ,v/}, such that

=v E=1Vi. Then it picks hERZ*, and com-

putes w = (zi - v)cl1h-1 modp, for all i C [1,t]. whereClearly, for 1 < i < t, H(AIR= r)

Zi= vi + hw-c z modp - Pr(A=ijR=r)log(Pr(A= iR =r))

Now the adversary constructs f' (x). Use Hence,(X1,Wj), (xt, wl) as t points and calculate ast(t - 1)-degree polynomial f'(x) by Lagrange In- p = H(A) - Pr(R = r)H(AIR = r) (9)terpolation so that fl(xi) = wl. It is clear to ob- p9O

Next, we proceed to evaluate p. The challenge here [ t p (n = 50) p (n = 200)]is how to compute Pr(R r) and Pr(A = 01R = r). 10 0.0625 0.0533To evaluate Pr(A = 0IR r), we introduce another 30 0.0249 0.0197random variable R' representing =2ai, i.e. R R'-+ 50 0.0146 0.0137a,. R' takes on the integer set {0, s(t- 1)}. Hence, 80 0.0086 0.0087

100 0.0071 0.0072Pr(A = R =r) Pr(A =0, R =r)/Pr(R =r) 130 0.0057 0.0067

= Pr(A = 0, R' = r)/Pr(R = r) 200 0.0037 0.0055

*A and R' are independent random variablesA and R' are independent random variablesTable 2. The information leakage with different

Pr(A = 0IR = r) = Pr(A = 0)Pr(R' = r)/Pr(R =r) t and n (N = 1000).

Since Pr(A = IJR = r) = 1-Pr(A = OIR = r), we areable to evaluate Equation 9, provided that Pr(R= r)andPrR ) ar ailable In short, we model the notion of privacy of ourUnfoPrtunately it aipobiel scheme as the amount of exposed information regardingmneothprably, ity distprohibutionlyofmRlanRoby u to a user's interest. We have shown that our scheme re-

thnemutinpromabiali distribution ofRanditg Rpartiont veals insignificant information to the malicious broad-the multinomial distribution and integer partition the-cs evr hnti esnbylre

ories. Instead, we have to acquire the distribution of Rand R' from the observation of 100,000 experiments. Discussion on (t, t) Access StructureThe distributions are shown in Figure 1.

In our scheme, all subscribers in BS's domain are di-vided into groups of size t. For each group, only when

0.03 - < Pr(R) + t reports from its members are available, can BS suc-

0.025 ++ Pr(R') cessfully run the Aggregation algorithm to compute the+ 4 preference sum. Such a requirement is called a t-out-

0.02 of-t access structure, or (t, t) access structure. One may±~ttX i. observe that it does not provide reliability since if one

member fails, the rest t-1 reports cannot be taken into++#001 + account by BS. However, we argue that a flexible ac-

- $+ cess structure will allow BS to mount cross-set attacks0.0 + as explained below.

If BS obtains a preference sum r for a user set G0 and another sum r' for another set G', it computes

0 20 40 60 80 100 120 0r - r', which is the difference of preferences for users

in G- G'. The privacy for users in G- G' is compro-Figure 1. The distribution functions of R and R': t mised when the size of G- G' is small. In particular,100, N = 1000, n = 50, s = 10 if BS is allowed to compute the sum for any t -1 users

out of t users, the difference between two user's prefer-ence can be easily obtained by BS. Since a user's prefer-ence is not uniformly distributed across all data items,

We ran the protocol using different threshold group such a difference reveals their interests. A (t, t) struc-size t. After obtaining the distribution of R and R', we ture ensures that there is no overlap between any le-compute p based on Equation 9. The results are shown gitimate user groups.in Table 2. For instance, it shows that our scheme onlyreveals approximately 0.0071-bit information when us- 5. Performance Analysising t = 100 and n = 50. Moreover, it is evident thatp decreases when t increases. It implies that a larger Computation Cost In order to minimize the compu-t will reduce the information leakage. The intuitive tation load on wireless devices, our protocol avoids ex-reason behind this is that when a threshold group is pensive public key cryptographic techniques. Our ap-formed by more users, there are more randomness in- proach only incurs modular multiplications and addi-troduced and more possible partitions of the preference tions. y, y2, . .. yN-1 are computed by the system ad-sum. Thus, inferring a1 from the sum is less effective. ministrator when initializing the system. A user down-

loads them on the basis of her personal needs and stores mented the broadcast server on a PC with 1GHz CPUthem on her device. During protocol executions, the and 1GB memory. It costs 14,us for a 1024-bit modu-computation load for a user is the evaluation of Equa- lar multiplication. Therefore, the computation load ontion 2 and Equation 3. Since a user is only interested the broadcast server side is only in milliseconds.in n data items, it needs n multiplication and addi- Communtication Cost The protocol in Section 3 onlytion operations in evaluating Equation 2. We observe requires one round communication. The data sent bythat since all ai-s are small numbers bounded by s, a user in Algorithm 2 has the same size of the mod-the computation is cheap despite the fact that some ulus p. Therefore, the total number of bits to send isYr-s are relatively large numbers. To evaluate Equa- log2 P which is O(N(log t + log s)) bits. Note that fortion 3, the user executes two modular multiplications a preference collection scheme without privacy preser-and one hash function. Therefore, the user totally eval- vation, the communication cost is 0 (n log N + n log s),uates 2 modular multiplications of N(log2 t + log2 s) which is much smaller than our cost. Therefore, thebits, 1 hash function, and n regular multiplications. To communication cost is mainly the price paid for pri-avoid huge modulus due to large N, one approach is . .' ~~~~vacy protection. We argue that transmission of a fewto divide the whole data set into subsects, and run the thousand bits is still affordable for modern mobile de-same protocol over each subsect. vices.

To show the performance on mobile devices, ratherthan PCs in our experiments, we use the results from Selectiomnoft A larger t entails higher computation andE. Savas et.al. [20]. They measured the computation communication cost. Both costs grow linearly with thecost of modular multiplications by software and hard- bit length of t. However, considering N is not a smallware on ARM processors, a popular processor in PDAs integer, attentions should be paid to select an appropri-with results shown in Table 3. This is a conservative es- ate t. Table 2 shows the relation between t and securitytimation of the performance as modern mobile devices strength. It is visualized in Figure 2 below. We observeusually have a faster CPU than ARM 80MHz. In our that when t is around 100, increasing t does not pro-experimental settings, N 1000, s 10, the modu- vide significant enhancement on security. Therefore, welus p could be as long as a few kilo-bits. To avoid us- recommend to choose a 6- or 7-bit integer for t.ing a huge modulus, we split the whole data set intosmaller sets of size 100, so that the modulus is around1024bits. According to Table 3, it may only take a num- n=50 --- x

ber of milliseconds for an old-fashion mobile device to 0.06 n=200execute the protocol in our experimental setting. 0

0.05t

0.04Precision Hardware(,us) Software (,ts) (on(bit) (80MHz) ARM with Assem- 0.03

bly)0.02 _ .

224 5.9 33.2256 6.6 42.3 0.011024 61 570 o I

10 30 50 80 100 130 200

Table 3. Execution time of hardware and softwareimplementations ofthe GF(p) multiplication[201 Figure 2. The relation between t and p = H(Ao) -

H'(Ao) (N= 1000)

In Algorithm 3, the broadcast server evaluates tmodular multiplications in Equation 4. Since there areN,,/t groups of users, the broadcast server totally needs 6. Conclusion and Future WorkN,, modular multiplications to calculate all aggregatedpreferences on all data items. Considering the fact that In this paper, we propose a new secure user pro-modular operations and division operations have neg- file collection protocol. Without compromising user pri-ligible cost for a regular PC, we dismiss this portion of vacy, the proposed scheme meets the demands by allcost for Equation 5 and 6. In our experiment, we imple- broadcast scheduling algorithm. A broadcast server is

able to schedule a data broadcast based on collec- [7] J. Benaloh and D. Tuinstra. Receipt-free secret-ballottive preference profile, while each individual user's per- election (extended abstract). In Proceedings of 26th An-sonal interests remain secret. From the protocol execu- nual Symposium on Theory of Computing (STOC'94),tion, the adversaries, either eavesdroppers or malicious pages 544-553, 1994.broadcast servers, obtain insignificant amount of infor- [8] C. Castelluccia, E. Mykletun, and G. Tsudik. Effi-mation. Our scheme does not involve expensive large cient aggregation of encrypted data in wireless sensor

number operations,such as modular exponentiations. networks. In Proccedings of the 3rd Annual Interna-number operations, such as modular exponentiations.With low computation and communicationcost,ourtional Conference on Mobile and Ubiquitous Systems:

Withemeislow acomputation andht-weightcommunican .cost Networks and Services (MobiQuitous'05), July 2005.schrecomepieasibalefor light-weightimobile devces with [9] D. Chaum. Untraceable electronic mail,return address,

and digital pseudonyms. Communicatzons of the ACM,Nonetheless, our construction has two drawbacks. 24(2):84, 1981.

First, our scheme is built on top of (t, t) threshold se- [10] J. D. Ferrer. A provably secure additive and multi-cret sharing, which is not fault tolerant. A malfunc- plicative privacy homomorphism. In Proceedings of thetion of one user will fail the preference aggregation for 5th International Conference on Information Securitythe related threshold group. However, it is demanded (ICIS'02), pages 471-483, 2002.to have such rigid group access structure as pointed out [11] P. Fouque, G. Poupard, and J. Stern. Sharing decryp-in Section 3. It is an open problem how to harmonize tion in the context of voting or lotteries. In Financialsecurity and reliability. Second, our scheme is vulnera- Crypto'00, pages 90-104, 2000.ble to collusion attacks by a malicious BS and a sub- [12] M. J. Freedman and R. Morris. Tarzan: A peer-to-peerscriber. Knowing the secret key K will expose more in- anonymizing network layer. In Proceedings of the 9thformation about user preferences, even though the ad- ACM Conference on Computer and Communications

' . ~~~~~Security (CCS'02) pages 193-206 November2002.versary is still unable to correctly determine a prefer- 'y (C ), pence with an overwhelming probability. [13] S. Hameed and N. H. Vaidya. Efficient algorithms for

scheduling data broadcast. ACM/Baltzer Journal ofWireless Networks (WINET), 5(3):183-193, 1999.

Acknowledgement [14] Q. L. Hu, D. L. Lee, and W.-C. Lee. Optimal chan-nel allocation for data dissemination in mobile com-

We would like to thank Wei Wei for her help on im- puting environments. In Proceedings of the 18th Inter-national Conference on Distributed Computing Systemsplementation work. We are also grateful to anonymous natC'onal pages 4 8 May 1998.

reviewers for their constructive comments. .IDS9) pae48'.7 Ma 98

[15] T. Imielinski, S. Viswanathan, and B. R. Badrinath.Data on air - organization and access. IEEE Trans-

References actions on Knowledge and Data Engineering (TKDE),9(3):353-372, May-June 1997.

[1] M. Abe. Universally verifiable mix-net with verifica- [16] M. Jakobsson. A practical mix. In Advances in Cryptol-tion work independent of the number of mix-servers. In ogy - Eurocrypt'98, pages 448-461.Advances in Cryptology -Eurocrypt'98, pages 437-447, [17] P. Paillier. Public-key cryptosystems based on compos-1998. ite degree residue classes. In Advances in Cryptology -

[2] S. Acharya, R. Alonso, M. Franklin, and S. Zdonik. EuroCrypto'99, pages 223-238, 1999.Broadcast disks: Data management for asymmetric [18] D. Pointcheval. Self-scrambling anonymizers. In Pro-communications environments. In Proceedings of the ceedings of the 4th International Conference on Finan-ACM SIGMOD International Conference on Manage- cial Cryptography, pages 259 - 275, 2000.ment of Data (SIGMOD'95), pages 199-210, May 1995. [19] K. Sako and J. Kilian. Receipt-free mix-type voting

[3] 5. Acharya, M. Franklin, and S. Zdonik. Disseminating scheme - a practical solution to the implementation of a

updates on broadcast disks. In Proceedings of the 22nd voting booth. In Advances in Cryptology - Eurocrypt'95,updates~~~~~~~~~~~~ obracssk.I rcengofte2dpges 393-403 1995.VLDB Conference, pages 354-365, September 1996. pa

4] '. Baudron,P.Fouqe,D.PintchevlG.Pupaird[20] E. Savas, A.F. Tenca, and C.K. Koc. A scalable and uni-

[4] 0. Baudron, P. Fouque, D. Pointcheval, G. Poupard, fied multiplier architecture for finite fields GF(p) andand J. Stern. Practical multi-candidate election system. GF(2m). In Proceedings of the Second InternationalIn Proceedings of the 20th Annual ACM Symposium on Workshop on Cryptographic Hardware and EmbeddedPrinciples of Distributed Computing (PODC'01), 2001. Systems, pages 277- 292, 2000.

[5] Mihir Bellare, Ran Canetti, and Hugo Krawczyk. [21] Adi Shamir. How to share a secret. Communications ofHMAC: Keyed-hashing for message authentication. the ACM, 22(11):612-613, Nov 1979.Technical Report 2104, February 1997. [22] D. Wagner. Resilient aggregation in sensor networks. In

[6] J. Benaloh. Verifiable Secret-Ballot Elections. PhD the- Proceedings of the 2nd ACM workshop on Security of adsis, Yale University, 1987. hoc and sensor networks, pages 78 - 87, 2004.

[23] Z. Yang, S. Zhong, and R. Wright. Anonymity-preserving data collection. In Proceeding of the eleventhACM SIGKDD international conference on Knowledgediscovery in data mining (KDD'05), pages 334 - 343,2005.