privacy preserving ranked multi-keyword

14
7/23/2019 Privacy Preserving Ranked Multi-Keyword http://slidepdf.com/reader/full/privacy-preserving-ranked-multi-keyword 1/14 0018-9340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2448099, IEEE Transactions on Computers JOURNAL OF L A T E X CLASS FILES, VOL. 6, NO. 1, JANUARY 2015 1 Privacy Preserving Ranked Multi-Keyword Search for Multiple Data Owners in Cloud Computing Wei Zhang,  Student Member, IEEE,  Yaping Lin,  Member, IEEE,  Sheng Xiao,  Member, IEEE, Jie Wu,  Fellow, IEEE,  and Siwang Zhou Abstract  —With the advent of cloud computing, it has become increasingly popular for data owners to outsource their data to public cloud servers while allowing data users to retrieve this data. For privacy concerns, secure searches over encrypted cloud data has motivated several research works under the single owner model. However, most cloud servers in practice do not just serve one owner; instead, they support multiple owners to share the benefits brought by cloud computing. In this paper, we propose schemes to deal with Privacy preserving Ranked Multi-keyword Search in a Multi-owner model (PRMSM). To enable cloud servers to perform secure search without knowing the actual data of both keywords and trapdoors, we systematically construct a novel secure search protocol. To rank the search results and preserve the privacy of relevance scores between keywords and files, we propose a novel Additive Order and Privacy Preserving Function family. To prevent the attackers from eavesdropping secret keys and pretending to be legal data users submitting searches, we propose a novel dynamic secret key generation protocol and a new data user authentication protocol. Furthermore, PRMSM supports efficient data user revocation. Extensive experiments on real-world datasets confirm the efficacy and efficiency of PRMSM. Index Terms  —Cloud computing, ranked keyword search, multiple owners, privacy preserving, dynamic secret key 1 I NTRODUCTION Cloud computing is a subversive technology that is changing the way IT hardware and software are designed and purchased [1]. As a new model of computing, cloud computing provides abundant ben- efits including easy access, decreased costs, quick deployment and flexible resource management, etc. Enterprises of all sizes can leverage the cloud to increase innovation and collaboration. Despite the abundant benefits of cloud comput- ing, for privacy concerns, individuals and enterprise users are reluctant to outsource their sensitive data, including emails, personal health records and govern- ment confidential files, to the cloud. This is because once sensitive data are outsourced to a remote cloud, the corresponding data owners lose direct control of these data [2]. Cloud service providers (CSPs) would promise to ensure owners’ data security using mecha- nisms like virtualization and firewalls. However, these mechanisms do not protect owners’ data privacy from  W. Zhang, Y.p. Lin, S. Xiao, and S.w. Zhou are with College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China, and with Hunan Provincial Key Laboratory of Dependable Systems and Networks, Changsha, 410082, China. Yaping Lin is the Corresponding Author. E-mail:  {  zhangweidoc, yplin, xiaosheng, swzhou}@hnu.edu.cn.  J. Wu is with Department of Computer and Information Sciences, Temple University, 1805 N. Broad St., Philadelphia, PA 19122. E-mail:  [email protected]. the CSP itself, since the CSP possesses full control of cloud hardware, software, and owners’ data. Encryp- tion on sensitive data before outsourcing can preserve data privacy against CSP. However, data encryption makes the traditional data utilization service based on plaintext keyword search a very challenging problem. A trivial solution to this problem is to download all the encrypted data and decrypt them locally. Howev- er, this method is obviously impractical because it will cause a huge amount of communication overhead. Therefore, developing a secure search service over encrypted cloud data is of paramount importance. Secure search over encrypted data has recently at- tracted the interest of many researchers. Song et al. [3] first define and solve the problem of secure search over encrypted data. They propose the conception of searchable encryption, which is a cryptographic prim- itive that enables users to perform a keyword-based search on an encrypted dataset, just as on a plaintext dataset. Searchable encryption is further developed  by [4], [5], [6], [7], [8]. However, these schemes are concerned mostly with single or boolean keyword search. Extending these techniques for ranked multi- keyword search will incur heavy computation and storage costs. Secure search over encrypted cloud data is first defined by Wang et al. [9] and further devel- oped by [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. These researches not only reduce the computation and storage cost for secure keyword search over encrypted cloud data, but also enrich the

Upload: sv

Post on 18-Feb-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 1

Privacy Preserving Ranked Multi-KeywordSearch for Multiple Data Owners in Cloud

ComputingWei Zhang Student Member IEEE Yaping Lin Member IEEE Sheng Xiao Member IEEE

Jie Wu Fellow IEEE and Siwang Zhou

Abstract mdashWith the advent of cloud computing it has become increasingly popular for data owners to outsource their data to

public cloud servers while allowing data users to retrieve this data For privacy concerns secure searches over encrypted cloud

data has motivated several research works under the single owner model However most cloud servers in practice do not just

serve one owner instead they support multiple owners to share the benefits brought by cloud computing In this paper we

propose schemes to deal with Privacy preserving Ranked Multi-keyword Search in a Multi-owner model (PRMSM) To enable

cloud servers to perform secure search without knowing the actual data of both keywords and trapdoors we systematically

construct a novel secure search protocol To rank the search results and preserve the privacy of relevance scores between

keywords and files we propose a novel Additive Order and Privacy Preserving Function family To prevent the attackers from

eavesdropping secret keys and pretending to be legal data users submitting searches we propose a novel dynamic secret keygeneration protocol and a new data user authentication protocol Furthermore PRMSM supports efficient data user revocation

Extensive experiments on real-world datasets confirm the efficacy and efficiency of PRMSM

Index Terms mdashCloud computing ranked keyword search multiple owners privacy preserving dynamic secret key

1 INTRODUCTION

Cloud computing is a subversive technology that

is changing the way IT hardware and software aredesigned and purchased [1] As a new model of computing cloud computing provides abundant ben-efits including easy access decreased costs quickdeployment and flexible resource management etcEnterprises of all sizes can leverage the cloud toincrease innovation and collaboration

Despite the abundant benefits of cloud comput-ing for privacy concerns individuals and enterpriseusers are reluctant to outsource their sensitive dataincluding emails personal health records and govern-ment confidential files to the cloud This is becauseonce sensitive data are outsourced to a remote cloudthe corresponding data owners lose direct control of these data [2] Cloud service providers (CSPs) wouldpromise to ensure ownersrsquo data security using mecha-nisms like virtualization and firewalls However thesemechanisms do not protect ownersrsquo data privacy from

bull W Zhang Yp Lin S Xiao and Sw Zhou are with College of Computer Science and Electronic Engineering Hunan UniversityChangsha 410082 China and with Hunan Provincial Key Laboratoryof Dependable Systems and Networks Changsha 410082 ChinaYaping Lin is the Corresponding Author E-mail zhangweidoc yplin

xiaosheng swzhouhnueducnbull J Wu is with Department of Computer and Information SciencesTemple University 1805 N Broad St Philadelphia PA 19122 E-mail

jiewutempleedu

the CSP itself since the CSP possesses full control of cloud hardware software and ownersrsquo data Encryp-

tion on sensitive data before outsourcing can preservedata privacy against CSP However data encryptionmakes the traditional data utilization service based onplaintext keyword search a very challenging problemA trivial solution to this problem is to download allthe encrypted data and decrypt them locally Howev-er this method is obviously impractical because it willcause a huge amount of communication overheadTherefore developing a secure search service overencrypted cloud data is of paramount importance

Secure search over encrypted data has recently at-tracted the interest of many researchers Song et al

[3] first define and solve the problem of secure searchover encrypted data They propose the conception of searchable encryption which is a cryptographic prim-itive that enables users to perform a keyword-basedsearch on an encrypted dataset just as on a plaintextdataset Searchable encryption is further developed by [4] [5] [6] [7] [8] However these schemes areconcerned mostly with single or boolean keywordsearch Extending these techniques for ranked multi-keyword search will incur heavy computation andstorage costs Secure search over encrypted cloud datais first defined by Wang et al [9] and further devel-

oped by [10] [11] [12] [13] [14] [15] [16] [17] [18][19] [20] [21] [22] These researches not only reducethe computation and storage cost for secure keywordsearch over encrypted cloud data but also enrich the

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 2140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 2

category of search function including secure rankedmulti-keyword search fuzzy keyword search andsimilarity search However all these schemes arelimited to the single-owner model As a matter of fact most cloud servers in practice do not just serveone data owner instead they often support multipledata owners to share the benefits brought by cloudcomputing For example to assist the governmentin making a satisfactory policies on health care ser-vice or to help medical institutions conduct usefulresearch some volunteer patients would agree toshare their health data on the cloud To preserve theirprivacy they will encrypt their own health data withtheir secret keys In this scenario only the authorizedorganizations can perform a secure search over thisencrypted data contributed by multiple data ownersSuch a Personal Health Record sharing system wheremultiple data owners are involved can be found atmymedwallcom

Compared with the single-owner scheme devel-oping a full-fledged multi-owner scheme will havemany new challenging problems First in the single-owner scheme the data owner has to stay onlineto generate trapdoors (encrypted keywords) for datausers However when a huge amount of data ownersare involved asking them to stay online simultane-ously to generate trapdoors would seriously affect theflexibility and usability of the search system Secondsince none of us would be willing to share our secretkeys with others different data owners would prefer

to use their own secret keys to encrypt their secretdata Consequently it is very challenging to performa secure convenient and efficient search over thedata encrypted with different secret keys Third whenmultiple data owners are involved we should ensureefficient user enrollment and revocation mechanismsso that our system enjoys excellent security and scal-ability

In this paper we propose PRMSM a privacy p-reserving ranked multi-keyword search protocol ina multi-owner cloud model To enable cloud serversto perform secure search without knowing the actual

value of both keywords and trapdoors we system-atically construct a novel secure search protocol Asa result different data owners use different keys toencrypt their files and keywords Authenticated datausers can issue a query without knowing secret keysof these different data owners To rank the searchresults and preserve the privacy of relevance scores between keywords and files we propose a new ad-ditive order and privacy preserving function familywhich helps the cloud server return the most relevantsearch results to data users without revealing anysensitive information To prevent the attackers from

eavesdropping secret keys and pretending to be legaldata users submitting searches we propose a noveldynamic secret key generation protocol and a new da-ta user authentication protocol As a result attackers

who steal the secret key and perform illegal searcheswould be easily detected Furthermore when we wantto revoke a data user PRMSM ensures efficient datauser revocation Extensive experiments on real-worlddatasets confirm the efficacy and efficiency of ourproposed schemes

The main contributions of this paper are listed asfollows

bull We define a multi-owner model for privacy p-reserving keyword search over encrypted clouddata

bull We propose an efficient data user authenticationprotocol which not only prevents attackers fromeavesdropping secret keys and pretending to beillegal data users performing searches but alsoenables data user authentication and revocation

bull We systematically construct a novel secure searchprotocol which not only enables the cloud server

to perform secure ranked keyword search with-out knowing the actual data of both keywordsand trapdoors but also allows data owners toencrypt keywords with self-chosen keys and al-lows authenticated data users to query withoutknowing these keys

bull We propose an Additive Order and Privacy Pre-serving Function family (AOPPF) which allowsdata owners to protect the privacy of relevancescores using different functions according to theirpreference while still permitting the cloud serverto rank the data files accurately

bull We conduct extensive experiments on real-worlddatasets to confirm the efficacy and efficiency of our proposed schemes

The rest of this paper is organized as followsSection 2 formulates the problem Section 3 presentsthe preliminaries Section 4 demonstrates how to per-form user authentication Section 5 introduces ournovel secure search protocol Section 6 defines AOPPFand illustrates how to use this technique to performprivacy-preserving ranked search Section 7 presentssecurity analysis Section 8 demonstrates the efficiencyof our proposed scheme The related works are re-

viewed in Section 9 In Section 10 we conclude thepaper

2 PROBLEM FORMULATION

In this section we present a formal description for thetarget problem in this paper We first define a systemmodel and a corresponding threat model Then weelucidate the design goals of our solution scheme anda list of notations used in later discussions

21 System ModelIn our multi-owner and multi-user cloud comput-ing model four entities are involved as illustratedin Fig 1 they are data owners the cloud server

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 3140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 3

Data Owners Data Users

Data decryption keys

E n c r y

p t e d

i n d e x e

s

Ranked

results

Semi-trusted

Cloud Server

Administration

Server

D a t a U s e r

A u t h e n t i c a t i o n

D a t a O

w n e r

A u t h e

n t i c a t i o n

Encrypted

Files

Re-encrypted

indexes

S e a r c h r e q u e s t

Trapdoors

Secret

Data

Update

Fig 1 Architecture of privacy preserving keywordsearch in a multi-owner and multi-user cloud model

administration server and data users Data ownershave a collection of files F To enable efficient searchoperations on these files which will be encrypteddata owners first build a secure searchable index I on the keyword set W extracted from F then theysubmit I to the administration server Finally dataowners encrypt their files F and outsource the corre-sponding encrypted files C to the cloud server Uponreceiving I the administration server re-encrypts I for the authenticated data owners and outsourcesthe re-encrypted index to the cloud server Once adata user wants to search t keywords over these

encrypted files stored on the cloud server he firstcomputes the corresponding trapdoors and submitsthem to the administration server Once the datauser is authenticated by the administration serverthe administration server will further re-encrypt thetrapdoors and submit them to the cloud server Uponreceiving the trapdoor T the cloud server searches theencrypted index I of each data owner and returns thecorresponding set of encrypted files To improve thefile retrieval accuracy and save communication costa data user would tell the cloud server a parameterk and cloud server would return the top-k relevantfiles to the data user Once the data user receives thetop-k encrypted files from the cloud server he willdecrypt these returned files Note that how to achievedecryption capabilities are out of the scope of thispaper some excellent work regarding this problemcan be found in [23]

22 Threat Model

In our threat model we assume the administrationserver is trusted The administrative server can beany trusted third party eg the Certificate Authority

in the Public Key Infrastructure the aggregation anddistribution layer in [24] and the third party auditorin [2] Data owners and data users who passed theauthentication of the administration server are also

trusted However the cloud server is not trustedInstead we treat the cloud server as rsquocurious buthonestrsquo which is the same as in previous works [9][10] [11] [13] [19] The cloud server follows our pro-posed protocol but it is eager to obtain the contentsof encrypted files keywords and relevance scoresNote that preserving the access pattern ie the listof returned files is extremely expensive since thealgorithm has to rsquotouchrsquo the whole file set [25] Wedo not aim to protect it in this work for efficiencyconcerns

23 Design Goals and Security Definitions

To enable privacy preserving ranked multi-keywordsearch in the multi-owner and multi-user cloud en-vironment our system design should simultaneouslysatisfy security and performance goals

bull Ranked Multi-keyword Search over Multi-owner The proposed scheme should allowmulti-keyword search over encrypted files whichwould be encrypted with different keys for differ-ent data owners It also needs to allow the cloudserver to rank the search results among differentdata owners and return the top-k results

bull Data owner scalability The proposed schemeshould allow new data owners to enter this sys-tem without affecting other data owners or datausers ie the scheme should support data ownerscalability in a plug-and-play model

bull Data user revocation The proposed schemeshould ensure that only authenticated data userscan perform correct searches Moreover once adata user is revoked he can no longer performcorrect searches over the encrypted cloud data

bull Security Goals The proposed scheme shouldachieve the following security goals 1) KeywordSemantic Security (Definition 1) We will provethat PRMSM achieves semantic security againstthe chosen keyword attack 2) Keyword secrecy(Definition 2) Since the adversary A can knowwhether an encrypted keyword matches a trap-

door we use the weaker security goal (ie secre-cy) that is we should ensure that the probabilityfor the adversary A to infer the actual valueof a keyword is negligibly more than randomlyguessing 3) Relevance score secrecy We shouldensure that the cloud server cannot infer theactual value of the encoded relevance scores

Definition 1 Given a probabilistic polynomial timeadversary A he asks the challenger B for the cipher-text of his submitted keywords for polynomial timesThen A sends two keywords w0 and w1 which are notchallenged before to B B randomly sets micro isin 0 1

and returns an encrypted keyword wmicro to A A contin-ues to ask B for the cipher-text of keyword w the onlyrestriction is that w is not w0 or w1 Finally A outputsits guess microprime for micro We define the advantage that A

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 4140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 4

breaks PRMSM as AdvA=Pr[micro = microprime] minus 1

2

If AdvA isnegligible we say that PRMSM is semantically secureagainst the chosen-keyword attack

Definition 2 Given a probabilistic polynomial timeadversary A he asks the challenger B for the cipher-text of his queried keywords for t times Then B randomly chooses a keyword wlowast encrypts it to wlowastand sends wlowast to A A outputs its guess wprime for wlowastand wins if wprime = wlowast We define the probabilitythat A breaks keyword secrecy as AdvA=Pr[wprime = wlowast]We say that PRMSM achieves keyword secrecy if AdvA= 1

uminust + ε where ϵ is a negligible parameter tdenotes the number of keywords that A has knownand u denotes the size of keyword dictionary

24 Notations

bull O the data owner collection denoted as a set of m data owners O = (O1 O2 Om)

bull F i the plaintext file collection of Oi denoted asa set of n data file F i=(F i1 F i2 F in)

bull C i the ciphertext file collection of F i denoted asC i=(C i1 C i2 C in)

bull W the keyword collection denoted as a set of ukeywords W = (w1 w2 wu)

bull W i Oirsquos encrypted keyword collection of W

denoted as W i = ( wi1 wi2 wiu)

bull W the subset of W which represents queried

keywords denoted as W = (w1 w2 wq)

bull T W the trapdoor for

W denoted as T W =

(T w1 T w2 T wq )bull S ijt the relevance score of tth keyword to jth

file of ith data owner

3 PRELIMINARIES

Before we introduce our detailed construction we first briefly introduce some techniques that will be used inthis paper

31 Bilinear Map

Let G and G1 denote two cyclic groups with a prime

order p We further denote g and g1 as the generatorof G and G1 respectively Let e be a bilinear mape GtimesG rarr G1 then the following three conditions aresatisfied 1) Bilinear foralla b isin Z

lowast p e(ga gb) = e(g g)ab 2)

Non-degenerate e(g g) = 1 3) Computable e can beefficiently computed

32 Bilinear Diffie-Hellman Problem and BilinearDiffie-Hellman Assumption

The Bilinear Diffie-Hellman (BDH) problem in(G G1 e) is described as follows given random g isin

G and ga gb gc for some abc isin Zlowast p computee(g g)abc isin G1

The BDH assumption is presented as follows given(G G1 e) g isin G and ga gb gc for some ab c isin Z

lowast p

an adversary A has advantage ϵ in solving BDHwhen P r[A(ga gb gc) = e(g g)abc] ge ϵ The BDHassumption tells that the advantage ϵ is negligible forany polynomial time A

4 DATA USE R AUTHENTICATION

To prevent attackers from pretending to be legal datausers performing searches and launching statisticalattacks based on the search result data users must be authenticated before the administration server re-encrypts trapdoors for data users Traditional authen-tication methods often follow three steps First datarequester and data authenticator share a secret keysay k0 Second the requester encrypts his personallyidentifiable information d0 using k0 and sends the en-crypted data (d0)k0 to the authenticator Third the au-thenticator decrypts the received data with k0 and au-thenticates the decrypted data However this method

has two main drawbacks First since the secret keyshared between the requester and the authenticatorremains unchanged it is easy to incur replay attackSecond once the secret key is revealed to attackersthe authenticator cannot distinguish between the legalrequester and the attackers the attackers can pretendto be legal requesters without being detected

In this section we first give an overview of thedata user authentication protocol Then we introducehow to achieve secure and efficient data user authen-tication Finally we demonstrate how to detect illegalsearches and how to enable secure and efficient datauser revocation

41 Overview

Now we give an example to illustrate the main idea of the user authentication protocol(the detailed protocolis elaborated in the following subsections) AssumeAlice wants to be authenticated by the administrationserver so she starts a conversation with the serverThe server then authenticates the contents of theconversation If the contents are authenticated bothAlice and the server will generate the initial secret key

according to the conversation contents After the ini-tialization to be authenticated successfully Alice hasto provide the historical data of their conversationsIf the authentication is successful both Alice and theadministration server will change their secret keys ac-cording the contents of the conversation In this waythe secret keys keep changing dynamically withoutknowing the correct historical data an attacker cannotstart a successful conversation with the administrationserver

42 User AuthenticationBefore we introduce the dynamic key generationmethod and the authentication protocol we first intro-duce the format of the authentication data As shown

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 5140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 5

Fig 2 Format of Authentication Data

in Fig 2 the authentication data consists of five

parts The request counter field records the number of search requests that the data user has submitted Thelast request time field asks the data user to providethe historical data of his previous request time Thepersonally identifiable data (eg passport numbertelephone number) field is used to identify a specificdata user while the random number and CRC fieldare further used to check whether the authenticationdata has been tampered with

The key point of a successful authentication is toprovide both the dynamically changing secret keysand the historical data of the corresponding data

user Let kij denotes the secret key shared betweenadministration server and the jth data user U j afteri instances of search requests and dij denotes theauthentication data for the (i+1)th request of U j Ourauthentication protocol runs in the following six steps

A Data user U j prepares his authentication datadij ie U j needs to fill in all the fields of authentica-tion data based on his historical data

B Data user U j encrypts dij with the current secretkey kij and submits the encrypted authenticationdata (dij)kij to the administration server

C After submitting the authentication data thedata user U j generates another secret key ki+1j =kij oplus H (dij) and stores both kij and ki+1j

D Upon receiving U j rsquos encrypted authenticationdata the administration server decrypts it with kij

E The administration server checks the requestcounter last request time personally identifiable dataand CRC respectively If the authentication succeedsthe administration server first generates a new secretkey ki+1j = kij oplus H (dij) then he replies a confirma-tion data di+1j and encrypts it with ki+1j Otherwisethe administration server encrypts di+1j with secret

key kij F Upon receiving a reply from the administrationserver the data user U j will try to decrypt it withki+1j If the decrypted data contains the confirmationdata the authentication is successful Otherwise theauthentication is regarded as being unsuccessful Thedata user deletes the new generated secret key ki+1j

and considers whether to start another authenticationFig 3 shows an example of successful authen-

tication between the administration server and thedata user As we can see after each successful au-thentication process the secret key will be changed

dynamically according to the previous key and somehistorical data Therefore once an attacker steals a se-cret key he can hardly get any benefits On one handif the attacker knows nothing about the historical data

Administration

Server

Data

User U i

0

0i

ik

d

1

1i

ik

d

2

2i

ik

d

33

ii

k d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

Fig 3 Example of data user authentication and dy-namic secret key generation

of the legal data user he cannot even construct a legalauthentication data On the other hand if the legaldata user performs another successful authentication

the previous secret key will be expired

43 Illegal Search Detection

In our scheme the authentication process is protected by the dynamic secret key and the historical infor-mation We assume that an attacker has successfullyeavesdropped the secret key k0j of U j Then he hasto construct the authentication data if the attackerhas not successfully eavesdropped the historical da-ta eg the request counter the last request timehe cannot construct the correct authentication dataTherefore this illegal action will soon be detected bythe administration server Further if the attacker hassuccessfully eavesdropped all data of U j the attackercan correctly construct the authentication data andpretend himself to be U j without being detected bythe administration server However once the legaldata user U j performs his search since the secret keyon the administration server side has changed therewill be contradictory secret keys between the admin-istration server and the legal data user Therefore thedata user and administration server will soon detectthis illegal action

44 Data User Revocation

Different from previous works data user revocationin our scheme does not need to re-encrypt and updatelarge amounts of data stored on the cloud server In-stead the adminstration server only needs to updatethe secret data S a stored on the cloud server As will be detailed in the next section S a = gka1middotka2middotra whereka1 and ka2 are the secret keys of the administrationserver and ra is randomly generated for every updateoperation Consequently the previous trapdoors will be expired Additionally without the help of the

administration server the revoked data user cannotgenerate the correct trapdoor T whprime Therefore a datauser cannot perform correct searches once he is re-voked

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 6140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 6

5 MATCHING DIFFERENT-KE Y ENCRYPTED

KEYWORDS

Numerous data owners are often involved in practicalcloud applications For privacy concerns they would be reluctant to share secret keys with others Insteadthey prefer to use their own secret keys to encrypt

their sensitive data (keywords files) When keywordsof different data owners are encrypted with differentsecret keys the coming question is how to locatedifferent-key encrypted keywords among multiple da-ta owners In this section to enable secure efficientand convenient searches over encrypted cloud dataowned by multiple data owners we systematicallydesign schemes to achieve the following three require-ments First different data owners use different secretkeys to encrypt their keywords Second authenticateddata users can generate their trapdoors without know-ing these secret keys Third upon receiving trapdoors

the cloud server can find the corresponding keyword-s from different data ownersrsquo encrypted keywordswithout knowing the actual value of keywords ortrapdoors

51 Overview

Now we present an example to illustrate the mainidea of our keywords matching protocol(the detailedprotocol is elaborated in the following subsections)Assume Alice wants to use the cloud to store her fileF she first encrypts her file F and gets the ciphertextC To enable other users to perform secure searcheson C Alice extracts a keyword wih and sends theencrypted keyword wih = (E aprime E o) to the adminis-tration server The administration server further re-encrypts E aprime to E a and submits wih = (E a E o) to thecloud server Now Bob wants to search a keyword whprime he first generates the trapdoor T primewhprime and submits it tothe administration server The administration serverre-encrypts T primewhprime to T whprime = (T 1 T 2 T 3) generates asecret data S a and submits T whprime S a to the cloudserver The cloud server will judge whether Bobrsquossearch request matches Alicersquos encrypted keyword by

checking whether e (E a

T 3

) = e (E o

T 1

) middot e (S a

T 2

)holds

52 Construction Initialization

Our construction is based on the aforementioned bi-linear map Let g and g1 denote the generator of twocyclic groups G and G1 with order p Let e be a bilinear map e G times G rarr G1 Given different secretparameters as input a randomized key generationalgorithm will output the private keys used in thesystem ka1 isin Z

+ p ka2 isin Z

+ p kif isin Z

+ p kiw isin Z

+ p larr

(0 1)lowast where ka1 and ka2 are the private keys of the

administration server kiw and kif are the privatekeys used to encrypt keywords and files of data ownerOi respectively Let H (middot) be a public hash function itsoutput locates in Z

+ p

53 Keyword Encryption

For keyword encryption the following conditionsshould be satisfied first different data owners usetheir own secret keys to encrypt keywords Secondfor the same keyword it would be encrypted to dif-ferent cipher-texts each time These properties benefit

our scheme for two reasons First losing the key of one data owner would not lead to the disclosure of other ownersrsquo data Second the cloud server cannotsee any relationship among encrypted keywords Giv-en the hth keyword of data owner Oi ie wih weencrypt wih as follows

wih =983080

gkiwmiddotromiddotH (wih) gkiwmiddotro983081

(1)

where ro is a randomly generated number eachtime which helps enhance the security of wih Foreasy description and understanding we let E primea =

gkiwmiddotromiddotH (wih)

and E o = gkiwmiddotro

The data owner delivers E aprime and E o to the admin-

istration server and the administration server furtherre-encrypts E aprime with his secret keys ka1 and ka2 andgets E a

E a =852008

E aprime middot gka1852009ka2 (2)

Therefore wih = (E a E o) The administrative serv-er further submits wih to the cloud server Note thatsince the administration server only does simple com-putations on the encrypted data he cannot learn anysensitive information from these random encrypted

data without knowing the secret keys of data owners

54 Trapdoor Generation

To make the data users generate trapdoors securelyconveniently and efficiently our proposed schemeshould satisfy two main conditions First the datauser does not need to ask a large amount of dataowners for secret keys to generate trapdoors Sec-ond for the same keyword the trapdoor generatedeach time should be different To meet this conditionthe trapdoor generation is conducted in two steps

First the data user generates trapdoors based on hissearch keyword and a random number Second theadministration server re-encrypts the trapdoors forthe authenticated data user

Assume a data user wants to search keyword whprime so he encrypts it as follows

T primewhprime =983080

gH (whprime )middotru gru983081

(3)

where ru is a randomly generated number eachtime As we can see during the trapdoor generationprocess secret keys of data owners are not required

Additionally with the help of random variable ru forthe same keyword whprime we can generate two differenttrapdoors which prevent attackers from knowing therelationship among trapdoors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 7140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 7

Upon receiving T primewhprime the administration server firstgenerates a random number ra and then re-encryptsT primewhprime as follows

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(4)

For easy description and understanding we let

T 1 = g

H (whprime )middotrumiddotka1middotka2middotra

T 2 = g

rumiddotka1

T 3 = g

rumiddotka1middotra

hence T whprime = (T 1 T 2 T 3) Finally the administrationserver submits T whprime to the cloud server

55 Keywords Matching among Different DataOwners

The cloud server stores all encrypted files and key-words of different data owners The administrationserver will also store a secret data S a = gka1middotka2middotra

on the cloud server Upon receiving a query requestthe cloud will search over the data of all these dataowners The cloud processes the search request in two

steps First the cloud matches the queried keywordsfrom all keywords stored on it and it gets a candidatefile set Second the cloud ranks files in the candidatefile set and finds the most top-k relevant files Weintroduce the matching strategy here while leavingthe task of introducing the ranking strategy in the nextsection When the cloud obtains the trapdoor T whprime andencrypted keywords (E o E a) he first computes

e (S a T 2)

= e852008

gramiddotka1middotka2 grumiddotka1852009

(5)

= e(g g)ramiddotka1middotka2middotrumiddotka1

Then he can judge whether whprime = wih (ie anencrypted keyword is located) holds if the followingequation is true

e (E a T 3)

= e

1048616983080gkiwmiddotromiddotH (wih) middot gka1

983081ka2 grumiddotka1middotra

1048617= e(g g)

(kiw middotromiddotH (wih)+ka1)middotka2middotrumiddotka1middotra (6)

= e(g g)kiwmiddotromiddotH (wih)middotka2middotrumiddotka1middotra middot e (S a T 2)

= e983080

gkiwmiddotro gH (wih)middotka2middotrumiddotka1middotra983081

middot e (S a T 2)

= e (E o T 1) middot e (S a T 2)

6 PRIVACY PRESERVING RANKED SEARCH

The aforementioned section helps the cloud matchthe queried keywords and obtain a candidate fileset However we cannot simply return undifferentialfiles to data users for the following two reasons Firstreturning all candidate files would cause abundantcommunication overhead for the whole system Sec-ond data users would only concern the top-k relevantfiles corresponding to their queries In this sectionwe first elucidate an order and privacy preserving

encoding scheme Then we illustrate an additive orderpreserving and privacy preserving encoding schemeFinally we apply the proposed scheme to encode therelevance scores and obtain the top-k search results

x 1 2 3 4 5

f ( x) 100-1000 1100-1800 2000-4200 4300-5000 5100-7000

Fig 4 An example of Order Preserving and PrivacyPreserving Function

61 Order and Privacy Preserving Function

To rank the relevance score while preserving its priva-cy the proposed function should satisfy the followingconditions 1) This function should preserve the orderof data as this helps the cloud server determine whichfile is more relevant to a certain keyword according tothe encoded relevance scores 2) This function shouldnot be revealed by the cloud server so that cloudserver can make comparisons on encoded relevancescores without knowing their actual values 3) Dif-ferent data owners should have different functionssuch that revealing the encoded value of a data ownerwould not lead to the leakage of encoded values of other data owners In order to satisfy condition 1we introduce a data processing part m(x middot) whichpreserves the order of x To satisfy condition 2 weintroduce a disturbing part rf which helps prevent thecloud server from revealing this function To satisfycondition 3 we use m(x middot) to process the ID of dataowners So this function belongs to the followingfunction family

F yoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + rf (7)

where τ denotes the degree of F yoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k) We furtherdefine m(x j) as follows m(x 0) = 1 m(x 1) = xand m(x j) = (m(x jminus1)+α middot x) middot (1+λ) if jgt1 whereα and λ are two constant numbers

Now we introduce how to set the disturbing partrf Since

sum0lejkleτ Ajkmiddot(m(x+1 j)minusm(x j))middotm(y k)gesum

0lejkleτ Ajkmiddot((1+λ)jminus1

+ αmiddotsum

1leilejminus2(1+λ))

Let l be an integer such that2lminus1le

sum0lejkleτ Ajkmiddot((1+λ)

jminus1+αmiddot

sum1leilejminus2(1+λ))le2l

we can set rf isin (0 2lminus1)Obviously forallx1gtx2 we have F yoppf (x1)gtF yoppf (x2)

62 Additive Order and Privacy Preserving Func-tion

With the order and privacy preserving function wehave if x1 gt x2 then y1 gt y2 However the additionof the function is not necessarily order preserving

An example is presented in Fig 4 Obviously f (x)

is order preserving to preserve privacy a disturbingpart is introduced As we can see f (1) + f (5) variesfrom 5200 to 8000 f (1) + f (4) varies from 4400 to6000 f (2) + f (3) varies from 3100 to 6000 Though

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 8140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 8

1 + 5 gt 1 + 4 and 1 + 5 gt 2 + 3 it is probable forf (1)+f (5) lt f (1)+f (4) and f (1)+f (5) lt f (2)+f (3)

To correctly perform the secure ranked multi-keyword search the sum of any two encoded rel-evance scores should still be ordered and privacypreserved (we use the sum of encoded relevance scoreto evaluate the relevance between a file and multiplekeywords in this paper) To satisfy this conditionwe further design an additive order and privacypreserving function family based on Eq 7

F yaoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + raof (8)

Where τ denotes the degree of F yaoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k)

Before we take a step further towards the useof F yaoppf (x) we will first introduce our auxiliarytheorem ie how to make an order and privacypreserving function to be additive order and privacypreserving

Definition 3 Given an order preserving functiony = f (x) we define ∆f (xi) = f (xi+1) minus f (xi) and∆f (xi) = ∆f (xi+1) minus ∆f (xi) To preserve privacy wedefine yi = f (xi) + ri where ri is a random numberand ri gt 0 We define the max value of ri as rimaxand rimax = f (xi)minusimiddot∆(xi) hence yi can change fromf (xi) to f (xi) + rimax

Theorem 1 Given an order preserving function yi =f (xi) + ri when the following three conditionsare satisfied (1) foralli ∆f (xi + 1) le ∆f (xi) (2) foralli

˜∆f (xi + 1) ge ˜

∆f (xi) (3) foralli ri lt 983080i

2

middot ˜∆f (xi)983081 2The order and privacy preserving function yi is also

an additive order and privacy preserving that isfor any

sumxiisinD1

xi le sum

xjisinD2xj where D1 and D2

denotes two subsets of the definition domain we havesumxiisinD1

yi lesum

xjisinD2yj Proof The proof is elaborated in Appendix A

63 Encoding relevance scores

In our paper we use a well-known method to com-pute the relevance score [26] ie Score(w F d) =1

|F d|(1+ln f dw)ln(1+ N

f w) where w denotes the given

keyword |F d| is the length of file F d f dw denotes theT F of w in file F d f w denotes the number of filescontaining w and N denotes the total number of filesin the collection

Now we introduce how to encode the relevancescore with an additive order and privacy preservingfunction in our additive order and privacy preservingfunction family F yaoppf (x) Given input S ijt (the rel-evance score of tth keyword to jth file of ith dataowner) Data owners can independently choose afunction from this family to protect the privacy of their relevance scores For simplicity in this paper we

specify data owner i to choose F H i(i)aoppf (middot) and defineV ijt as the encoded data of S ijt

V ijt = F H i(i)aoppf (S ijt) (9)

where H i(middot) is a public known hash function and i isthe I D of data owner Oi

With the well-designed properties of F aoppf thecloud server can make a comparison among encodedrelevance scores for the same data owner Howeversince different data owners encode their relevancescores with different functions in F aoppf the cloudserver cannot make a comparison between encodedrelevance scores for different data owners To solvethis problem we define

T ijt(y) = F yaoppf (S ijt) (10)

where T ijt(y) is used to help the cloud servermake comparisons among relevance scores encoded by different data owners and y is a variable whichtakes the hash value of data ownerrsquos ID as input

Finally we attach each V ijt with a T ijt(y)

64 Ranking search results

In this paper we use the sum of the relevance scoresas the metric to rank search results Now we intro-duce the strategies of ranking search results based onthe encoded relevance scores First the cloud com-putes V ij =

sumtisinW

V ijt the sum of encoded relevance

scores between the jth file and matched keywords forOi and the auxiliary value T ij(y) =

sumtisinW

T ijt(y) Then

the cloud ranks the sum of encoded relevance scorewith the following two conditions

(1) Two encoded data belong to the same dataowner Given that a data user issues a query W =wm wn we assume that Oirsquos F 1 and F 2 satisfy thequery Then the cloud adds the encoded relevancescore together and gets the relevance score of Oiprimes F 1to W

V i1 = V i1m + V i1n (11)

Similarly

V i2 = V i2m + V i2n (12)

If V i1 ge V i2 then F 1 is more relevant to the queried

keyword sets W otherwise F 2 is more relevant to W (2) Two encoded data belong to two different data

owners Given that a data user issues a query W =wm wn we assume Oirsquos F 1 and Oj rsquos F 2 satisfy thequery The cloud server makes comparison on theirencoded relevance scores in the following four steps

First the cloud server computes the relevance score

of Oiprimes F 1 to W

V i1 = V i1m + V i1n (13)

Second the cloud server computes the T j2(y)

T j2(y) = T j2m(y) + T j2n(y) (14)

Third the cloud server substitutes H i(i) for thevariable y and gets T j2(H (i))

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 2: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 2140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 2

category of search function including secure rankedmulti-keyword search fuzzy keyword search andsimilarity search However all these schemes arelimited to the single-owner model As a matter of fact most cloud servers in practice do not just serveone data owner instead they often support multipledata owners to share the benefits brought by cloudcomputing For example to assist the governmentin making a satisfactory policies on health care ser-vice or to help medical institutions conduct usefulresearch some volunteer patients would agree toshare their health data on the cloud To preserve theirprivacy they will encrypt their own health data withtheir secret keys In this scenario only the authorizedorganizations can perform a secure search over thisencrypted data contributed by multiple data ownersSuch a Personal Health Record sharing system wheremultiple data owners are involved can be found atmymedwallcom

Compared with the single-owner scheme devel-oping a full-fledged multi-owner scheme will havemany new challenging problems First in the single-owner scheme the data owner has to stay onlineto generate trapdoors (encrypted keywords) for datausers However when a huge amount of data ownersare involved asking them to stay online simultane-ously to generate trapdoors would seriously affect theflexibility and usability of the search system Secondsince none of us would be willing to share our secretkeys with others different data owners would prefer

to use their own secret keys to encrypt their secretdata Consequently it is very challenging to performa secure convenient and efficient search over thedata encrypted with different secret keys Third whenmultiple data owners are involved we should ensureefficient user enrollment and revocation mechanismsso that our system enjoys excellent security and scal-ability

In this paper we propose PRMSM a privacy p-reserving ranked multi-keyword search protocol ina multi-owner cloud model To enable cloud serversto perform secure search without knowing the actual

value of both keywords and trapdoors we system-atically construct a novel secure search protocol Asa result different data owners use different keys toencrypt their files and keywords Authenticated datausers can issue a query without knowing secret keysof these different data owners To rank the searchresults and preserve the privacy of relevance scores between keywords and files we propose a new ad-ditive order and privacy preserving function familywhich helps the cloud server return the most relevantsearch results to data users without revealing anysensitive information To prevent the attackers from

eavesdropping secret keys and pretending to be legaldata users submitting searches we propose a noveldynamic secret key generation protocol and a new da-ta user authentication protocol As a result attackers

who steal the secret key and perform illegal searcheswould be easily detected Furthermore when we wantto revoke a data user PRMSM ensures efficient datauser revocation Extensive experiments on real-worlddatasets confirm the efficacy and efficiency of ourproposed schemes

The main contributions of this paper are listed asfollows

bull We define a multi-owner model for privacy p-reserving keyword search over encrypted clouddata

bull We propose an efficient data user authenticationprotocol which not only prevents attackers fromeavesdropping secret keys and pretending to beillegal data users performing searches but alsoenables data user authentication and revocation

bull We systematically construct a novel secure searchprotocol which not only enables the cloud server

to perform secure ranked keyword search with-out knowing the actual data of both keywordsand trapdoors but also allows data owners toencrypt keywords with self-chosen keys and al-lows authenticated data users to query withoutknowing these keys

bull We propose an Additive Order and Privacy Pre-serving Function family (AOPPF) which allowsdata owners to protect the privacy of relevancescores using different functions according to theirpreference while still permitting the cloud serverto rank the data files accurately

bull We conduct extensive experiments on real-worlddatasets to confirm the efficacy and efficiency of our proposed schemes

The rest of this paper is organized as followsSection 2 formulates the problem Section 3 presentsthe preliminaries Section 4 demonstrates how to per-form user authentication Section 5 introduces ournovel secure search protocol Section 6 defines AOPPFand illustrates how to use this technique to performprivacy-preserving ranked search Section 7 presentssecurity analysis Section 8 demonstrates the efficiencyof our proposed scheme The related works are re-

viewed in Section 9 In Section 10 we conclude thepaper

2 PROBLEM FORMULATION

In this section we present a formal description for thetarget problem in this paper We first define a systemmodel and a corresponding threat model Then weelucidate the design goals of our solution scheme anda list of notations used in later discussions

21 System ModelIn our multi-owner and multi-user cloud comput-ing model four entities are involved as illustratedin Fig 1 they are data owners the cloud server

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 3140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 3

Data Owners Data Users

Data decryption keys

E n c r y

p t e d

i n d e x e

s

Ranked

results

Semi-trusted

Cloud Server

Administration

Server

D a t a U s e r

A u t h e n t i c a t i o n

D a t a O

w n e r

A u t h e

n t i c a t i o n

Encrypted

Files

Re-encrypted

indexes

S e a r c h r e q u e s t

Trapdoors

Secret

Data

Update

Fig 1 Architecture of privacy preserving keywordsearch in a multi-owner and multi-user cloud model

administration server and data users Data ownershave a collection of files F To enable efficient searchoperations on these files which will be encrypteddata owners first build a secure searchable index I on the keyword set W extracted from F then theysubmit I to the administration server Finally dataowners encrypt their files F and outsource the corre-sponding encrypted files C to the cloud server Uponreceiving I the administration server re-encrypts I for the authenticated data owners and outsourcesthe re-encrypted index to the cloud server Once adata user wants to search t keywords over these

encrypted files stored on the cloud server he firstcomputes the corresponding trapdoors and submitsthem to the administration server Once the datauser is authenticated by the administration serverthe administration server will further re-encrypt thetrapdoors and submit them to the cloud server Uponreceiving the trapdoor T the cloud server searches theencrypted index I of each data owner and returns thecorresponding set of encrypted files To improve thefile retrieval accuracy and save communication costa data user would tell the cloud server a parameterk and cloud server would return the top-k relevantfiles to the data user Once the data user receives thetop-k encrypted files from the cloud server he willdecrypt these returned files Note that how to achievedecryption capabilities are out of the scope of thispaper some excellent work regarding this problemcan be found in [23]

22 Threat Model

In our threat model we assume the administrationserver is trusted The administrative server can beany trusted third party eg the Certificate Authority

in the Public Key Infrastructure the aggregation anddistribution layer in [24] and the third party auditorin [2] Data owners and data users who passed theauthentication of the administration server are also

trusted However the cloud server is not trustedInstead we treat the cloud server as rsquocurious buthonestrsquo which is the same as in previous works [9][10] [11] [13] [19] The cloud server follows our pro-posed protocol but it is eager to obtain the contentsof encrypted files keywords and relevance scoresNote that preserving the access pattern ie the listof returned files is extremely expensive since thealgorithm has to rsquotouchrsquo the whole file set [25] Wedo not aim to protect it in this work for efficiencyconcerns

23 Design Goals and Security Definitions

To enable privacy preserving ranked multi-keywordsearch in the multi-owner and multi-user cloud en-vironment our system design should simultaneouslysatisfy security and performance goals

bull Ranked Multi-keyword Search over Multi-owner The proposed scheme should allowmulti-keyword search over encrypted files whichwould be encrypted with different keys for differ-ent data owners It also needs to allow the cloudserver to rank the search results among differentdata owners and return the top-k results

bull Data owner scalability The proposed schemeshould allow new data owners to enter this sys-tem without affecting other data owners or datausers ie the scheme should support data ownerscalability in a plug-and-play model

bull Data user revocation The proposed schemeshould ensure that only authenticated data userscan perform correct searches Moreover once adata user is revoked he can no longer performcorrect searches over the encrypted cloud data

bull Security Goals The proposed scheme shouldachieve the following security goals 1) KeywordSemantic Security (Definition 1) We will provethat PRMSM achieves semantic security againstthe chosen keyword attack 2) Keyword secrecy(Definition 2) Since the adversary A can knowwhether an encrypted keyword matches a trap-

door we use the weaker security goal (ie secre-cy) that is we should ensure that the probabilityfor the adversary A to infer the actual valueof a keyword is negligibly more than randomlyguessing 3) Relevance score secrecy We shouldensure that the cloud server cannot infer theactual value of the encoded relevance scores

Definition 1 Given a probabilistic polynomial timeadversary A he asks the challenger B for the cipher-text of his submitted keywords for polynomial timesThen A sends two keywords w0 and w1 which are notchallenged before to B B randomly sets micro isin 0 1

and returns an encrypted keyword wmicro to A A contin-ues to ask B for the cipher-text of keyword w the onlyrestriction is that w is not w0 or w1 Finally A outputsits guess microprime for micro We define the advantage that A

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 4140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 4

breaks PRMSM as AdvA=Pr[micro = microprime] minus 1

2

If AdvA isnegligible we say that PRMSM is semantically secureagainst the chosen-keyword attack

Definition 2 Given a probabilistic polynomial timeadversary A he asks the challenger B for the cipher-text of his queried keywords for t times Then B randomly chooses a keyword wlowast encrypts it to wlowastand sends wlowast to A A outputs its guess wprime for wlowastand wins if wprime = wlowast We define the probabilitythat A breaks keyword secrecy as AdvA=Pr[wprime = wlowast]We say that PRMSM achieves keyword secrecy if AdvA= 1

uminust + ε where ϵ is a negligible parameter tdenotes the number of keywords that A has knownand u denotes the size of keyword dictionary

24 Notations

bull O the data owner collection denoted as a set of m data owners O = (O1 O2 Om)

bull F i the plaintext file collection of Oi denoted asa set of n data file F i=(F i1 F i2 F in)

bull C i the ciphertext file collection of F i denoted asC i=(C i1 C i2 C in)

bull W the keyword collection denoted as a set of ukeywords W = (w1 w2 wu)

bull W i Oirsquos encrypted keyword collection of W

denoted as W i = ( wi1 wi2 wiu)

bull W the subset of W which represents queried

keywords denoted as W = (w1 w2 wq)

bull T W the trapdoor for

W denoted as T W =

(T w1 T w2 T wq )bull S ijt the relevance score of tth keyword to jth

file of ith data owner

3 PRELIMINARIES

Before we introduce our detailed construction we first briefly introduce some techniques that will be used inthis paper

31 Bilinear Map

Let G and G1 denote two cyclic groups with a prime

order p We further denote g and g1 as the generatorof G and G1 respectively Let e be a bilinear mape GtimesG rarr G1 then the following three conditions aresatisfied 1) Bilinear foralla b isin Z

lowast p e(ga gb) = e(g g)ab 2)

Non-degenerate e(g g) = 1 3) Computable e can beefficiently computed

32 Bilinear Diffie-Hellman Problem and BilinearDiffie-Hellman Assumption

The Bilinear Diffie-Hellman (BDH) problem in(G G1 e) is described as follows given random g isin

G and ga gb gc for some abc isin Zlowast p computee(g g)abc isin G1

The BDH assumption is presented as follows given(G G1 e) g isin G and ga gb gc for some ab c isin Z

lowast p

an adversary A has advantage ϵ in solving BDHwhen P r[A(ga gb gc) = e(g g)abc] ge ϵ The BDHassumption tells that the advantage ϵ is negligible forany polynomial time A

4 DATA USE R AUTHENTICATION

To prevent attackers from pretending to be legal datausers performing searches and launching statisticalattacks based on the search result data users must be authenticated before the administration server re-encrypts trapdoors for data users Traditional authen-tication methods often follow three steps First datarequester and data authenticator share a secret keysay k0 Second the requester encrypts his personallyidentifiable information d0 using k0 and sends the en-crypted data (d0)k0 to the authenticator Third the au-thenticator decrypts the received data with k0 and au-thenticates the decrypted data However this method

has two main drawbacks First since the secret keyshared between the requester and the authenticatorremains unchanged it is easy to incur replay attackSecond once the secret key is revealed to attackersthe authenticator cannot distinguish between the legalrequester and the attackers the attackers can pretendto be legal requesters without being detected

In this section we first give an overview of thedata user authentication protocol Then we introducehow to achieve secure and efficient data user authen-tication Finally we demonstrate how to detect illegalsearches and how to enable secure and efficient datauser revocation

41 Overview

Now we give an example to illustrate the main idea of the user authentication protocol(the detailed protocolis elaborated in the following subsections) AssumeAlice wants to be authenticated by the administrationserver so she starts a conversation with the serverThe server then authenticates the contents of theconversation If the contents are authenticated bothAlice and the server will generate the initial secret key

according to the conversation contents After the ini-tialization to be authenticated successfully Alice hasto provide the historical data of their conversationsIf the authentication is successful both Alice and theadministration server will change their secret keys ac-cording the contents of the conversation In this waythe secret keys keep changing dynamically withoutknowing the correct historical data an attacker cannotstart a successful conversation with the administrationserver

42 User AuthenticationBefore we introduce the dynamic key generationmethod and the authentication protocol we first intro-duce the format of the authentication data As shown

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 5140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 5

Fig 2 Format of Authentication Data

in Fig 2 the authentication data consists of five

parts The request counter field records the number of search requests that the data user has submitted Thelast request time field asks the data user to providethe historical data of his previous request time Thepersonally identifiable data (eg passport numbertelephone number) field is used to identify a specificdata user while the random number and CRC fieldare further used to check whether the authenticationdata has been tampered with

The key point of a successful authentication is toprovide both the dynamically changing secret keysand the historical data of the corresponding data

user Let kij denotes the secret key shared betweenadministration server and the jth data user U j afteri instances of search requests and dij denotes theauthentication data for the (i+1)th request of U j Ourauthentication protocol runs in the following six steps

A Data user U j prepares his authentication datadij ie U j needs to fill in all the fields of authentica-tion data based on his historical data

B Data user U j encrypts dij with the current secretkey kij and submits the encrypted authenticationdata (dij)kij to the administration server

C After submitting the authentication data thedata user U j generates another secret key ki+1j =kij oplus H (dij) and stores both kij and ki+1j

D Upon receiving U j rsquos encrypted authenticationdata the administration server decrypts it with kij

E The administration server checks the requestcounter last request time personally identifiable dataand CRC respectively If the authentication succeedsthe administration server first generates a new secretkey ki+1j = kij oplus H (dij) then he replies a confirma-tion data di+1j and encrypts it with ki+1j Otherwisethe administration server encrypts di+1j with secret

key kij F Upon receiving a reply from the administrationserver the data user U j will try to decrypt it withki+1j If the decrypted data contains the confirmationdata the authentication is successful Otherwise theauthentication is regarded as being unsuccessful Thedata user deletes the new generated secret key ki+1j

and considers whether to start another authenticationFig 3 shows an example of successful authen-

tication between the administration server and thedata user As we can see after each successful au-thentication process the secret key will be changed

dynamically according to the previous key and somehistorical data Therefore once an attacker steals a se-cret key he can hardly get any benefits On one handif the attacker knows nothing about the historical data

Administration

Server

Data

User U i

0

0i

ik

d

1

1i

ik

d

2

2i

ik

d

33

ii

k d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

Fig 3 Example of data user authentication and dy-namic secret key generation

of the legal data user he cannot even construct a legalauthentication data On the other hand if the legaldata user performs another successful authentication

the previous secret key will be expired

43 Illegal Search Detection

In our scheme the authentication process is protected by the dynamic secret key and the historical infor-mation We assume that an attacker has successfullyeavesdropped the secret key k0j of U j Then he hasto construct the authentication data if the attackerhas not successfully eavesdropped the historical da-ta eg the request counter the last request timehe cannot construct the correct authentication dataTherefore this illegal action will soon be detected bythe administration server Further if the attacker hassuccessfully eavesdropped all data of U j the attackercan correctly construct the authentication data andpretend himself to be U j without being detected bythe administration server However once the legaldata user U j performs his search since the secret keyon the administration server side has changed therewill be contradictory secret keys between the admin-istration server and the legal data user Therefore thedata user and administration server will soon detectthis illegal action

44 Data User Revocation

Different from previous works data user revocationin our scheme does not need to re-encrypt and updatelarge amounts of data stored on the cloud server In-stead the adminstration server only needs to updatethe secret data S a stored on the cloud server As will be detailed in the next section S a = gka1middotka2middotra whereka1 and ka2 are the secret keys of the administrationserver and ra is randomly generated for every updateoperation Consequently the previous trapdoors will be expired Additionally without the help of the

administration server the revoked data user cannotgenerate the correct trapdoor T whprime Therefore a datauser cannot perform correct searches once he is re-voked

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 6140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 6

5 MATCHING DIFFERENT-KE Y ENCRYPTED

KEYWORDS

Numerous data owners are often involved in practicalcloud applications For privacy concerns they would be reluctant to share secret keys with others Insteadthey prefer to use their own secret keys to encrypt

their sensitive data (keywords files) When keywordsof different data owners are encrypted with differentsecret keys the coming question is how to locatedifferent-key encrypted keywords among multiple da-ta owners In this section to enable secure efficientand convenient searches over encrypted cloud dataowned by multiple data owners we systematicallydesign schemes to achieve the following three require-ments First different data owners use different secretkeys to encrypt their keywords Second authenticateddata users can generate their trapdoors without know-ing these secret keys Third upon receiving trapdoors

the cloud server can find the corresponding keyword-s from different data ownersrsquo encrypted keywordswithout knowing the actual value of keywords ortrapdoors

51 Overview

Now we present an example to illustrate the mainidea of our keywords matching protocol(the detailedprotocol is elaborated in the following subsections)Assume Alice wants to use the cloud to store her fileF she first encrypts her file F and gets the ciphertextC To enable other users to perform secure searcheson C Alice extracts a keyword wih and sends theencrypted keyword wih = (E aprime E o) to the adminis-tration server The administration server further re-encrypts E aprime to E a and submits wih = (E a E o) to thecloud server Now Bob wants to search a keyword whprime he first generates the trapdoor T primewhprime and submits it tothe administration server The administration serverre-encrypts T primewhprime to T whprime = (T 1 T 2 T 3) generates asecret data S a and submits T whprime S a to the cloudserver The cloud server will judge whether Bobrsquossearch request matches Alicersquos encrypted keyword by

checking whether e (E a

T 3

) = e (E o

T 1

) middot e (S a

T 2

)holds

52 Construction Initialization

Our construction is based on the aforementioned bi-linear map Let g and g1 denote the generator of twocyclic groups G and G1 with order p Let e be a bilinear map e G times G rarr G1 Given different secretparameters as input a randomized key generationalgorithm will output the private keys used in thesystem ka1 isin Z

+ p ka2 isin Z

+ p kif isin Z

+ p kiw isin Z

+ p larr

(0 1)lowast where ka1 and ka2 are the private keys of the

administration server kiw and kif are the privatekeys used to encrypt keywords and files of data ownerOi respectively Let H (middot) be a public hash function itsoutput locates in Z

+ p

53 Keyword Encryption

For keyword encryption the following conditionsshould be satisfied first different data owners usetheir own secret keys to encrypt keywords Secondfor the same keyword it would be encrypted to dif-ferent cipher-texts each time These properties benefit

our scheme for two reasons First losing the key of one data owner would not lead to the disclosure of other ownersrsquo data Second the cloud server cannotsee any relationship among encrypted keywords Giv-en the hth keyword of data owner Oi ie wih weencrypt wih as follows

wih =983080

gkiwmiddotromiddotH (wih) gkiwmiddotro983081

(1)

where ro is a randomly generated number eachtime which helps enhance the security of wih Foreasy description and understanding we let E primea =

gkiwmiddotromiddotH (wih)

and E o = gkiwmiddotro

The data owner delivers E aprime and E o to the admin-

istration server and the administration server furtherre-encrypts E aprime with his secret keys ka1 and ka2 andgets E a

E a =852008

E aprime middot gka1852009ka2 (2)

Therefore wih = (E a E o) The administrative serv-er further submits wih to the cloud server Note thatsince the administration server only does simple com-putations on the encrypted data he cannot learn anysensitive information from these random encrypted

data without knowing the secret keys of data owners

54 Trapdoor Generation

To make the data users generate trapdoors securelyconveniently and efficiently our proposed schemeshould satisfy two main conditions First the datauser does not need to ask a large amount of dataowners for secret keys to generate trapdoors Sec-ond for the same keyword the trapdoor generatedeach time should be different To meet this conditionthe trapdoor generation is conducted in two steps

First the data user generates trapdoors based on hissearch keyword and a random number Second theadministration server re-encrypts the trapdoors forthe authenticated data user

Assume a data user wants to search keyword whprime so he encrypts it as follows

T primewhprime =983080

gH (whprime )middotru gru983081

(3)

where ru is a randomly generated number eachtime As we can see during the trapdoor generationprocess secret keys of data owners are not required

Additionally with the help of random variable ru forthe same keyword whprime we can generate two differenttrapdoors which prevent attackers from knowing therelationship among trapdoors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 7140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 7

Upon receiving T primewhprime the administration server firstgenerates a random number ra and then re-encryptsT primewhprime as follows

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(4)

For easy description and understanding we let

T 1 = g

H (whprime )middotrumiddotka1middotka2middotra

T 2 = g

rumiddotka1

T 3 = g

rumiddotka1middotra

hence T whprime = (T 1 T 2 T 3) Finally the administrationserver submits T whprime to the cloud server

55 Keywords Matching among Different DataOwners

The cloud server stores all encrypted files and key-words of different data owners The administrationserver will also store a secret data S a = gka1middotka2middotra

on the cloud server Upon receiving a query requestthe cloud will search over the data of all these dataowners The cloud processes the search request in two

steps First the cloud matches the queried keywordsfrom all keywords stored on it and it gets a candidatefile set Second the cloud ranks files in the candidatefile set and finds the most top-k relevant files Weintroduce the matching strategy here while leavingthe task of introducing the ranking strategy in the nextsection When the cloud obtains the trapdoor T whprime andencrypted keywords (E o E a) he first computes

e (S a T 2)

= e852008

gramiddotka1middotka2 grumiddotka1852009

(5)

= e(g g)ramiddotka1middotka2middotrumiddotka1

Then he can judge whether whprime = wih (ie anencrypted keyword is located) holds if the followingequation is true

e (E a T 3)

= e

1048616983080gkiwmiddotromiddotH (wih) middot gka1

983081ka2 grumiddotka1middotra

1048617= e(g g)

(kiw middotromiddotH (wih)+ka1)middotka2middotrumiddotka1middotra (6)

= e(g g)kiwmiddotromiddotH (wih)middotka2middotrumiddotka1middotra middot e (S a T 2)

= e983080

gkiwmiddotro gH (wih)middotka2middotrumiddotka1middotra983081

middot e (S a T 2)

= e (E o T 1) middot e (S a T 2)

6 PRIVACY PRESERVING RANKED SEARCH

The aforementioned section helps the cloud matchthe queried keywords and obtain a candidate fileset However we cannot simply return undifferentialfiles to data users for the following two reasons Firstreturning all candidate files would cause abundantcommunication overhead for the whole system Sec-ond data users would only concern the top-k relevantfiles corresponding to their queries In this sectionwe first elucidate an order and privacy preserving

encoding scheme Then we illustrate an additive orderpreserving and privacy preserving encoding schemeFinally we apply the proposed scheme to encode therelevance scores and obtain the top-k search results

x 1 2 3 4 5

f ( x) 100-1000 1100-1800 2000-4200 4300-5000 5100-7000

Fig 4 An example of Order Preserving and PrivacyPreserving Function

61 Order and Privacy Preserving Function

To rank the relevance score while preserving its priva-cy the proposed function should satisfy the followingconditions 1) This function should preserve the orderof data as this helps the cloud server determine whichfile is more relevant to a certain keyword according tothe encoded relevance scores 2) This function shouldnot be revealed by the cloud server so that cloudserver can make comparisons on encoded relevancescores without knowing their actual values 3) Dif-ferent data owners should have different functionssuch that revealing the encoded value of a data ownerwould not lead to the leakage of encoded values of other data owners In order to satisfy condition 1we introduce a data processing part m(x middot) whichpreserves the order of x To satisfy condition 2 weintroduce a disturbing part rf which helps prevent thecloud server from revealing this function To satisfycondition 3 we use m(x middot) to process the ID of dataowners So this function belongs to the followingfunction family

F yoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + rf (7)

where τ denotes the degree of F yoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k) We furtherdefine m(x j) as follows m(x 0) = 1 m(x 1) = xand m(x j) = (m(x jminus1)+α middot x) middot (1+λ) if jgt1 whereα and λ are two constant numbers

Now we introduce how to set the disturbing partrf Since

sum0lejkleτ Ajkmiddot(m(x+1 j)minusm(x j))middotm(y k)gesum

0lejkleτ Ajkmiddot((1+λ)jminus1

+ αmiddotsum

1leilejminus2(1+λ))

Let l be an integer such that2lminus1le

sum0lejkleτ Ajkmiddot((1+λ)

jminus1+αmiddot

sum1leilejminus2(1+λ))le2l

we can set rf isin (0 2lminus1)Obviously forallx1gtx2 we have F yoppf (x1)gtF yoppf (x2)

62 Additive Order and Privacy Preserving Func-tion

With the order and privacy preserving function wehave if x1 gt x2 then y1 gt y2 However the additionof the function is not necessarily order preserving

An example is presented in Fig 4 Obviously f (x)

is order preserving to preserve privacy a disturbingpart is introduced As we can see f (1) + f (5) variesfrom 5200 to 8000 f (1) + f (4) varies from 4400 to6000 f (2) + f (3) varies from 3100 to 6000 Though

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 8140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 8

1 + 5 gt 1 + 4 and 1 + 5 gt 2 + 3 it is probable forf (1)+f (5) lt f (1)+f (4) and f (1)+f (5) lt f (2)+f (3)

To correctly perform the secure ranked multi-keyword search the sum of any two encoded rel-evance scores should still be ordered and privacypreserved (we use the sum of encoded relevance scoreto evaluate the relevance between a file and multiplekeywords in this paper) To satisfy this conditionwe further design an additive order and privacypreserving function family based on Eq 7

F yaoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + raof (8)

Where τ denotes the degree of F yaoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k)

Before we take a step further towards the useof F yaoppf (x) we will first introduce our auxiliarytheorem ie how to make an order and privacypreserving function to be additive order and privacypreserving

Definition 3 Given an order preserving functiony = f (x) we define ∆f (xi) = f (xi+1) minus f (xi) and∆f (xi) = ∆f (xi+1) minus ∆f (xi) To preserve privacy wedefine yi = f (xi) + ri where ri is a random numberand ri gt 0 We define the max value of ri as rimaxand rimax = f (xi)minusimiddot∆(xi) hence yi can change fromf (xi) to f (xi) + rimax

Theorem 1 Given an order preserving function yi =f (xi) + ri when the following three conditionsare satisfied (1) foralli ∆f (xi + 1) le ∆f (xi) (2) foralli

˜∆f (xi + 1) ge ˜

∆f (xi) (3) foralli ri lt 983080i

2

middot ˜∆f (xi)983081 2The order and privacy preserving function yi is also

an additive order and privacy preserving that isfor any

sumxiisinD1

xi le sum

xjisinD2xj where D1 and D2

denotes two subsets of the definition domain we havesumxiisinD1

yi lesum

xjisinD2yj Proof The proof is elaborated in Appendix A

63 Encoding relevance scores

In our paper we use a well-known method to com-pute the relevance score [26] ie Score(w F d) =1

|F d|(1+ln f dw)ln(1+ N

f w) where w denotes the given

keyword |F d| is the length of file F d f dw denotes theT F of w in file F d f w denotes the number of filescontaining w and N denotes the total number of filesin the collection

Now we introduce how to encode the relevancescore with an additive order and privacy preservingfunction in our additive order and privacy preservingfunction family F yaoppf (x) Given input S ijt (the rel-evance score of tth keyword to jth file of ith dataowner) Data owners can independently choose afunction from this family to protect the privacy of their relevance scores For simplicity in this paper we

specify data owner i to choose F H i(i)aoppf (middot) and defineV ijt as the encoded data of S ijt

V ijt = F H i(i)aoppf (S ijt) (9)

where H i(middot) is a public known hash function and i isthe I D of data owner Oi

With the well-designed properties of F aoppf thecloud server can make a comparison among encodedrelevance scores for the same data owner Howeversince different data owners encode their relevancescores with different functions in F aoppf the cloudserver cannot make a comparison between encodedrelevance scores for different data owners To solvethis problem we define

T ijt(y) = F yaoppf (S ijt) (10)

where T ijt(y) is used to help the cloud servermake comparisons among relevance scores encoded by different data owners and y is a variable whichtakes the hash value of data ownerrsquos ID as input

Finally we attach each V ijt with a T ijt(y)

64 Ranking search results

In this paper we use the sum of the relevance scoresas the metric to rank search results Now we intro-duce the strategies of ranking search results based onthe encoded relevance scores First the cloud com-putes V ij =

sumtisinW

V ijt the sum of encoded relevance

scores between the jth file and matched keywords forOi and the auxiliary value T ij(y) =

sumtisinW

T ijt(y) Then

the cloud ranks the sum of encoded relevance scorewith the following two conditions

(1) Two encoded data belong to the same dataowner Given that a data user issues a query W =wm wn we assume that Oirsquos F 1 and F 2 satisfy thequery Then the cloud adds the encoded relevancescore together and gets the relevance score of Oiprimes F 1to W

V i1 = V i1m + V i1n (11)

Similarly

V i2 = V i2m + V i2n (12)

If V i1 ge V i2 then F 1 is more relevant to the queried

keyword sets W otherwise F 2 is more relevant to W (2) Two encoded data belong to two different data

owners Given that a data user issues a query W =wm wn we assume Oirsquos F 1 and Oj rsquos F 2 satisfy thequery The cloud server makes comparison on theirencoded relevance scores in the following four steps

First the cloud server computes the relevance score

of Oiprimes F 1 to W

V i1 = V i1m + V i1n (13)

Second the cloud server computes the T j2(y)

T j2(y) = T j2m(y) + T j2n(y) (14)

Third the cloud server substitutes H i(i) for thevariable y and gets T j2(H (i))

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 3: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 3140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 3

Data Owners Data Users

Data decryption keys

E n c r y

p t e d

i n d e x e

s

Ranked

results

Semi-trusted

Cloud Server

Administration

Server

D a t a U s e r

A u t h e n t i c a t i o n

D a t a O

w n e r

A u t h e

n t i c a t i o n

Encrypted

Files

Re-encrypted

indexes

S e a r c h r e q u e s t

Trapdoors

Secret

Data

Update

Fig 1 Architecture of privacy preserving keywordsearch in a multi-owner and multi-user cloud model

administration server and data users Data ownershave a collection of files F To enable efficient searchoperations on these files which will be encrypteddata owners first build a secure searchable index I on the keyword set W extracted from F then theysubmit I to the administration server Finally dataowners encrypt their files F and outsource the corre-sponding encrypted files C to the cloud server Uponreceiving I the administration server re-encrypts I for the authenticated data owners and outsourcesthe re-encrypted index to the cloud server Once adata user wants to search t keywords over these

encrypted files stored on the cloud server he firstcomputes the corresponding trapdoors and submitsthem to the administration server Once the datauser is authenticated by the administration serverthe administration server will further re-encrypt thetrapdoors and submit them to the cloud server Uponreceiving the trapdoor T the cloud server searches theencrypted index I of each data owner and returns thecorresponding set of encrypted files To improve thefile retrieval accuracy and save communication costa data user would tell the cloud server a parameterk and cloud server would return the top-k relevantfiles to the data user Once the data user receives thetop-k encrypted files from the cloud server he willdecrypt these returned files Note that how to achievedecryption capabilities are out of the scope of thispaper some excellent work regarding this problemcan be found in [23]

22 Threat Model

In our threat model we assume the administrationserver is trusted The administrative server can beany trusted third party eg the Certificate Authority

in the Public Key Infrastructure the aggregation anddistribution layer in [24] and the third party auditorin [2] Data owners and data users who passed theauthentication of the administration server are also

trusted However the cloud server is not trustedInstead we treat the cloud server as rsquocurious buthonestrsquo which is the same as in previous works [9][10] [11] [13] [19] The cloud server follows our pro-posed protocol but it is eager to obtain the contentsof encrypted files keywords and relevance scoresNote that preserving the access pattern ie the listof returned files is extremely expensive since thealgorithm has to rsquotouchrsquo the whole file set [25] Wedo not aim to protect it in this work for efficiencyconcerns

23 Design Goals and Security Definitions

To enable privacy preserving ranked multi-keywordsearch in the multi-owner and multi-user cloud en-vironment our system design should simultaneouslysatisfy security and performance goals

bull Ranked Multi-keyword Search over Multi-owner The proposed scheme should allowmulti-keyword search over encrypted files whichwould be encrypted with different keys for differ-ent data owners It also needs to allow the cloudserver to rank the search results among differentdata owners and return the top-k results

bull Data owner scalability The proposed schemeshould allow new data owners to enter this sys-tem without affecting other data owners or datausers ie the scheme should support data ownerscalability in a plug-and-play model

bull Data user revocation The proposed schemeshould ensure that only authenticated data userscan perform correct searches Moreover once adata user is revoked he can no longer performcorrect searches over the encrypted cloud data

bull Security Goals The proposed scheme shouldachieve the following security goals 1) KeywordSemantic Security (Definition 1) We will provethat PRMSM achieves semantic security againstthe chosen keyword attack 2) Keyword secrecy(Definition 2) Since the adversary A can knowwhether an encrypted keyword matches a trap-

door we use the weaker security goal (ie secre-cy) that is we should ensure that the probabilityfor the adversary A to infer the actual valueof a keyword is negligibly more than randomlyguessing 3) Relevance score secrecy We shouldensure that the cloud server cannot infer theactual value of the encoded relevance scores

Definition 1 Given a probabilistic polynomial timeadversary A he asks the challenger B for the cipher-text of his submitted keywords for polynomial timesThen A sends two keywords w0 and w1 which are notchallenged before to B B randomly sets micro isin 0 1

and returns an encrypted keyword wmicro to A A contin-ues to ask B for the cipher-text of keyword w the onlyrestriction is that w is not w0 or w1 Finally A outputsits guess microprime for micro We define the advantage that A

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 4140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 4

breaks PRMSM as AdvA=Pr[micro = microprime] minus 1

2

If AdvA isnegligible we say that PRMSM is semantically secureagainst the chosen-keyword attack

Definition 2 Given a probabilistic polynomial timeadversary A he asks the challenger B for the cipher-text of his queried keywords for t times Then B randomly chooses a keyword wlowast encrypts it to wlowastand sends wlowast to A A outputs its guess wprime for wlowastand wins if wprime = wlowast We define the probabilitythat A breaks keyword secrecy as AdvA=Pr[wprime = wlowast]We say that PRMSM achieves keyword secrecy if AdvA= 1

uminust + ε where ϵ is a negligible parameter tdenotes the number of keywords that A has knownand u denotes the size of keyword dictionary

24 Notations

bull O the data owner collection denoted as a set of m data owners O = (O1 O2 Om)

bull F i the plaintext file collection of Oi denoted asa set of n data file F i=(F i1 F i2 F in)

bull C i the ciphertext file collection of F i denoted asC i=(C i1 C i2 C in)

bull W the keyword collection denoted as a set of ukeywords W = (w1 w2 wu)

bull W i Oirsquos encrypted keyword collection of W

denoted as W i = ( wi1 wi2 wiu)

bull W the subset of W which represents queried

keywords denoted as W = (w1 w2 wq)

bull T W the trapdoor for

W denoted as T W =

(T w1 T w2 T wq )bull S ijt the relevance score of tth keyword to jth

file of ith data owner

3 PRELIMINARIES

Before we introduce our detailed construction we first briefly introduce some techniques that will be used inthis paper

31 Bilinear Map

Let G and G1 denote two cyclic groups with a prime

order p We further denote g and g1 as the generatorof G and G1 respectively Let e be a bilinear mape GtimesG rarr G1 then the following three conditions aresatisfied 1) Bilinear foralla b isin Z

lowast p e(ga gb) = e(g g)ab 2)

Non-degenerate e(g g) = 1 3) Computable e can beefficiently computed

32 Bilinear Diffie-Hellman Problem and BilinearDiffie-Hellman Assumption

The Bilinear Diffie-Hellman (BDH) problem in(G G1 e) is described as follows given random g isin

G and ga gb gc for some abc isin Zlowast p computee(g g)abc isin G1

The BDH assumption is presented as follows given(G G1 e) g isin G and ga gb gc for some ab c isin Z

lowast p

an adversary A has advantage ϵ in solving BDHwhen P r[A(ga gb gc) = e(g g)abc] ge ϵ The BDHassumption tells that the advantage ϵ is negligible forany polynomial time A

4 DATA USE R AUTHENTICATION

To prevent attackers from pretending to be legal datausers performing searches and launching statisticalattacks based on the search result data users must be authenticated before the administration server re-encrypts trapdoors for data users Traditional authen-tication methods often follow three steps First datarequester and data authenticator share a secret keysay k0 Second the requester encrypts his personallyidentifiable information d0 using k0 and sends the en-crypted data (d0)k0 to the authenticator Third the au-thenticator decrypts the received data with k0 and au-thenticates the decrypted data However this method

has two main drawbacks First since the secret keyshared between the requester and the authenticatorremains unchanged it is easy to incur replay attackSecond once the secret key is revealed to attackersthe authenticator cannot distinguish between the legalrequester and the attackers the attackers can pretendto be legal requesters without being detected

In this section we first give an overview of thedata user authentication protocol Then we introducehow to achieve secure and efficient data user authen-tication Finally we demonstrate how to detect illegalsearches and how to enable secure and efficient datauser revocation

41 Overview

Now we give an example to illustrate the main idea of the user authentication protocol(the detailed protocolis elaborated in the following subsections) AssumeAlice wants to be authenticated by the administrationserver so she starts a conversation with the serverThe server then authenticates the contents of theconversation If the contents are authenticated bothAlice and the server will generate the initial secret key

according to the conversation contents After the ini-tialization to be authenticated successfully Alice hasto provide the historical data of their conversationsIf the authentication is successful both Alice and theadministration server will change their secret keys ac-cording the contents of the conversation In this waythe secret keys keep changing dynamically withoutknowing the correct historical data an attacker cannotstart a successful conversation with the administrationserver

42 User AuthenticationBefore we introduce the dynamic key generationmethod and the authentication protocol we first intro-duce the format of the authentication data As shown

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 5140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 5

Fig 2 Format of Authentication Data

in Fig 2 the authentication data consists of five

parts The request counter field records the number of search requests that the data user has submitted Thelast request time field asks the data user to providethe historical data of his previous request time Thepersonally identifiable data (eg passport numbertelephone number) field is used to identify a specificdata user while the random number and CRC fieldare further used to check whether the authenticationdata has been tampered with

The key point of a successful authentication is toprovide both the dynamically changing secret keysand the historical data of the corresponding data

user Let kij denotes the secret key shared betweenadministration server and the jth data user U j afteri instances of search requests and dij denotes theauthentication data for the (i+1)th request of U j Ourauthentication protocol runs in the following six steps

A Data user U j prepares his authentication datadij ie U j needs to fill in all the fields of authentica-tion data based on his historical data

B Data user U j encrypts dij with the current secretkey kij and submits the encrypted authenticationdata (dij)kij to the administration server

C After submitting the authentication data thedata user U j generates another secret key ki+1j =kij oplus H (dij) and stores both kij and ki+1j

D Upon receiving U j rsquos encrypted authenticationdata the administration server decrypts it with kij

E The administration server checks the requestcounter last request time personally identifiable dataand CRC respectively If the authentication succeedsthe administration server first generates a new secretkey ki+1j = kij oplus H (dij) then he replies a confirma-tion data di+1j and encrypts it with ki+1j Otherwisethe administration server encrypts di+1j with secret

key kij F Upon receiving a reply from the administrationserver the data user U j will try to decrypt it withki+1j If the decrypted data contains the confirmationdata the authentication is successful Otherwise theauthentication is regarded as being unsuccessful Thedata user deletes the new generated secret key ki+1j

and considers whether to start another authenticationFig 3 shows an example of successful authen-

tication between the administration server and thedata user As we can see after each successful au-thentication process the secret key will be changed

dynamically according to the previous key and somehistorical data Therefore once an attacker steals a se-cret key he can hardly get any benefits On one handif the attacker knows nothing about the historical data

Administration

Server

Data

User U i

0

0i

ik

d

1

1i

ik

d

2

2i

ik

d

33

ii

k d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

Fig 3 Example of data user authentication and dy-namic secret key generation

of the legal data user he cannot even construct a legalauthentication data On the other hand if the legaldata user performs another successful authentication

the previous secret key will be expired

43 Illegal Search Detection

In our scheme the authentication process is protected by the dynamic secret key and the historical infor-mation We assume that an attacker has successfullyeavesdropped the secret key k0j of U j Then he hasto construct the authentication data if the attackerhas not successfully eavesdropped the historical da-ta eg the request counter the last request timehe cannot construct the correct authentication dataTherefore this illegal action will soon be detected bythe administration server Further if the attacker hassuccessfully eavesdropped all data of U j the attackercan correctly construct the authentication data andpretend himself to be U j without being detected bythe administration server However once the legaldata user U j performs his search since the secret keyon the administration server side has changed therewill be contradictory secret keys between the admin-istration server and the legal data user Therefore thedata user and administration server will soon detectthis illegal action

44 Data User Revocation

Different from previous works data user revocationin our scheme does not need to re-encrypt and updatelarge amounts of data stored on the cloud server In-stead the adminstration server only needs to updatethe secret data S a stored on the cloud server As will be detailed in the next section S a = gka1middotka2middotra whereka1 and ka2 are the secret keys of the administrationserver and ra is randomly generated for every updateoperation Consequently the previous trapdoors will be expired Additionally without the help of the

administration server the revoked data user cannotgenerate the correct trapdoor T whprime Therefore a datauser cannot perform correct searches once he is re-voked

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 6140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 6

5 MATCHING DIFFERENT-KE Y ENCRYPTED

KEYWORDS

Numerous data owners are often involved in practicalcloud applications For privacy concerns they would be reluctant to share secret keys with others Insteadthey prefer to use their own secret keys to encrypt

their sensitive data (keywords files) When keywordsof different data owners are encrypted with differentsecret keys the coming question is how to locatedifferent-key encrypted keywords among multiple da-ta owners In this section to enable secure efficientand convenient searches over encrypted cloud dataowned by multiple data owners we systematicallydesign schemes to achieve the following three require-ments First different data owners use different secretkeys to encrypt their keywords Second authenticateddata users can generate their trapdoors without know-ing these secret keys Third upon receiving trapdoors

the cloud server can find the corresponding keyword-s from different data ownersrsquo encrypted keywordswithout knowing the actual value of keywords ortrapdoors

51 Overview

Now we present an example to illustrate the mainidea of our keywords matching protocol(the detailedprotocol is elaborated in the following subsections)Assume Alice wants to use the cloud to store her fileF she first encrypts her file F and gets the ciphertextC To enable other users to perform secure searcheson C Alice extracts a keyword wih and sends theencrypted keyword wih = (E aprime E o) to the adminis-tration server The administration server further re-encrypts E aprime to E a and submits wih = (E a E o) to thecloud server Now Bob wants to search a keyword whprime he first generates the trapdoor T primewhprime and submits it tothe administration server The administration serverre-encrypts T primewhprime to T whprime = (T 1 T 2 T 3) generates asecret data S a and submits T whprime S a to the cloudserver The cloud server will judge whether Bobrsquossearch request matches Alicersquos encrypted keyword by

checking whether e (E a

T 3

) = e (E o

T 1

) middot e (S a

T 2

)holds

52 Construction Initialization

Our construction is based on the aforementioned bi-linear map Let g and g1 denote the generator of twocyclic groups G and G1 with order p Let e be a bilinear map e G times G rarr G1 Given different secretparameters as input a randomized key generationalgorithm will output the private keys used in thesystem ka1 isin Z

+ p ka2 isin Z

+ p kif isin Z

+ p kiw isin Z

+ p larr

(0 1)lowast where ka1 and ka2 are the private keys of the

administration server kiw and kif are the privatekeys used to encrypt keywords and files of data ownerOi respectively Let H (middot) be a public hash function itsoutput locates in Z

+ p

53 Keyword Encryption

For keyword encryption the following conditionsshould be satisfied first different data owners usetheir own secret keys to encrypt keywords Secondfor the same keyword it would be encrypted to dif-ferent cipher-texts each time These properties benefit

our scheme for two reasons First losing the key of one data owner would not lead to the disclosure of other ownersrsquo data Second the cloud server cannotsee any relationship among encrypted keywords Giv-en the hth keyword of data owner Oi ie wih weencrypt wih as follows

wih =983080

gkiwmiddotromiddotH (wih) gkiwmiddotro983081

(1)

where ro is a randomly generated number eachtime which helps enhance the security of wih Foreasy description and understanding we let E primea =

gkiwmiddotromiddotH (wih)

and E o = gkiwmiddotro

The data owner delivers E aprime and E o to the admin-

istration server and the administration server furtherre-encrypts E aprime with his secret keys ka1 and ka2 andgets E a

E a =852008

E aprime middot gka1852009ka2 (2)

Therefore wih = (E a E o) The administrative serv-er further submits wih to the cloud server Note thatsince the administration server only does simple com-putations on the encrypted data he cannot learn anysensitive information from these random encrypted

data without knowing the secret keys of data owners

54 Trapdoor Generation

To make the data users generate trapdoors securelyconveniently and efficiently our proposed schemeshould satisfy two main conditions First the datauser does not need to ask a large amount of dataowners for secret keys to generate trapdoors Sec-ond for the same keyword the trapdoor generatedeach time should be different To meet this conditionthe trapdoor generation is conducted in two steps

First the data user generates trapdoors based on hissearch keyword and a random number Second theadministration server re-encrypts the trapdoors forthe authenticated data user

Assume a data user wants to search keyword whprime so he encrypts it as follows

T primewhprime =983080

gH (whprime )middotru gru983081

(3)

where ru is a randomly generated number eachtime As we can see during the trapdoor generationprocess secret keys of data owners are not required

Additionally with the help of random variable ru forthe same keyword whprime we can generate two differenttrapdoors which prevent attackers from knowing therelationship among trapdoors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 7140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 7

Upon receiving T primewhprime the administration server firstgenerates a random number ra and then re-encryptsT primewhprime as follows

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(4)

For easy description and understanding we let

T 1 = g

H (whprime )middotrumiddotka1middotka2middotra

T 2 = g

rumiddotka1

T 3 = g

rumiddotka1middotra

hence T whprime = (T 1 T 2 T 3) Finally the administrationserver submits T whprime to the cloud server

55 Keywords Matching among Different DataOwners

The cloud server stores all encrypted files and key-words of different data owners The administrationserver will also store a secret data S a = gka1middotka2middotra

on the cloud server Upon receiving a query requestthe cloud will search over the data of all these dataowners The cloud processes the search request in two

steps First the cloud matches the queried keywordsfrom all keywords stored on it and it gets a candidatefile set Second the cloud ranks files in the candidatefile set and finds the most top-k relevant files Weintroduce the matching strategy here while leavingthe task of introducing the ranking strategy in the nextsection When the cloud obtains the trapdoor T whprime andencrypted keywords (E o E a) he first computes

e (S a T 2)

= e852008

gramiddotka1middotka2 grumiddotka1852009

(5)

= e(g g)ramiddotka1middotka2middotrumiddotka1

Then he can judge whether whprime = wih (ie anencrypted keyword is located) holds if the followingequation is true

e (E a T 3)

= e

1048616983080gkiwmiddotromiddotH (wih) middot gka1

983081ka2 grumiddotka1middotra

1048617= e(g g)

(kiw middotromiddotH (wih)+ka1)middotka2middotrumiddotka1middotra (6)

= e(g g)kiwmiddotromiddotH (wih)middotka2middotrumiddotka1middotra middot e (S a T 2)

= e983080

gkiwmiddotro gH (wih)middotka2middotrumiddotka1middotra983081

middot e (S a T 2)

= e (E o T 1) middot e (S a T 2)

6 PRIVACY PRESERVING RANKED SEARCH

The aforementioned section helps the cloud matchthe queried keywords and obtain a candidate fileset However we cannot simply return undifferentialfiles to data users for the following two reasons Firstreturning all candidate files would cause abundantcommunication overhead for the whole system Sec-ond data users would only concern the top-k relevantfiles corresponding to their queries In this sectionwe first elucidate an order and privacy preserving

encoding scheme Then we illustrate an additive orderpreserving and privacy preserving encoding schemeFinally we apply the proposed scheme to encode therelevance scores and obtain the top-k search results

x 1 2 3 4 5

f ( x) 100-1000 1100-1800 2000-4200 4300-5000 5100-7000

Fig 4 An example of Order Preserving and PrivacyPreserving Function

61 Order and Privacy Preserving Function

To rank the relevance score while preserving its priva-cy the proposed function should satisfy the followingconditions 1) This function should preserve the orderof data as this helps the cloud server determine whichfile is more relevant to a certain keyword according tothe encoded relevance scores 2) This function shouldnot be revealed by the cloud server so that cloudserver can make comparisons on encoded relevancescores without knowing their actual values 3) Dif-ferent data owners should have different functionssuch that revealing the encoded value of a data ownerwould not lead to the leakage of encoded values of other data owners In order to satisfy condition 1we introduce a data processing part m(x middot) whichpreserves the order of x To satisfy condition 2 weintroduce a disturbing part rf which helps prevent thecloud server from revealing this function To satisfycondition 3 we use m(x middot) to process the ID of dataowners So this function belongs to the followingfunction family

F yoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + rf (7)

where τ denotes the degree of F yoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k) We furtherdefine m(x j) as follows m(x 0) = 1 m(x 1) = xand m(x j) = (m(x jminus1)+α middot x) middot (1+λ) if jgt1 whereα and λ are two constant numbers

Now we introduce how to set the disturbing partrf Since

sum0lejkleτ Ajkmiddot(m(x+1 j)minusm(x j))middotm(y k)gesum

0lejkleτ Ajkmiddot((1+λ)jminus1

+ αmiddotsum

1leilejminus2(1+λ))

Let l be an integer such that2lminus1le

sum0lejkleτ Ajkmiddot((1+λ)

jminus1+αmiddot

sum1leilejminus2(1+λ))le2l

we can set rf isin (0 2lminus1)Obviously forallx1gtx2 we have F yoppf (x1)gtF yoppf (x2)

62 Additive Order and Privacy Preserving Func-tion

With the order and privacy preserving function wehave if x1 gt x2 then y1 gt y2 However the additionof the function is not necessarily order preserving

An example is presented in Fig 4 Obviously f (x)

is order preserving to preserve privacy a disturbingpart is introduced As we can see f (1) + f (5) variesfrom 5200 to 8000 f (1) + f (4) varies from 4400 to6000 f (2) + f (3) varies from 3100 to 6000 Though

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 8140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 8

1 + 5 gt 1 + 4 and 1 + 5 gt 2 + 3 it is probable forf (1)+f (5) lt f (1)+f (4) and f (1)+f (5) lt f (2)+f (3)

To correctly perform the secure ranked multi-keyword search the sum of any two encoded rel-evance scores should still be ordered and privacypreserved (we use the sum of encoded relevance scoreto evaluate the relevance between a file and multiplekeywords in this paper) To satisfy this conditionwe further design an additive order and privacypreserving function family based on Eq 7

F yaoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + raof (8)

Where τ denotes the degree of F yaoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k)

Before we take a step further towards the useof F yaoppf (x) we will first introduce our auxiliarytheorem ie how to make an order and privacypreserving function to be additive order and privacypreserving

Definition 3 Given an order preserving functiony = f (x) we define ∆f (xi) = f (xi+1) minus f (xi) and∆f (xi) = ∆f (xi+1) minus ∆f (xi) To preserve privacy wedefine yi = f (xi) + ri where ri is a random numberand ri gt 0 We define the max value of ri as rimaxand rimax = f (xi)minusimiddot∆(xi) hence yi can change fromf (xi) to f (xi) + rimax

Theorem 1 Given an order preserving function yi =f (xi) + ri when the following three conditionsare satisfied (1) foralli ∆f (xi + 1) le ∆f (xi) (2) foralli

˜∆f (xi + 1) ge ˜

∆f (xi) (3) foralli ri lt 983080i

2

middot ˜∆f (xi)983081 2The order and privacy preserving function yi is also

an additive order and privacy preserving that isfor any

sumxiisinD1

xi le sum

xjisinD2xj where D1 and D2

denotes two subsets of the definition domain we havesumxiisinD1

yi lesum

xjisinD2yj Proof The proof is elaborated in Appendix A

63 Encoding relevance scores

In our paper we use a well-known method to com-pute the relevance score [26] ie Score(w F d) =1

|F d|(1+ln f dw)ln(1+ N

f w) where w denotes the given

keyword |F d| is the length of file F d f dw denotes theT F of w in file F d f w denotes the number of filescontaining w and N denotes the total number of filesin the collection

Now we introduce how to encode the relevancescore with an additive order and privacy preservingfunction in our additive order and privacy preservingfunction family F yaoppf (x) Given input S ijt (the rel-evance score of tth keyword to jth file of ith dataowner) Data owners can independently choose afunction from this family to protect the privacy of their relevance scores For simplicity in this paper we

specify data owner i to choose F H i(i)aoppf (middot) and defineV ijt as the encoded data of S ijt

V ijt = F H i(i)aoppf (S ijt) (9)

where H i(middot) is a public known hash function and i isthe I D of data owner Oi

With the well-designed properties of F aoppf thecloud server can make a comparison among encodedrelevance scores for the same data owner Howeversince different data owners encode their relevancescores with different functions in F aoppf the cloudserver cannot make a comparison between encodedrelevance scores for different data owners To solvethis problem we define

T ijt(y) = F yaoppf (S ijt) (10)

where T ijt(y) is used to help the cloud servermake comparisons among relevance scores encoded by different data owners and y is a variable whichtakes the hash value of data ownerrsquos ID as input

Finally we attach each V ijt with a T ijt(y)

64 Ranking search results

In this paper we use the sum of the relevance scoresas the metric to rank search results Now we intro-duce the strategies of ranking search results based onthe encoded relevance scores First the cloud com-putes V ij =

sumtisinW

V ijt the sum of encoded relevance

scores between the jth file and matched keywords forOi and the auxiliary value T ij(y) =

sumtisinW

T ijt(y) Then

the cloud ranks the sum of encoded relevance scorewith the following two conditions

(1) Two encoded data belong to the same dataowner Given that a data user issues a query W =wm wn we assume that Oirsquos F 1 and F 2 satisfy thequery Then the cloud adds the encoded relevancescore together and gets the relevance score of Oiprimes F 1to W

V i1 = V i1m + V i1n (11)

Similarly

V i2 = V i2m + V i2n (12)

If V i1 ge V i2 then F 1 is more relevant to the queried

keyword sets W otherwise F 2 is more relevant to W (2) Two encoded data belong to two different data

owners Given that a data user issues a query W =wm wn we assume Oirsquos F 1 and Oj rsquos F 2 satisfy thequery The cloud server makes comparison on theirencoded relevance scores in the following four steps

First the cloud server computes the relevance score

of Oiprimes F 1 to W

V i1 = V i1m + V i1n (13)

Second the cloud server computes the T j2(y)

T j2(y) = T j2m(y) + T j2n(y) (14)

Third the cloud server substitutes H i(i) for thevariable y and gets T j2(H (i))

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 4: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 4140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 4

breaks PRMSM as AdvA=Pr[micro = microprime] minus 1

2

If AdvA isnegligible we say that PRMSM is semantically secureagainst the chosen-keyword attack

Definition 2 Given a probabilistic polynomial timeadversary A he asks the challenger B for the cipher-text of his queried keywords for t times Then B randomly chooses a keyword wlowast encrypts it to wlowastand sends wlowast to A A outputs its guess wprime for wlowastand wins if wprime = wlowast We define the probabilitythat A breaks keyword secrecy as AdvA=Pr[wprime = wlowast]We say that PRMSM achieves keyword secrecy if AdvA= 1

uminust + ε where ϵ is a negligible parameter tdenotes the number of keywords that A has knownand u denotes the size of keyword dictionary

24 Notations

bull O the data owner collection denoted as a set of m data owners O = (O1 O2 Om)

bull F i the plaintext file collection of Oi denoted asa set of n data file F i=(F i1 F i2 F in)

bull C i the ciphertext file collection of F i denoted asC i=(C i1 C i2 C in)

bull W the keyword collection denoted as a set of ukeywords W = (w1 w2 wu)

bull W i Oirsquos encrypted keyword collection of W

denoted as W i = ( wi1 wi2 wiu)

bull W the subset of W which represents queried

keywords denoted as W = (w1 w2 wq)

bull T W the trapdoor for

W denoted as T W =

(T w1 T w2 T wq )bull S ijt the relevance score of tth keyword to jth

file of ith data owner

3 PRELIMINARIES

Before we introduce our detailed construction we first briefly introduce some techniques that will be used inthis paper

31 Bilinear Map

Let G and G1 denote two cyclic groups with a prime

order p We further denote g and g1 as the generatorof G and G1 respectively Let e be a bilinear mape GtimesG rarr G1 then the following three conditions aresatisfied 1) Bilinear foralla b isin Z

lowast p e(ga gb) = e(g g)ab 2)

Non-degenerate e(g g) = 1 3) Computable e can beefficiently computed

32 Bilinear Diffie-Hellman Problem and BilinearDiffie-Hellman Assumption

The Bilinear Diffie-Hellman (BDH) problem in(G G1 e) is described as follows given random g isin

G and ga gb gc for some abc isin Zlowast p computee(g g)abc isin G1

The BDH assumption is presented as follows given(G G1 e) g isin G and ga gb gc for some ab c isin Z

lowast p

an adversary A has advantage ϵ in solving BDHwhen P r[A(ga gb gc) = e(g g)abc] ge ϵ The BDHassumption tells that the advantage ϵ is negligible forany polynomial time A

4 DATA USE R AUTHENTICATION

To prevent attackers from pretending to be legal datausers performing searches and launching statisticalattacks based on the search result data users must be authenticated before the administration server re-encrypts trapdoors for data users Traditional authen-tication methods often follow three steps First datarequester and data authenticator share a secret keysay k0 Second the requester encrypts his personallyidentifiable information d0 using k0 and sends the en-crypted data (d0)k0 to the authenticator Third the au-thenticator decrypts the received data with k0 and au-thenticates the decrypted data However this method

has two main drawbacks First since the secret keyshared between the requester and the authenticatorremains unchanged it is easy to incur replay attackSecond once the secret key is revealed to attackersthe authenticator cannot distinguish between the legalrequester and the attackers the attackers can pretendto be legal requesters without being detected

In this section we first give an overview of thedata user authentication protocol Then we introducehow to achieve secure and efficient data user authen-tication Finally we demonstrate how to detect illegalsearches and how to enable secure and efficient datauser revocation

41 Overview

Now we give an example to illustrate the main idea of the user authentication protocol(the detailed protocolis elaborated in the following subsections) AssumeAlice wants to be authenticated by the administrationserver so she starts a conversation with the serverThe server then authenticates the contents of theconversation If the contents are authenticated bothAlice and the server will generate the initial secret key

according to the conversation contents After the ini-tialization to be authenticated successfully Alice hasto provide the historical data of their conversationsIf the authentication is successful both Alice and theadministration server will change their secret keys ac-cording the contents of the conversation In this waythe secret keys keep changing dynamically withoutknowing the correct historical data an attacker cannotstart a successful conversation with the administrationserver

42 User AuthenticationBefore we introduce the dynamic key generationmethod and the authentication protocol we first intro-duce the format of the authentication data As shown

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 5140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 5

Fig 2 Format of Authentication Data

in Fig 2 the authentication data consists of five

parts The request counter field records the number of search requests that the data user has submitted Thelast request time field asks the data user to providethe historical data of his previous request time Thepersonally identifiable data (eg passport numbertelephone number) field is used to identify a specificdata user while the random number and CRC fieldare further used to check whether the authenticationdata has been tampered with

The key point of a successful authentication is toprovide both the dynamically changing secret keysand the historical data of the corresponding data

user Let kij denotes the secret key shared betweenadministration server and the jth data user U j afteri instances of search requests and dij denotes theauthentication data for the (i+1)th request of U j Ourauthentication protocol runs in the following six steps

A Data user U j prepares his authentication datadij ie U j needs to fill in all the fields of authentica-tion data based on his historical data

B Data user U j encrypts dij with the current secretkey kij and submits the encrypted authenticationdata (dij)kij to the administration server

C After submitting the authentication data thedata user U j generates another secret key ki+1j =kij oplus H (dij) and stores both kij and ki+1j

D Upon receiving U j rsquos encrypted authenticationdata the administration server decrypts it with kij

E The administration server checks the requestcounter last request time personally identifiable dataand CRC respectively If the authentication succeedsthe administration server first generates a new secretkey ki+1j = kij oplus H (dij) then he replies a confirma-tion data di+1j and encrypts it with ki+1j Otherwisethe administration server encrypts di+1j with secret

key kij F Upon receiving a reply from the administrationserver the data user U j will try to decrypt it withki+1j If the decrypted data contains the confirmationdata the authentication is successful Otherwise theauthentication is regarded as being unsuccessful Thedata user deletes the new generated secret key ki+1j

and considers whether to start another authenticationFig 3 shows an example of successful authen-

tication between the administration server and thedata user As we can see after each successful au-thentication process the secret key will be changed

dynamically according to the previous key and somehistorical data Therefore once an attacker steals a se-cret key he can hardly get any benefits On one handif the attacker knows nothing about the historical data

Administration

Server

Data

User U i

0

0i

ik

d

1

1i

ik

d

2

2i

ik

d

33

ii

k d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

Fig 3 Example of data user authentication and dy-namic secret key generation

of the legal data user he cannot even construct a legalauthentication data On the other hand if the legaldata user performs another successful authentication

the previous secret key will be expired

43 Illegal Search Detection

In our scheme the authentication process is protected by the dynamic secret key and the historical infor-mation We assume that an attacker has successfullyeavesdropped the secret key k0j of U j Then he hasto construct the authentication data if the attackerhas not successfully eavesdropped the historical da-ta eg the request counter the last request timehe cannot construct the correct authentication dataTherefore this illegal action will soon be detected bythe administration server Further if the attacker hassuccessfully eavesdropped all data of U j the attackercan correctly construct the authentication data andpretend himself to be U j without being detected bythe administration server However once the legaldata user U j performs his search since the secret keyon the administration server side has changed therewill be contradictory secret keys between the admin-istration server and the legal data user Therefore thedata user and administration server will soon detectthis illegal action

44 Data User Revocation

Different from previous works data user revocationin our scheme does not need to re-encrypt and updatelarge amounts of data stored on the cloud server In-stead the adminstration server only needs to updatethe secret data S a stored on the cloud server As will be detailed in the next section S a = gka1middotka2middotra whereka1 and ka2 are the secret keys of the administrationserver and ra is randomly generated for every updateoperation Consequently the previous trapdoors will be expired Additionally without the help of the

administration server the revoked data user cannotgenerate the correct trapdoor T whprime Therefore a datauser cannot perform correct searches once he is re-voked

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 6140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 6

5 MATCHING DIFFERENT-KE Y ENCRYPTED

KEYWORDS

Numerous data owners are often involved in practicalcloud applications For privacy concerns they would be reluctant to share secret keys with others Insteadthey prefer to use their own secret keys to encrypt

their sensitive data (keywords files) When keywordsof different data owners are encrypted with differentsecret keys the coming question is how to locatedifferent-key encrypted keywords among multiple da-ta owners In this section to enable secure efficientand convenient searches over encrypted cloud dataowned by multiple data owners we systematicallydesign schemes to achieve the following three require-ments First different data owners use different secretkeys to encrypt their keywords Second authenticateddata users can generate their trapdoors without know-ing these secret keys Third upon receiving trapdoors

the cloud server can find the corresponding keyword-s from different data ownersrsquo encrypted keywordswithout knowing the actual value of keywords ortrapdoors

51 Overview

Now we present an example to illustrate the mainidea of our keywords matching protocol(the detailedprotocol is elaborated in the following subsections)Assume Alice wants to use the cloud to store her fileF she first encrypts her file F and gets the ciphertextC To enable other users to perform secure searcheson C Alice extracts a keyword wih and sends theencrypted keyword wih = (E aprime E o) to the adminis-tration server The administration server further re-encrypts E aprime to E a and submits wih = (E a E o) to thecloud server Now Bob wants to search a keyword whprime he first generates the trapdoor T primewhprime and submits it tothe administration server The administration serverre-encrypts T primewhprime to T whprime = (T 1 T 2 T 3) generates asecret data S a and submits T whprime S a to the cloudserver The cloud server will judge whether Bobrsquossearch request matches Alicersquos encrypted keyword by

checking whether e (E a

T 3

) = e (E o

T 1

) middot e (S a

T 2

)holds

52 Construction Initialization

Our construction is based on the aforementioned bi-linear map Let g and g1 denote the generator of twocyclic groups G and G1 with order p Let e be a bilinear map e G times G rarr G1 Given different secretparameters as input a randomized key generationalgorithm will output the private keys used in thesystem ka1 isin Z

+ p ka2 isin Z

+ p kif isin Z

+ p kiw isin Z

+ p larr

(0 1)lowast where ka1 and ka2 are the private keys of the

administration server kiw and kif are the privatekeys used to encrypt keywords and files of data ownerOi respectively Let H (middot) be a public hash function itsoutput locates in Z

+ p

53 Keyword Encryption

For keyword encryption the following conditionsshould be satisfied first different data owners usetheir own secret keys to encrypt keywords Secondfor the same keyword it would be encrypted to dif-ferent cipher-texts each time These properties benefit

our scheme for two reasons First losing the key of one data owner would not lead to the disclosure of other ownersrsquo data Second the cloud server cannotsee any relationship among encrypted keywords Giv-en the hth keyword of data owner Oi ie wih weencrypt wih as follows

wih =983080

gkiwmiddotromiddotH (wih) gkiwmiddotro983081

(1)

where ro is a randomly generated number eachtime which helps enhance the security of wih Foreasy description and understanding we let E primea =

gkiwmiddotromiddotH (wih)

and E o = gkiwmiddotro

The data owner delivers E aprime and E o to the admin-

istration server and the administration server furtherre-encrypts E aprime with his secret keys ka1 and ka2 andgets E a

E a =852008

E aprime middot gka1852009ka2 (2)

Therefore wih = (E a E o) The administrative serv-er further submits wih to the cloud server Note thatsince the administration server only does simple com-putations on the encrypted data he cannot learn anysensitive information from these random encrypted

data without knowing the secret keys of data owners

54 Trapdoor Generation

To make the data users generate trapdoors securelyconveniently and efficiently our proposed schemeshould satisfy two main conditions First the datauser does not need to ask a large amount of dataowners for secret keys to generate trapdoors Sec-ond for the same keyword the trapdoor generatedeach time should be different To meet this conditionthe trapdoor generation is conducted in two steps

First the data user generates trapdoors based on hissearch keyword and a random number Second theadministration server re-encrypts the trapdoors forthe authenticated data user

Assume a data user wants to search keyword whprime so he encrypts it as follows

T primewhprime =983080

gH (whprime )middotru gru983081

(3)

where ru is a randomly generated number eachtime As we can see during the trapdoor generationprocess secret keys of data owners are not required

Additionally with the help of random variable ru forthe same keyword whprime we can generate two differenttrapdoors which prevent attackers from knowing therelationship among trapdoors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 7140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 7

Upon receiving T primewhprime the administration server firstgenerates a random number ra and then re-encryptsT primewhprime as follows

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(4)

For easy description and understanding we let

T 1 = g

H (whprime )middotrumiddotka1middotka2middotra

T 2 = g

rumiddotka1

T 3 = g

rumiddotka1middotra

hence T whprime = (T 1 T 2 T 3) Finally the administrationserver submits T whprime to the cloud server

55 Keywords Matching among Different DataOwners

The cloud server stores all encrypted files and key-words of different data owners The administrationserver will also store a secret data S a = gka1middotka2middotra

on the cloud server Upon receiving a query requestthe cloud will search over the data of all these dataowners The cloud processes the search request in two

steps First the cloud matches the queried keywordsfrom all keywords stored on it and it gets a candidatefile set Second the cloud ranks files in the candidatefile set and finds the most top-k relevant files Weintroduce the matching strategy here while leavingthe task of introducing the ranking strategy in the nextsection When the cloud obtains the trapdoor T whprime andencrypted keywords (E o E a) he first computes

e (S a T 2)

= e852008

gramiddotka1middotka2 grumiddotka1852009

(5)

= e(g g)ramiddotka1middotka2middotrumiddotka1

Then he can judge whether whprime = wih (ie anencrypted keyword is located) holds if the followingequation is true

e (E a T 3)

= e

1048616983080gkiwmiddotromiddotH (wih) middot gka1

983081ka2 grumiddotka1middotra

1048617= e(g g)

(kiw middotromiddotH (wih)+ka1)middotka2middotrumiddotka1middotra (6)

= e(g g)kiwmiddotromiddotH (wih)middotka2middotrumiddotka1middotra middot e (S a T 2)

= e983080

gkiwmiddotro gH (wih)middotka2middotrumiddotka1middotra983081

middot e (S a T 2)

= e (E o T 1) middot e (S a T 2)

6 PRIVACY PRESERVING RANKED SEARCH

The aforementioned section helps the cloud matchthe queried keywords and obtain a candidate fileset However we cannot simply return undifferentialfiles to data users for the following two reasons Firstreturning all candidate files would cause abundantcommunication overhead for the whole system Sec-ond data users would only concern the top-k relevantfiles corresponding to their queries In this sectionwe first elucidate an order and privacy preserving

encoding scheme Then we illustrate an additive orderpreserving and privacy preserving encoding schemeFinally we apply the proposed scheme to encode therelevance scores and obtain the top-k search results

x 1 2 3 4 5

f ( x) 100-1000 1100-1800 2000-4200 4300-5000 5100-7000

Fig 4 An example of Order Preserving and PrivacyPreserving Function

61 Order and Privacy Preserving Function

To rank the relevance score while preserving its priva-cy the proposed function should satisfy the followingconditions 1) This function should preserve the orderof data as this helps the cloud server determine whichfile is more relevant to a certain keyword according tothe encoded relevance scores 2) This function shouldnot be revealed by the cloud server so that cloudserver can make comparisons on encoded relevancescores without knowing their actual values 3) Dif-ferent data owners should have different functionssuch that revealing the encoded value of a data ownerwould not lead to the leakage of encoded values of other data owners In order to satisfy condition 1we introduce a data processing part m(x middot) whichpreserves the order of x To satisfy condition 2 weintroduce a disturbing part rf which helps prevent thecloud server from revealing this function To satisfycondition 3 we use m(x middot) to process the ID of dataowners So this function belongs to the followingfunction family

F yoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + rf (7)

where τ denotes the degree of F yoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k) We furtherdefine m(x j) as follows m(x 0) = 1 m(x 1) = xand m(x j) = (m(x jminus1)+α middot x) middot (1+λ) if jgt1 whereα and λ are two constant numbers

Now we introduce how to set the disturbing partrf Since

sum0lejkleτ Ajkmiddot(m(x+1 j)minusm(x j))middotm(y k)gesum

0lejkleτ Ajkmiddot((1+λ)jminus1

+ αmiddotsum

1leilejminus2(1+λ))

Let l be an integer such that2lminus1le

sum0lejkleτ Ajkmiddot((1+λ)

jminus1+αmiddot

sum1leilejminus2(1+λ))le2l

we can set rf isin (0 2lminus1)Obviously forallx1gtx2 we have F yoppf (x1)gtF yoppf (x2)

62 Additive Order and Privacy Preserving Func-tion

With the order and privacy preserving function wehave if x1 gt x2 then y1 gt y2 However the additionof the function is not necessarily order preserving

An example is presented in Fig 4 Obviously f (x)

is order preserving to preserve privacy a disturbingpart is introduced As we can see f (1) + f (5) variesfrom 5200 to 8000 f (1) + f (4) varies from 4400 to6000 f (2) + f (3) varies from 3100 to 6000 Though

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 8140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 8

1 + 5 gt 1 + 4 and 1 + 5 gt 2 + 3 it is probable forf (1)+f (5) lt f (1)+f (4) and f (1)+f (5) lt f (2)+f (3)

To correctly perform the secure ranked multi-keyword search the sum of any two encoded rel-evance scores should still be ordered and privacypreserved (we use the sum of encoded relevance scoreto evaluate the relevance between a file and multiplekeywords in this paper) To satisfy this conditionwe further design an additive order and privacypreserving function family based on Eq 7

F yaoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + raof (8)

Where τ denotes the degree of F yaoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k)

Before we take a step further towards the useof F yaoppf (x) we will first introduce our auxiliarytheorem ie how to make an order and privacypreserving function to be additive order and privacypreserving

Definition 3 Given an order preserving functiony = f (x) we define ∆f (xi) = f (xi+1) minus f (xi) and∆f (xi) = ∆f (xi+1) minus ∆f (xi) To preserve privacy wedefine yi = f (xi) + ri where ri is a random numberand ri gt 0 We define the max value of ri as rimaxand rimax = f (xi)minusimiddot∆(xi) hence yi can change fromf (xi) to f (xi) + rimax

Theorem 1 Given an order preserving function yi =f (xi) + ri when the following three conditionsare satisfied (1) foralli ∆f (xi + 1) le ∆f (xi) (2) foralli

˜∆f (xi + 1) ge ˜

∆f (xi) (3) foralli ri lt 983080i

2

middot ˜∆f (xi)983081 2The order and privacy preserving function yi is also

an additive order and privacy preserving that isfor any

sumxiisinD1

xi le sum

xjisinD2xj where D1 and D2

denotes two subsets of the definition domain we havesumxiisinD1

yi lesum

xjisinD2yj Proof The proof is elaborated in Appendix A

63 Encoding relevance scores

In our paper we use a well-known method to com-pute the relevance score [26] ie Score(w F d) =1

|F d|(1+ln f dw)ln(1+ N

f w) where w denotes the given

keyword |F d| is the length of file F d f dw denotes theT F of w in file F d f w denotes the number of filescontaining w and N denotes the total number of filesin the collection

Now we introduce how to encode the relevancescore with an additive order and privacy preservingfunction in our additive order and privacy preservingfunction family F yaoppf (x) Given input S ijt (the rel-evance score of tth keyword to jth file of ith dataowner) Data owners can independently choose afunction from this family to protect the privacy of their relevance scores For simplicity in this paper we

specify data owner i to choose F H i(i)aoppf (middot) and defineV ijt as the encoded data of S ijt

V ijt = F H i(i)aoppf (S ijt) (9)

where H i(middot) is a public known hash function and i isthe I D of data owner Oi

With the well-designed properties of F aoppf thecloud server can make a comparison among encodedrelevance scores for the same data owner Howeversince different data owners encode their relevancescores with different functions in F aoppf the cloudserver cannot make a comparison between encodedrelevance scores for different data owners To solvethis problem we define

T ijt(y) = F yaoppf (S ijt) (10)

where T ijt(y) is used to help the cloud servermake comparisons among relevance scores encoded by different data owners and y is a variable whichtakes the hash value of data ownerrsquos ID as input

Finally we attach each V ijt with a T ijt(y)

64 Ranking search results

In this paper we use the sum of the relevance scoresas the metric to rank search results Now we intro-duce the strategies of ranking search results based onthe encoded relevance scores First the cloud com-putes V ij =

sumtisinW

V ijt the sum of encoded relevance

scores between the jth file and matched keywords forOi and the auxiliary value T ij(y) =

sumtisinW

T ijt(y) Then

the cloud ranks the sum of encoded relevance scorewith the following two conditions

(1) Two encoded data belong to the same dataowner Given that a data user issues a query W =wm wn we assume that Oirsquos F 1 and F 2 satisfy thequery Then the cloud adds the encoded relevancescore together and gets the relevance score of Oiprimes F 1to W

V i1 = V i1m + V i1n (11)

Similarly

V i2 = V i2m + V i2n (12)

If V i1 ge V i2 then F 1 is more relevant to the queried

keyword sets W otherwise F 2 is more relevant to W (2) Two encoded data belong to two different data

owners Given that a data user issues a query W =wm wn we assume Oirsquos F 1 and Oj rsquos F 2 satisfy thequery The cloud server makes comparison on theirencoded relevance scores in the following four steps

First the cloud server computes the relevance score

of Oiprimes F 1 to W

V i1 = V i1m + V i1n (13)

Second the cloud server computes the T j2(y)

T j2(y) = T j2m(y) + T j2n(y) (14)

Third the cloud server substitutes H i(i) for thevariable y and gets T j2(H (i))

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 5: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 5140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 5

Fig 2 Format of Authentication Data

in Fig 2 the authentication data consists of five

parts The request counter field records the number of search requests that the data user has submitted Thelast request time field asks the data user to providethe historical data of his previous request time Thepersonally identifiable data (eg passport numbertelephone number) field is used to identify a specificdata user while the random number and CRC fieldare further used to check whether the authenticationdata has been tampered with

The key point of a successful authentication is toprovide both the dynamically changing secret keysand the historical data of the corresponding data

user Let kij denotes the secret key shared betweenadministration server and the jth data user U j afteri instances of search requests and dij denotes theauthentication data for the (i+1)th request of U j Ourauthentication protocol runs in the following six steps

A Data user U j prepares his authentication datadij ie U j needs to fill in all the fields of authentica-tion data based on his historical data

B Data user U j encrypts dij with the current secretkey kij and submits the encrypted authenticationdata (dij)kij to the administration server

C After submitting the authentication data thedata user U j generates another secret key ki+1j =kij oplus H (dij) and stores both kij and ki+1j

D Upon receiving U j rsquos encrypted authenticationdata the administration server decrypts it with kij

E The administration server checks the requestcounter last request time personally identifiable dataand CRC respectively If the authentication succeedsthe administration server first generates a new secretkey ki+1j = kij oplus H (dij) then he replies a confirma-tion data di+1j and encrypts it with ki+1j Otherwisethe administration server encrypts di+1j with secret

key kij F Upon receiving a reply from the administrationserver the data user U j will try to decrypt it withki+1j If the decrypted data contains the confirmationdata the authentication is successful Otherwise theauthentication is regarded as being unsuccessful Thedata user deletes the new generated secret key ki+1j

and considers whether to start another authenticationFig 3 shows an example of successful authen-

tication between the administration server and thedata user As we can see after each successful au-thentication process the secret key will be changed

dynamically according to the previous key and somehistorical data Therefore once an attacker steals a se-cret key he can hardly get any benefits On one handif the attacker knows nothing about the historical data

Administration

Server

Data

User U i

0

0i

ik

d

1

1i

ik

d

2

2i

ik

d

33

ii

k d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

0ik

1 0 0i i ik k H d

2 1 1i i ik k H d

3 2 2i i ik k H d

4 3 3i i ik k H d

Fig 3 Example of data user authentication and dy-namic secret key generation

of the legal data user he cannot even construct a legalauthentication data On the other hand if the legaldata user performs another successful authentication

the previous secret key will be expired

43 Illegal Search Detection

In our scheme the authentication process is protected by the dynamic secret key and the historical infor-mation We assume that an attacker has successfullyeavesdropped the secret key k0j of U j Then he hasto construct the authentication data if the attackerhas not successfully eavesdropped the historical da-ta eg the request counter the last request timehe cannot construct the correct authentication dataTherefore this illegal action will soon be detected bythe administration server Further if the attacker hassuccessfully eavesdropped all data of U j the attackercan correctly construct the authentication data andpretend himself to be U j without being detected bythe administration server However once the legaldata user U j performs his search since the secret keyon the administration server side has changed therewill be contradictory secret keys between the admin-istration server and the legal data user Therefore thedata user and administration server will soon detectthis illegal action

44 Data User Revocation

Different from previous works data user revocationin our scheme does not need to re-encrypt and updatelarge amounts of data stored on the cloud server In-stead the adminstration server only needs to updatethe secret data S a stored on the cloud server As will be detailed in the next section S a = gka1middotka2middotra whereka1 and ka2 are the secret keys of the administrationserver and ra is randomly generated for every updateoperation Consequently the previous trapdoors will be expired Additionally without the help of the

administration server the revoked data user cannotgenerate the correct trapdoor T whprime Therefore a datauser cannot perform correct searches once he is re-voked

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 6140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 6

5 MATCHING DIFFERENT-KE Y ENCRYPTED

KEYWORDS

Numerous data owners are often involved in practicalcloud applications For privacy concerns they would be reluctant to share secret keys with others Insteadthey prefer to use their own secret keys to encrypt

their sensitive data (keywords files) When keywordsof different data owners are encrypted with differentsecret keys the coming question is how to locatedifferent-key encrypted keywords among multiple da-ta owners In this section to enable secure efficientand convenient searches over encrypted cloud dataowned by multiple data owners we systematicallydesign schemes to achieve the following three require-ments First different data owners use different secretkeys to encrypt their keywords Second authenticateddata users can generate their trapdoors without know-ing these secret keys Third upon receiving trapdoors

the cloud server can find the corresponding keyword-s from different data ownersrsquo encrypted keywordswithout knowing the actual value of keywords ortrapdoors

51 Overview

Now we present an example to illustrate the mainidea of our keywords matching protocol(the detailedprotocol is elaborated in the following subsections)Assume Alice wants to use the cloud to store her fileF she first encrypts her file F and gets the ciphertextC To enable other users to perform secure searcheson C Alice extracts a keyword wih and sends theencrypted keyword wih = (E aprime E o) to the adminis-tration server The administration server further re-encrypts E aprime to E a and submits wih = (E a E o) to thecloud server Now Bob wants to search a keyword whprime he first generates the trapdoor T primewhprime and submits it tothe administration server The administration serverre-encrypts T primewhprime to T whprime = (T 1 T 2 T 3) generates asecret data S a and submits T whprime S a to the cloudserver The cloud server will judge whether Bobrsquossearch request matches Alicersquos encrypted keyword by

checking whether e (E a

T 3

) = e (E o

T 1

) middot e (S a

T 2

)holds

52 Construction Initialization

Our construction is based on the aforementioned bi-linear map Let g and g1 denote the generator of twocyclic groups G and G1 with order p Let e be a bilinear map e G times G rarr G1 Given different secretparameters as input a randomized key generationalgorithm will output the private keys used in thesystem ka1 isin Z

+ p ka2 isin Z

+ p kif isin Z

+ p kiw isin Z

+ p larr

(0 1)lowast where ka1 and ka2 are the private keys of the

administration server kiw and kif are the privatekeys used to encrypt keywords and files of data ownerOi respectively Let H (middot) be a public hash function itsoutput locates in Z

+ p

53 Keyword Encryption

For keyword encryption the following conditionsshould be satisfied first different data owners usetheir own secret keys to encrypt keywords Secondfor the same keyword it would be encrypted to dif-ferent cipher-texts each time These properties benefit

our scheme for two reasons First losing the key of one data owner would not lead to the disclosure of other ownersrsquo data Second the cloud server cannotsee any relationship among encrypted keywords Giv-en the hth keyword of data owner Oi ie wih weencrypt wih as follows

wih =983080

gkiwmiddotromiddotH (wih) gkiwmiddotro983081

(1)

where ro is a randomly generated number eachtime which helps enhance the security of wih Foreasy description and understanding we let E primea =

gkiwmiddotromiddotH (wih)

and E o = gkiwmiddotro

The data owner delivers E aprime and E o to the admin-

istration server and the administration server furtherre-encrypts E aprime with his secret keys ka1 and ka2 andgets E a

E a =852008

E aprime middot gka1852009ka2 (2)

Therefore wih = (E a E o) The administrative serv-er further submits wih to the cloud server Note thatsince the administration server only does simple com-putations on the encrypted data he cannot learn anysensitive information from these random encrypted

data without knowing the secret keys of data owners

54 Trapdoor Generation

To make the data users generate trapdoors securelyconveniently and efficiently our proposed schemeshould satisfy two main conditions First the datauser does not need to ask a large amount of dataowners for secret keys to generate trapdoors Sec-ond for the same keyword the trapdoor generatedeach time should be different To meet this conditionthe trapdoor generation is conducted in two steps

First the data user generates trapdoors based on hissearch keyword and a random number Second theadministration server re-encrypts the trapdoors forthe authenticated data user

Assume a data user wants to search keyword whprime so he encrypts it as follows

T primewhprime =983080

gH (whprime )middotru gru983081

(3)

where ru is a randomly generated number eachtime As we can see during the trapdoor generationprocess secret keys of data owners are not required

Additionally with the help of random variable ru forthe same keyword whprime we can generate two differenttrapdoors which prevent attackers from knowing therelationship among trapdoors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 7140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 7

Upon receiving T primewhprime the administration server firstgenerates a random number ra and then re-encryptsT primewhprime as follows

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(4)

For easy description and understanding we let

T 1 = g

H (whprime )middotrumiddotka1middotka2middotra

T 2 = g

rumiddotka1

T 3 = g

rumiddotka1middotra

hence T whprime = (T 1 T 2 T 3) Finally the administrationserver submits T whprime to the cloud server

55 Keywords Matching among Different DataOwners

The cloud server stores all encrypted files and key-words of different data owners The administrationserver will also store a secret data S a = gka1middotka2middotra

on the cloud server Upon receiving a query requestthe cloud will search over the data of all these dataowners The cloud processes the search request in two

steps First the cloud matches the queried keywordsfrom all keywords stored on it and it gets a candidatefile set Second the cloud ranks files in the candidatefile set and finds the most top-k relevant files Weintroduce the matching strategy here while leavingthe task of introducing the ranking strategy in the nextsection When the cloud obtains the trapdoor T whprime andencrypted keywords (E o E a) he first computes

e (S a T 2)

= e852008

gramiddotka1middotka2 grumiddotka1852009

(5)

= e(g g)ramiddotka1middotka2middotrumiddotka1

Then he can judge whether whprime = wih (ie anencrypted keyword is located) holds if the followingequation is true

e (E a T 3)

= e

1048616983080gkiwmiddotromiddotH (wih) middot gka1

983081ka2 grumiddotka1middotra

1048617= e(g g)

(kiw middotromiddotH (wih)+ka1)middotka2middotrumiddotka1middotra (6)

= e(g g)kiwmiddotromiddotH (wih)middotka2middotrumiddotka1middotra middot e (S a T 2)

= e983080

gkiwmiddotro gH (wih)middotka2middotrumiddotka1middotra983081

middot e (S a T 2)

= e (E o T 1) middot e (S a T 2)

6 PRIVACY PRESERVING RANKED SEARCH

The aforementioned section helps the cloud matchthe queried keywords and obtain a candidate fileset However we cannot simply return undifferentialfiles to data users for the following two reasons Firstreturning all candidate files would cause abundantcommunication overhead for the whole system Sec-ond data users would only concern the top-k relevantfiles corresponding to their queries In this sectionwe first elucidate an order and privacy preserving

encoding scheme Then we illustrate an additive orderpreserving and privacy preserving encoding schemeFinally we apply the proposed scheme to encode therelevance scores and obtain the top-k search results

x 1 2 3 4 5

f ( x) 100-1000 1100-1800 2000-4200 4300-5000 5100-7000

Fig 4 An example of Order Preserving and PrivacyPreserving Function

61 Order and Privacy Preserving Function

To rank the relevance score while preserving its priva-cy the proposed function should satisfy the followingconditions 1) This function should preserve the orderof data as this helps the cloud server determine whichfile is more relevant to a certain keyword according tothe encoded relevance scores 2) This function shouldnot be revealed by the cloud server so that cloudserver can make comparisons on encoded relevancescores without knowing their actual values 3) Dif-ferent data owners should have different functionssuch that revealing the encoded value of a data ownerwould not lead to the leakage of encoded values of other data owners In order to satisfy condition 1we introduce a data processing part m(x middot) whichpreserves the order of x To satisfy condition 2 weintroduce a disturbing part rf which helps prevent thecloud server from revealing this function To satisfycondition 3 we use m(x middot) to process the ID of dataowners So this function belongs to the followingfunction family

F yoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + rf (7)

where τ denotes the degree of F yoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k) We furtherdefine m(x j) as follows m(x 0) = 1 m(x 1) = xand m(x j) = (m(x jminus1)+α middot x) middot (1+λ) if jgt1 whereα and λ are two constant numbers

Now we introduce how to set the disturbing partrf Since

sum0lejkleτ Ajkmiddot(m(x+1 j)minusm(x j))middotm(y k)gesum

0lejkleτ Ajkmiddot((1+λ)jminus1

+ αmiddotsum

1leilejminus2(1+λ))

Let l be an integer such that2lminus1le

sum0lejkleτ Ajkmiddot((1+λ)

jminus1+αmiddot

sum1leilejminus2(1+λ))le2l

we can set rf isin (0 2lminus1)Obviously forallx1gtx2 we have F yoppf (x1)gtF yoppf (x2)

62 Additive Order and Privacy Preserving Func-tion

With the order and privacy preserving function wehave if x1 gt x2 then y1 gt y2 However the additionof the function is not necessarily order preserving

An example is presented in Fig 4 Obviously f (x)

is order preserving to preserve privacy a disturbingpart is introduced As we can see f (1) + f (5) variesfrom 5200 to 8000 f (1) + f (4) varies from 4400 to6000 f (2) + f (3) varies from 3100 to 6000 Though

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 8140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 8

1 + 5 gt 1 + 4 and 1 + 5 gt 2 + 3 it is probable forf (1)+f (5) lt f (1)+f (4) and f (1)+f (5) lt f (2)+f (3)

To correctly perform the secure ranked multi-keyword search the sum of any two encoded rel-evance scores should still be ordered and privacypreserved (we use the sum of encoded relevance scoreto evaluate the relevance between a file and multiplekeywords in this paper) To satisfy this conditionwe further design an additive order and privacypreserving function family based on Eq 7

F yaoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + raof (8)

Where τ denotes the degree of F yaoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k)

Before we take a step further towards the useof F yaoppf (x) we will first introduce our auxiliarytheorem ie how to make an order and privacypreserving function to be additive order and privacypreserving

Definition 3 Given an order preserving functiony = f (x) we define ∆f (xi) = f (xi+1) minus f (xi) and∆f (xi) = ∆f (xi+1) minus ∆f (xi) To preserve privacy wedefine yi = f (xi) + ri where ri is a random numberand ri gt 0 We define the max value of ri as rimaxand rimax = f (xi)minusimiddot∆(xi) hence yi can change fromf (xi) to f (xi) + rimax

Theorem 1 Given an order preserving function yi =f (xi) + ri when the following three conditionsare satisfied (1) foralli ∆f (xi + 1) le ∆f (xi) (2) foralli

˜∆f (xi + 1) ge ˜

∆f (xi) (3) foralli ri lt 983080i

2

middot ˜∆f (xi)983081 2The order and privacy preserving function yi is also

an additive order and privacy preserving that isfor any

sumxiisinD1

xi le sum

xjisinD2xj where D1 and D2

denotes two subsets of the definition domain we havesumxiisinD1

yi lesum

xjisinD2yj Proof The proof is elaborated in Appendix A

63 Encoding relevance scores

In our paper we use a well-known method to com-pute the relevance score [26] ie Score(w F d) =1

|F d|(1+ln f dw)ln(1+ N

f w) where w denotes the given

keyword |F d| is the length of file F d f dw denotes theT F of w in file F d f w denotes the number of filescontaining w and N denotes the total number of filesin the collection

Now we introduce how to encode the relevancescore with an additive order and privacy preservingfunction in our additive order and privacy preservingfunction family F yaoppf (x) Given input S ijt (the rel-evance score of tth keyword to jth file of ith dataowner) Data owners can independently choose afunction from this family to protect the privacy of their relevance scores For simplicity in this paper we

specify data owner i to choose F H i(i)aoppf (middot) and defineV ijt as the encoded data of S ijt

V ijt = F H i(i)aoppf (S ijt) (9)

where H i(middot) is a public known hash function and i isthe I D of data owner Oi

With the well-designed properties of F aoppf thecloud server can make a comparison among encodedrelevance scores for the same data owner Howeversince different data owners encode their relevancescores with different functions in F aoppf the cloudserver cannot make a comparison between encodedrelevance scores for different data owners To solvethis problem we define

T ijt(y) = F yaoppf (S ijt) (10)

where T ijt(y) is used to help the cloud servermake comparisons among relevance scores encoded by different data owners and y is a variable whichtakes the hash value of data ownerrsquos ID as input

Finally we attach each V ijt with a T ijt(y)

64 Ranking search results

In this paper we use the sum of the relevance scoresas the metric to rank search results Now we intro-duce the strategies of ranking search results based onthe encoded relevance scores First the cloud com-putes V ij =

sumtisinW

V ijt the sum of encoded relevance

scores between the jth file and matched keywords forOi and the auxiliary value T ij(y) =

sumtisinW

T ijt(y) Then

the cloud ranks the sum of encoded relevance scorewith the following two conditions

(1) Two encoded data belong to the same dataowner Given that a data user issues a query W =wm wn we assume that Oirsquos F 1 and F 2 satisfy thequery Then the cloud adds the encoded relevancescore together and gets the relevance score of Oiprimes F 1to W

V i1 = V i1m + V i1n (11)

Similarly

V i2 = V i2m + V i2n (12)

If V i1 ge V i2 then F 1 is more relevant to the queried

keyword sets W otherwise F 2 is more relevant to W (2) Two encoded data belong to two different data

owners Given that a data user issues a query W =wm wn we assume Oirsquos F 1 and Oj rsquos F 2 satisfy thequery The cloud server makes comparison on theirencoded relevance scores in the following four steps

First the cloud server computes the relevance score

of Oiprimes F 1 to W

V i1 = V i1m + V i1n (13)

Second the cloud server computes the T j2(y)

T j2(y) = T j2m(y) + T j2n(y) (14)

Third the cloud server substitutes H i(i) for thevariable y and gets T j2(H (i))

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 6: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 6140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 6

5 MATCHING DIFFERENT-KE Y ENCRYPTED

KEYWORDS

Numerous data owners are often involved in practicalcloud applications For privacy concerns they would be reluctant to share secret keys with others Insteadthey prefer to use their own secret keys to encrypt

their sensitive data (keywords files) When keywordsof different data owners are encrypted with differentsecret keys the coming question is how to locatedifferent-key encrypted keywords among multiple da-ta owners In this section to enable secure efficientand convenient searches over encrypted cloud dataowned by multiple data owners we systematicallydesign schemes to achieve the following three require-ments First different data owners use different secretkeys to encrypt their keywords Second authenticateddata users can generate their trapdoors without know-ing these secret keys Third upon receiving trapdoors

the cloud server can find the corresponding keyword-s from different data ownersrsquo encrypted keywordswithout knowing the actual value of keywords ortrapdoors

51 Overview

Now we present an example to illustrate the mainidea of our keywords matching protocol(the detailedprotocol is elaborated in the following subsections)Assume Alice wants to use the cloud to store her fileF she first encrypts her file F and gets the ciphertextC To enable other users to perform secure searcheson C Alice extracts a keyword wih and sends theencrypted keyword wih = (E aprime E o) to the adminis-tration server The administration server further re-encrypts E aprime to E a and submits wih = (E a E o) to thecloud server Now Bob wants to search a keyword whprime he first generates the trapdoor T primewhprime and submits it tothe administration server The administration serverre-encrypts T primewhprime to T whprime = (T 1 T 2 T 3) generates asecret data S a and submits T whprime S a to the cloudserver The cloud server will judge whether Bobrsquossearch request matches Alicersquos encrypted keyword by

checking whether e (E a

T 3

) = e (E o

T 1

) middot e (S a

T 2

)holds

52 Construction Initialization

Our construction is based on the aforementioned bi-linear map Let g and g1 denote the generator of twocyclic groups G and G1 with order p Let e be a bilinear map e G times G rarr G1 Given different secretparameters as input a randomized key generationalgorithm will output the private keys used in thesystem ka1 isin Z

+ p ka2 isin Z

+ p kif isin Z

+ p kiw isin Z

+ p larr

(0 1)lowast where ka1 and ka2 are the private keys of the

administration server kiw and kif are the privatekeys used to encrypt keywords and files of data ownerOi respectively Let H (middot) be a public hash function itsoutput locates in Z

+ p

53 Keyword Encryption

For keyword encryption the following conditionsshould be satisfied first different data owners usetheir own secret keys to encrypt keywords Secondfor the same keyword it would be encrypted to dif-ferent cipher-texts each time These properties benefit

our scheme for two reasons First losing the key of one data owner would not lead to the disclosure of other ownersrsquo data Second the cloud server cannotsee any relationship among encrypted keywords Giv-en the hth keyword of data owner Oi ie wih weencrypt wih as follows

wih =983080

gkiwmiddotromiddotH (wih) gkiwmiddotro983081

(1)

where ro is a randomly generated number eachtime which helps enhance the security of wih Foreasy description and understanding we let E primea =

gkiwmiddotromiddotH (wih)

and E o = gkiwmiddotro

The data owner delivers E aprime and E o to the admin-

istration server and the administration server furtherre-encrypts E aprime with his secret keys ka1 and ka2 andgets E a

E a =852008

E aprime middot gka1852009ka2 (2)

Therefore wih = (E a E o) The administrative serv-er further submits wih to the cloud server Note thatsince the administration server only does simple com-putations on the encrypted data he cannot learn anysensitive information from these random encrypted

data without knowing the secret keys of data owners

54 Trapdoor Generation

To make the data users generate trapdoors securelyconveniently and efficiently our proposed schemeshould satisfy two main conditions First the datauser does not need to ask a large amount of dataowners for secret keys to generate trapdoors Sec-ond for the same keyword the trapdoor generatedeach time should be different To meet this conditionthe trapdoor generation is conducted in two steps

First the data user generates trapdoors based on hissearch keyword and a random number Second theadministration server re-encrypts the trapdoors forthe authenticated data user

Assume a data user wants to search keyword whprime so he encrypts it as follows

T primewhprime =983080

gH (whprime )middotru gru983081

(3)

where ru is a randomly generated number eachtime As we can see during the trapdoor generationprocess secret keys of data owners are not required

Additionally with the help of random variable ru forthe same keyword whprime we can generate two differenttrapdoors which prevent attackers from knowing therelationship among trapdoors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 7140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 7

Upon receiving T primewhprime the administration server firstgenerates a random number ra and then re-encryptsT primewhprime as follows

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(4)

For easy description and understanding we let

T 1 = g

H (whprime )middotrumiddotka1middotka2middotra

T 2 = g

rumiddotka1

T 3 = g

rumiddotka1middotra

hence T whprime = (T 1 T 2 T 3) Finally the administrationserver submits T whprime to the cloud server

55 Keywords Matching among Different DataOwners

The cloud server stores all encrypted files and key-words of different data owners The administrationserver will also store a secret data S a = gka1middotka2middotra

on the cloud server Upon receiving a query requestthe cloud will search over the data of all these dataowners The cloud processes the search request in two

steps First the cloud matches the queried keywordsfrom all keywords stored on it and it gets a candidatefile set Second the cloud ranks files in the candidatefile set and finds the most top-k relevant files Weintroduce the matching strategy here while leavingthe task of introducing the ranking strategy in the nextsection When the cloud obtains the trapdoor T whprime andencrypted keywords (E o E a) he first computes

e (S a T 2)

= e852008

gramiddotka1middotka2 grumiddotka1852009

(5)

= e(g g)ramiddotka1middotka2middotrumiddotka1

Then he can judge whether whprime = wih (ie anencrypted keyword is located) holds if the followingequation is true

e (E a T 3)

= e

1048616983080gkiwmiddotromiddotH (wih) middot gka1

983081ka2 grumiddotka1middotra

1048617= e(g g)

(kiw middotromiddotH (wih)+ka1)middotka2middotrumiddotka1middotra (6)

= e(g g)kiwmiddotromiddotH (wih)middotka2middotrumiddotka1middotra middot e (S a T 2)

= e983080

gkiwmiddotro gH (wih)middotka2middotrumiddotka1middotra983081

middot e (S a T 2)

= e (E o T 1) middot e (S a T 2)

6 PRIVACY PRESERVING RANKED SEARCH

The aforementioned section helps the cloud matchthe queried keywords and obtain a candidate fileset However we cannot simply return undifferentialfiles to data users for the following two reasons Firstreturning all candidate files would cause abundantcommunication overhead for the whole system Sec-ond data users would only concern the top-k relevantfiles corresponding to their queries In this sectionwe first elucidate an order and privacy preserving

encoding scheme Then we illustrate an additive orderpreserving and privacy preserving encoding schemeFinally we apply the proposed scheme to encode therelevance scores and obtain the top-k search results

x 1 2 3 4 5

f ( x) 100-1000 1100-1800 2000-4200 4300-5000 5100-7000

Fig 4 An example of Order Preserving and PrivacyPreserving Function

61 Order and Privacy Preserving Function

To rank the relevance score while preserving its priva-cy the proposed function should satisfy the followingconditions 1) This function should preserve the orderof data as this helps the cloud server determine whichfile is more relevant to a certain keyword according tothe encoded relevance scores 2) This function shouldnot be revealed by the cloud server so that cloudserver can make comparisons on encoded relevancescores without knowing their actual values 3) Dif-ferent data owners should have different functionssuch that revealing the encoded value of a data ownerwould not lead to the leakage of encoded values of other data owners In order to satisfy condition 1we introduce a data processing part m(x middot) whichpreserves the order of x To satisfy condition 2 weintroduce a disturbing part rf which helps prevent thecloud server from revealing this function To satisfycondition 3 we use m(x middot) to process the ID of dataowners So this function belongs to the followingfunction family

F yoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + rf (7)

where τ denotes the degree of F yoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k) We furtherdefine m(x j) as follows m(x 0) = 1 m(x 1) = xand m(x j) = (m(x jminus1)+α middot x) middot (1+λ) if jgt1 whereα and λ are two constant numbers

Now we introduce how to set the disturbing partrf Since

sum0lejkleτ Ajkmiddot(m(x+1 j)minusm(x j))middotm(y k)gesum

0lejkleτ Ajkmiddot((1+λ)jminus1

+ αmiddotsum

1leilejminus2(1+λ))

Let l be an integer such that2lminus1le

sum0lejkleτ Ajkmiddot((1+λ)

jminus1+αmiddot

sum1leilejminus2(1+λ))le2l

we can set rf isin (0 2lminus1)Obviously forallx1gtx2 we have F yoppf (x1)gtF yoppf (x2)

62 Additive Order and Privacy Preserving Func-tion

With the order and privacy preserving function wehave if x1 gt x2 then y1 gt y2 However the additionof the function is not necessarily order preserving

An example is presented in Fig 4 Obviously f (x)

is order preserving to preserve privacy a disturbingpart is introduced As we can see f (1) + f (5) variesfrom 5200 to 8000 f (1) + f (4) varies from 4400 to6000 f (2) + f (3) varies from 3100 to 6000 Though

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 8140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 8

1 + 5 gt 1 + 4 and 1 + 5 gt 2 + 3 it is probable forf (1)+f (5) lt f (1)+f (4) and f (1)+f (5) lt f (2)+f (3)

To correctly perform the secure ranked multi-keyword search the sum of any two encoded rel-evance scores should still be ordered and privacypreserved (we use the sum of encoded relevance scoreto evaluate the relevance between a file and multiplekeywords in this paper) To satisfy this conditionwe further design an additive order and privacypreserving function family based on Eq 7

F yaoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + raof (8)

Where τ denotes the degree of F yaoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k)

Before we take a step further towards the useof F yaoppf (x) we will first introduce our auxiliarytheorem ie how to make an order and privacypreserving function to be additive order and privacypreserving

Definition 3 Given an order preserving functiony = f (x) we define ∆f (xi) = f (xi+1) minus f (xi) and∆f (xi) = ∆f (xi+1) minus ∆f (xi) To preserve privacy wedefine yi = f (xi) + ri where ri is a random numberand ri gt 0 We define the max value of ri as rimaxand rimax = f (xi)minusimiddot∆(xi) hence yi can change fromf (xi) to f (xi) + rimax

Theorem 1 Given an order preserving function yi =f (xi) + ri when the following three conditionsare satisfied (1) foralli ∆f (xi + 1) le ∆f (xi) (2) foralli

˜∆f (xi + 1) ge ˜

∆f (xi) (3) foralli ri lt 983080i

2

middot ˜∆f (xi)983081 2The order and privacy preserving function yi is also

an additive order and privacy preserving that isfor any

sumxiisinD1

xi le sum

xjisinD2xj where D1 and D2

denotes two subsets of the definition domain we havesumxiisinD1

yi lesum

xjisinD2yj Proof The proof is elaborated in Appendix A

63 Encoding relevance scores

In our paper we use a well-known method to com-pute the relevance score [26] ie Score(w F d) =1

|F d|(1+ln f dw)ln(1+ N

f w) where w denotes the given

keyword |F d| is the length of file F d f dw denotes theT F of w in file F d f w denotes the number of filescontaining w and N denotes the total number of filesin the collection

Now we introduce how to encode the relevancescore with an additive order and privacy preservingfunction in our additive order and privacy preservingfunction family F yaoppf (x) Given input S ijt (the rel-evance score of tth keyword to jth file of ith dataowner) Data owners can independently choose afunction from this family to protect the privacy of their relevance scores For simplicity in this paper we

specify data owner i to choose F H i(i)aoppf (middot) and defineV ijt as the encoded data of S ijt

V ijt = F H i(i)aoppf (S ijt) (9)

where H i(middot) is a public known hash function and i isthe I D of data owner Oi

With the well-designed properties of F aoppf thecloud server can make a comparison among encodedrelevance scores for the same data owner Howeversince different data owners encode their relevancescores with different functions in F aoppf the cloudserver cannot make a comparison between encodedrelevance scores for different data owners To solvethis problem we define

T ijt(y) = F yaoppf (S ijt) (10)

where T ijt(y) is used to help the cloud servermake comparisons among relevance scores encoded by different data owners and y is a variable whichtakes the hash value of data ownerrsquos ID as input

Finally we attach each V ijt with a T ijt(y)

64 Ranking search results

In this paper we use the sum of the relevance scoresas the metric to rank search results Now we intro-duce the strategies of ranking search results based onthe encoded relevance scores First the cloud com-putes V ij =

sumtisinW

V ijt the sum of encoded relevance

scores between the jth file and matched keywords forOi and the auxiliary value T ij(y) =

sumtisinW

T ijt(y) Then

the cloud ranks the sum of encoded relevance scorewith the following two conditions

(1) Two encoded data belong to the same dataowner Given that a data user issues a query W =wm wn we assume that Oirsquos F 1 and F 2 satisfy thequery Then the cloud adds the encoded relevancescore together and gets the relevance score of Oiprimes F 1to W

V i1 = V i1m + V i1n (11)

Similarly

V i2 = V i2m + V i2n (12)

If V i1 ge V i2 then F 1 is more relevant to the queried

keyword sets W otherwise F 2 is more relevant to W (2) Two encoded data belong to two different data

owners Given that a data user issues a query W =wm wn we assume Oirsquos F 1 and Oj rsquos F 2 satisfy thequery The cloud server makes comparison on theirencoded relevance scores in the following four steps

First the cloud server computes the relevance score

of Oiprimes F 1 to W

V i1 = V i1m + V i1n (13)

Second the cloud server computes the T j2(y)

T j2(y) = T j2m(y) + T j2n(y) (14)

Third the cloud server substitutes H i(i) for thevariable y and gets T j2(H (i))

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 7: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 7140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 7

Upon receiving T primewhprime the administration server firstgenerates a random number ra and then re-encryptsT primewhprime as follows

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(4)

For easy description and understanding we let

T 1 = g

H (whprime )middotrumiddotka1middotka2middotra

T 2 = g

rumiddotka1

T 3 = g

rumiddotka1middotra

hence T whprime = (T 1 T 2 T 3) Finally the administrationserver submits T whprime to the cloud server

55 Keywords Matching among Different DataOwners

The cloud server stores all encrypted files and key-words of different data owners The administrationserver will also store a secret data S a = gka1middotka2middotra

on the cloud server Upon receiving a query requestthe cloud will search over the data of all these dataowners The cloud processes the search request in two

steps First the cloud matches the queried keywordsfrom all keywords stored on it and it gets a candidatefile set Second the cloud ranks files in the candidatefile set and finds the most top-k relevant files Weintroduce the matching strategy here while leavingthe task of introducing the ranking strategy in the nextsection When the cloud obtains the trapdoor T whprime andencrypted keywords (E o E a) he first computes

e (S a T 2)

= e852008

gramiddotka1middotka2 grumiddotka1852009

(5)

= e(g g)ramiddotka1middotka2middotrumiddotka1

Then he can judge whether whprime = wih (ie anencrypted keyword is located) holds if the followingequation is true

e (E a T 3)

= e

1048616983080gkiwmiddotromiddotH (wih) middot gka1

983081ka2 grumiddotka1middotra

1048617= e(g g)

(kiw middotromiddotH (wih)+ka1)middotka2middotrumiddotka1middotra (6)

= e(g g)kiwmiddotromiddotH (wih)middotka2middotrumiddotka1middotra middot e (S a T 2)

= e983080

gkiwmiddotro gH (wih)middotka2middotrumiddotka1middotra983081

middot e (S a T 2)

= e (E o T 1) middot e (S a T 2)

6 PRIVACY PRESERVING RANKED SEARCH

The aforementioned section helps the cloud matchthe queried keywords and obtain a candidate fileset However we cannot simply return undifferentialfiles to data users for the following two reasons Firstreturning all candidate files would cause abundantcommunication overhead for the whole system Sec-ond data users would only concern the top-k relevantfiles corresponding to their queries In this sectionwe first elucidate an order and privacy preserving

encoding scheme Then we illustrate an additive orderpreserving and privacy preserving encoding schemeFinally we apply the proposed scheme to encode therelevance scores and obtain the top-k search results

x 1 2 3 4 5

f ( x) 100-1000 1100-1800 2000-4200 4300-5000 5100-7000

Fig 4 An example of Order Preserving and PrivacyPreserving Function

61 Order and Privacy Preserving Function

To rank the relevance score while preserving its priva-cy the proposed function should satisfy the followingconditions 1) This function should preserve the orderof data as this helps the cloud server determine whichfile is more relevant to a certain keyword according tothe encoded relevance scores 2) This function shouldnot be revealed by the cloud server so that cloudserver can make comparisons on encoded relevancescores without knowing their actual values 3) Dif-ferent data owners should have different functionssuch that revealing the encoded value of a data ownerwould not lead to the leakage of encoded values of other data owners In order to satisfy condition 1we introduce a data processing part m(x middot) whichpreserves the order of x To satisfy condition 2 weintroduce a disturbing part rf which helps prevent thecloud server from revealing this function To satisfycondition 3 we use m(x middot) to process the ID of dataowners So this function belongs to the followingfunction family

F yoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + rf (7)

where τ denotes the degree of F yoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k) We furtherdefine m(x j) as follows m(x 0) = 1 m(x 1) = xand m(x j) = (m(x jminus1)+α middot x) middot (1+λ) if jgt1 whereα and λ are two constant numbers

Now we introduce how to set the disturbing partrf Since

sum0lejkleτ Ajkmiddot(m(x+1 j)minusm(x j))middotm(y k)gesum

0lejkleτ Ajkmiddot((1+λ)jminus1

+ αmiddotsum

1leilejminus2(1+λ))

Let l be an integer such that2lminus1le

sum0lejkleτ Ajkmiddot((1+λ)

jminus1+αmiddot

sum1leilejminus2(1+λ))le2l

we can set rf isin (0 2lminus1)Obviously forallx1gtx2 we have F yoppf (x1)gtF yoppf (x2)

62 Additive Order and Privacy Preserving Func-tion

With the order and privacy preserving function wehave if x1 gt x2 then y1 gt y2 However the additionof the function is not necessarily order preserving

An example is presented in Fig 4 Obviously f (x)

is order preserving to preserve privacy a disturbingpart is introduced As we can see f (1) + f (5) variesfrom 5200 to 8000 f (1) + f (4) varies from 4400 to6000 f (2) + f (3) varies from 3100 to 6000 Though

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 8140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 8

1 + 5 gt 1 + 4 and 1 + 5 gt 2 + 3 it is probable forf (1)+f (5) lt f (1)+f (4) and f (1)+f (5) lt f (2)+f (3)

To correctly perform the secure ranked multi-keyword search the sum of any two encoded rel-evance scores should still be ordered and privacypreserved (we use the sum of encoded relevance scoreto evaluate the relevance between a file and multiplekeywords in this paper) To satisfy this conditionwe further design an additive order and privacypreserving function family based on Eq 7

F yaoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + raof (8)

Where τ denotes the degree of F yaoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k)

Before we take a step further towards the useof F yaoppf (x) we will first introduce our auxiliarytheorem ie how to make an order and privacypreserving function to be additive order and privacypreserving

Definition 3 Given an order preserving functiony = f (x) we define ∆f (xi) = f (xi+1) minus f (xi) and∆f (xi) = ∆f (xi+1) minus ∆f (xi) To preserve privacy wedefine yi = f (xi) + ri where ri is a random numberand ri gt 0 We define the max value of ri as rimaxand rimax = f (xi)minusimiddot∆(xi) hence yi can change fromf (xi) to f (xi) + rimax

Theorem 1 Given an order preserving function yi =f (xi) + ri when the following three conditionsare satisfied (1) foralli ∆f (xi + 1) le ∆f (xi) (2) foralli

˜∆f (xi + 1) ge ˜

∆f (xi) (3) foralli ri lt 983080i

2

middot ˜∆f (xi)983081 2The order and privacy preserving function yi is also

an additive order and privacy preserving that isfor any

sumxiisinD1

xi le sum

xjisinD2xj where D1 and D2

denotes two subsets of the definition domain we havesumxiisinD1

yi lesum

xjisinD2yj Proof The proof is elaborated in Appendix A

63 Encoding relevance scores

In our paper we use a well-known method to com-pute the relevance score [26] ie Score(w F d) =1

|F d|(1+ln f dw)ln(1+ N

f w) where w denotes the given

keyword |F d| is the length of file F d f dw denotes theT F of w in file F d f w denotes the number of filescontaining w and N denotes the total number of filesin the collection

Now we introduce how to encode the relevancescore with an additive order and privacy preservingfunction in our additive order and privacy preservingfunction family F yaoppf (x) Given input S ijt (the rel-evance score of tth keyword to jth file of ith dataowner) Data owners can independently choose afunction from this family to protect the privacy of their relevance scores For simplicity in this paper we

specify data owner i to choose F H i(i)aoppf (middot) and defineV ijt as the encoded data of S ijt

V ijt = F H i(i)aoppf (S ijt) (9)

where H i(middot) is a public known hash function and i isthe I D of data owner Oi

With the well-designed properties of F aoppf thecloud server can make a comparison among encodedrelevance scores for the same data owner Howeversince different data owners encode their relevancescores with different functions in F aoppf the cloudserver cannot make a comparison between encodedrelevance scores for different data owners To solvethis problem we define

T ijt(y) = F yaoppf (S ijt) (10)

where T ijt(y) is used to help the cloud servermake comparisons among relevance scores encoded by different data owners and y is a variable whichtakes the hash value of data ownerrsquos ID as input

Finally we attach each V ijt with a T ijt(y)

64 Ranking search results

In this paper we use the sum of the relevance scoresas the metric to rank search results Now we intro-duce the strategies of ranking search results based onthe encoded relevance scores First the cloud com-putes V ij =

sumtisinW

V ijt the sum of encoded relevance

scores between the jth file and matched keywords forOi and the auxiliary value T ij(y) =

sumtisinW

T ijt(y) Then

the cloud ranks the sum of encoded relevance scorewith the following two conditions

(1) Two encoded data belong to the same dataowner Given that a data user issues a query W =wm wn we assume that Oirsquos F 1 and F 2 satisfy thequery Then the cloud adds the encoded relevancescore together and gets the relevance score of Oiprimes F 1to W

V i1 = V i1m + V i1n (11)

Similarly

V i2 = V i2m + V i2n (12)

If V i1 ge V i2 then F 1 is more relevant to the queried

keyword sets W otherwise F 2 is more relevant to W (2) Two encoded data belong to two different data

owners Given that a data user issues a query W =wm wn we assume Oirsquos F 1 and Oj rsquos F 2 satisfy thequery The cloud server makes comparison on theirencoded relevance scores in the following four steps

First the cloud server computes the relevance score

of Oiprimes F 1 to W

V i1 = V i1m + V i1n (13)

Second the cloud server computes the T j2(y)

T j2(y) = T j2m(y) + T j2n(y) (14)

Third the cloud server substitutes H i(i) for thevariable y and gets T j2(H (i))

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 8: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 8140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 8

1 + 5 gt 1 + 4 and 1 + 5 gt 2 + 3 it is probable forf (1)+f (5) lt f (1)+f (4) and f (1)+f (5) lt f (2)+f (3)

To correctly perform the secure ranked multi-keyword search the sum of any two encoded rel-evance scores should still be ordered and privacypreserved (we use the sum of encoded relevance scoreto evaluate the relevance between a file and multiplekeywords in this paper) To satisfy this conditionwe further design an additive order and privacypreserving function family based on Eq 7

F yaoppf (x) =991761

0lejkleτ

Ajk middot m(x j) middot m(y k) + raof (8)

Where τ denotes the degree of F yaoppf (x) and Ajkdenotes the coefficients of m(x j) middot m(y k)

Before we take a step further towards the useof F yaoppf (x) we will first introduce our auxiliarytheorem ie how to make an order and privacypreserving function to be additive order and privacypreserving

Definition 3 Given an order preserving functiony = f (x) we define ∆f (xi) = f (xi+1) minus f (xi) and∆f (xi) = ∆f (xi+1) minus ∆f (xi) To preserve privacy wedefine yi = f (xi) + ri where ri is a random numberand ri gt 0 We define the max value of ri as rimaxand rimax = f (xi)minusimiddot∆(xi) hence yi can change fromf (xi) to f (xi) + rimax

Theorem 1 Given an order preserving function yi =f (xi) + ri when the following three conditionsare satisfied (1) foralli ∆f (xi + 1) le ∆f (xi) (2) foralli

˜∆f (xi + 1) ge ˜

∆f (xi) (3) foralli ri lt 983080i

2

middot ˜∆f (xi)983081 2The order and privacy preserving function yi is also

an additive order and privacy preserving that isfor any

sumxiisinD1

xi le sum

xjisinD2xj where D1 and D2

denotes two subsets of the definition domain we havesumxiisinD1

yi lesum

xjisinD2yj Proof The proof is elaborated in Appendix A

63 Encoding relevance scores

In our paper we use a well-known method to com-pute the relevance score [26] ie Score(w F d) =1

|F d|(1+ln f dw)ln(1+ N

f w) where w denotes the given

keyword |F d| is the length of file F d f dw denotes theT F of w in file F d f w denotes the number of filescontaining w and N denotes the total number of filesin the collection

Now we introduce how to encode the relevancescore with an additive order and privacy preservingfunction in our additive order and privacy preservingfunction family F yaoppf (x) Given input S ijt (the rel-evance score of tth keyword to jth file of ith dataowner) Data owners can independently choose afunction from this family to protect the privacy of their relevance scores For simplicity in this paper we

specify data owner i to choose F H i(i)aoppf (middot) and defineV ijt as the encoded data of S ijt

V ijt = F H i(i)aoppf (S ijt) (9)

where H i(middot) is a public known hash function and i isthe I D of data owner Oi

With the well-designed properties of F aoppf thecloud server can make a comparison among encodedrelevance scores for the same data owner Howeversince different data owners encode their relevancescores with different functions in F aoppf the cloudserver cannot make a comparison between encodedrelevance scores for different data owners To solvethis problem we define

T ijt(y) = F yaoppf (S ijt) (10)

where T ijt(y) is used to help the cloud servermake comparisons among relevance scores encoded by different data owners and y is a variable whichtakes the hash value of data ownerrsquos ID as input

Finally we attach each V ijt with a T ijt(y)

64 Ranking search results

In this paper we use the sum of the relevance scoresas the metric to rank search results Now we intro-duce the strategies of ranking search results based onthe encoded relevance scores First the cloud com-putes V ij =

sumtisinW

V ijt the sum of encoded relevance

scores between the jth file and matched keywords forOi and the auxiliary value T ij(y) =

sumtisinW

T ijt(y) Then

the cloud ranks the sum of encoded relevance scorewith the following two conditions

(1) Two encoded data belong to the same dataowner Given that a data user issues a query W =wm wn we assume that Oirsquos F 1 and F 2 satisfy thequery Then the cloud adds the encoded relevancescore together and gets the relevance score of Oiprimes F 1to W

V i1 = V i1m + V i1n (11)

Similarly

V i2 = V i2m + V i2n (12)

If V i1 ge V i2 then F 1 is more relevant to the queried

keyword sets W otherwise F 2 is more relevant to W (2) Two encoded data belong to two different data

owners Given that a data user issues a query W =wm wn we assume Oirsquos F 1 and Oj rsquos F 2 satisfy thequery The cloud server makes comparison on theirencoded relevance scores in the following four steps

First the cloud server computes the relevance score

of Oiprimes F 1 to W

V i1 = V i1m + V i1n (13)

Second the cloud server computes the T j2(y)

T j2(y) = T j2m(y) + T j2n(y) (14)

Third the cloud server substitutes H i(i) for thevariable y and gets T j2(H (i))

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 9: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 9140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 9

C 11

C 12

C 21

C 23

O1 O2

H (1)

Fig 5 Example of ranking search results

Finally the cloud draws the conclusion if V i1 ge

T j2(H i(i)) then Oiprimes F 1 is more relevant to W thanOj primes F 2 otherwise Oj primes F 2 is more relevant to W

Now the cloud server can make comparisons onencoded relevance scores Thus it is easy for the cloudto return the top-k relevant files to the data user

Fig 5 shows an example of making comparisons onthe encoded relevance scores As we can see giventhe trapdoor T w1 the cloud server finds that w11 andw21 match T w1 Data owner O1 has two encryptedfiles C 11 and C 12 that contain w11 Data ownerO2 also has two encrypted files C 21 and C 23 thatcontain w21 For O1 he compares V 111 with V 121 if V 111 gt V 121 the cloud server regards C 11 as beingmore relevant to T w1 otherwise C 21 is more relevantto T w1 We assume C 12 is more relevant to T w1 for O1

and C 21 is more relevant for O2 then the cloud makescomparisons between C 12 and C 21 As shown in thefigure the cloud first substitutes H (1) to T 211(y) andgets T 211(H (1)) then he compares T 211(H (1)) withV 121 if V 121 gt T 211(H (1)) Then O1rsquos encrypted fileC 12 is more relevant to T w1 otherwise O2rsquos encryptedfile C 21 is more relevant to the search request

7 SECURITY ANALYSIS

In this section we provide step-by-step security anal-yses to demonstrate that the security requirementshave been satisfied for the data files the keywordsthe queries and the relevance scores

71 Data Files

The data files are protected by symmetric encryption before upload As long as the encryption algorithm isnot breakable the cloud server cannot know the data

72 Keywords

We formulate the security goals achieved by PRMSMwith the following two theorems

Theorem 2 Given the DDH (Decisional DiffieHell-man) assumption PRMSM is semantically secure a-gainst the chosen keyword attack under the selectivesecurity modelProof See Appendix B

Theorem 3 Given the DL (Discrete Logarithm) as-sumption PRMSM achieves keyword secrecy in therandom oracle modelProof See Appendix C

73 Trapdoors

Recall the trapdoor construction formula

T whprime =983080

gH (whprime )middotrumiddotka1middotka2middotra grumiddotka1 grumiddotka1middotra983081

(15)

If the cloud server wants to know the actual valueof the trapdoor and distinguish two trapdoors it has

to solve the discrete logarithm problem in Z p withlarge prime p therefore the privacy of trapdoor isprotected as long as the discrete logarithm problem ishard

74 Relevance Scores

In our scheme relevance scores are encoded with twoAdditive Order and Privacy Preserving FunctionsNow we analyze the security of additive order andprivacy preserving functions Assume the input for

the F yaopp

(x) is

s and the data owner ID is

i Thenthe cloud server can only capture the following value

that is derived from s

F (H i(i))aopp (s) =991761

0lejkleτ

Ajk middotm(s j)middotm(H i(i) k)+raof

(16)Assume the cloud server has collected n encod-

ed relevance scores for the same relevance score s(F

(H i(i))aopp (s)) then he can construct n equations How-

ever these functions have n + 1 unknown variablesTherefore it is infeasible for the cloud server to breakthe additive order and privacy preserving F yaopp(x)

Thus the security of the corresponding F yaopp(x) en-coded relevance score is also preserved

8 PERFORMANCE EVALUATION

In this section we measure the efficiency of PRMSMand compare it with its previous version SecureRanked Multi-keyword Search for Multiple data own-ers in cloud computing (SRMSM) [27] and the state-of-the-art privacy-preserving Multi-keyword RankedSearch over Encrypted cloud data (MRSE) [11] side

by side Since MRSE is only suitable for the singleowner model our PRMSM and SRMSM not onlywork well in multi-owner settings but also outper-form MRSE on many aspects

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 10: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 10140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 10

Number of files in the dataset (times103)

T i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(a) For different size datasetswith the same dictionary u =4000

Number of keywords in dictionary(times103) T

i m e o f b u i l d i n g i n d e x ( times 1 0 3 s )

(b) For the same dataset withdifferent dictionary sizes n =1000

T i m e o f E n c o d i n g y ( s )

Number of keywords in dictionary (times103)

(c) Time cost of encoding rel-evance score (a subroutine of

building index)

Fig 6 Time cost of index construction

Number of keywords in dictionary(times103) T i m e o f g e n e r a t i n g t r a p d o o r ( s )

(a) For different size of key-word dictionary with the samenumber of queried keywordsq = 100

Number of queried keywords (times102) T i m e o f g e

n e r a t i n g t r a p d o o r ( times 1 0 - 2 s )

(b) For different number of queried keywords with thesame size of keyword dictio-nary u = 4000

Fig 7 Time cost of generating trapdoors

81 Evaluation Settings

We conduct performance experiments on a real data

set the Internet Request For Comments dataset (RFC)[28] We use Hermetic Word Frequency Counter [29]to extract keywords from each RFC file After thekeyword extraction we compute keyword statisticssuch as the keyword frequency in each file the lengthof each file the number of files containing a specifickeyword etc We further calculate the relevance scoreof a keyword to a file based on these statistics Thefile size and keyword frequency of this data set can be seen in [20]

The experiment programs are coded using thePython programming language on a PC with 22GHZ

Intel Core CPU and 2GB memory We implementall necessary routines for data owners to preprocessdata files for the data user to generate trapdoorsfor the administrative server to re-encrypt keywordstrapdoors and for the cloud server to perform rankedsearches We use the Weil pairing [30] to construct our bilinear map

82 Evaluation Results

821 Index Construction

Fig 6(a) shows that given the same keyword dictio-

nary (u=4000) time of index construction for theseschemes increases linearly with an increasing numberof files while SRMSM and PRMSM spend much lesstime on index construction Fig 6(b) demonstrates

that given the same number of files (n=1000) SRMSMand PRMSM consume much less time than MRSE

on constructing indexes Additionally SRMSM andPRMSM are insensitive to the size of the keyworddictionary for index construction while MRSE suf-fers a quadratic growth with the size of keyworddictionary increases Fig 6(c) shows the encodingefficiency of our proposed AOPPF The time spent onencoding increases from 01s to 1s when the numberof keywords increases from 1000 to 10000 This timecost can be acceptable

822 Trapdoor Generation

Compared with index construction trapdoor genera-

tion consumes relatively less time Fig 7(a) demon-strates that given the same number of queried key-words (q=100) SRMSM and PRMSM are insensitive tothe size of keyword dictionary on trapdoor generationand consumes 0026s and 0031s respectively Mean-while MRSE increases from 004s to 62s Fig 7(b)shows that given the same number of dictionary size(u=4000) when the number of queried keywords in-creases from 100 to 1000 the trapdoor generation timefor MRSE is 031s and remains unchanged WhileSRMSM increases from 0024s to 025s PRMSM in-creases from 0031s to 031s We observe that PRMSM

spends a little more time than SRMSM on trapdoorgeneration the reason is that PRMSM introduces anadditional variable to ensure the randomness of trap-doors

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 11: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 11140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 11

T

i m e c o s t o f r e - e n c r y p t i o n ( s )

Average number of keywords per owner

(a) Time cost of re-encryptingkeywords

Average number of trapdoors per user

T i m e c o s t o f r e - e n c r y p t i o n ( s )

(b) Time cost of re-encryptingtrapdoors

Fig 8 Time cost of the administration server

Number of queried keywords

T i m e o f q u e r y ( s )

(a) For different size of queriedkeywords with the same sizeof data set n=2000 and key-word dictionary u = 100

Number of files in the dataset (times103)

T i m e o f q u e r y ( s )

(b) For different size of data setwith the same size of queriedkeywords q=5 and keyworddictionary u = 100

Number of keywords in dictionary (times102

T i m e o f q u e r y ( s )

(c) For different size of key-word dictionary with the samesize of queried keywords q=5and data set n = 2000

Fig 9 Time cost of search

823 Re-encryption by the administration server

Fig 8(a) illustrates the re-encryption time cost of the

administration server in PRMSM As we can see forthe same average number of keywords per owner themore data owners are involved the more time is spenton re-encryption When there are 300 data ownerseach data owner has 100 keywords we need 38s tore-encrypt these keywords which is acceptable

Fig 8(b) demonstrates the time cost of re-encryptingtrapdoors We observe that for the same averagenumber of trapdoors per user the more data usersthat submit trapdoors the more time would be spenton re-encryption When there are 1000 data userswho concurrently submit data each data user has 10

trapdoors we only need 334s for re-encryption

824 Search

From Fig 9 we observe that PRMSM spends moretime for searching The fundamental reason is thatthe pairing operation used in PRMSM needs moretime As we can see from Fig 9(a) and Fig 9(c)the more keywords existing in the cloud server themore time is required for pairing operation Fig 9(b)confirms that when the number of keywords stored onthe cloud server remains a constant PRMSM will not

increase even if the number of files increases ThoughPRMSM spends relatively more time this observationalso confirms that the searching operation should beoutsourced to the cloud server

9 RELATED WOR K

In this section we review three categories of worksearchable encryption secure keyword search in cloudcomputing and order preserving encryption

91 Searchable Encryption

The earliest attempt of searchable encryption wasmade by Song et al In [3] they propose to encrypteach word in a file independently and allow the serverto find whether a single queried keyword is containedin the file without knowing the exact word Thisproposal is more of theoretic interests because of highcomputational costs Goh et al propose building akeyword index for each file and using Bloom filter

to accelerate the search [4] Curtmola et al propose building indices for each keyword and use hashtables as an alternative approach to searchable en-cryption [5] The first public key scheme for keywordsearch over encrypted data is presented in [6] [7] and[8] further enrich the search functionalities of search-able encryption by proposing schemes for conjunctivekeyword search

The searchable encryption cares mostly about singlekeyword search or boolean keyword search Extend-ing these techniques for ranked multi-keyword searchwill incur heavy computation and storage costs

92 Secure Keyword Search in Cloud Computing

The privacy concerns in cloud computing motivatethe study on secure keyword search Wang et al

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 12: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 12140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 12

first defined and solved the secure ranked keywordsearch over encrypted cloud data In [9] and [18] theyproposed a scheme that returns the top-k relevant filesupon a single keyword search Cao et al [10] [11]and Sun et al [31] [12] extended the secure keywordsearch for multi-keyword queries Their approachesvectorize the list of keywords and apply matrix mul-tiplications to hide the actual keyword informationfrom the cloud server while still allowing the serverto find out the top-k relevant data files Xu et alproposed MKQE (Multi-Keyword ranked Query onEncrypted data) that enables a dynamic keyword dic-tionary and avoids the ranking order being distorted by several high frequency keywords [13] Li et al[14] Chuah et al [15] Xu et al [16] and Wang et al[17] proposed fuzzy keyword search over encryptedcloud data aiming at tolerance of both minor typosand format inconsistencies for usersrsquo search input [19]further proposed privacy-assured similarity searchmechanisms over outsourced cloud data In [20] weproposed a secure efficient and distributed keywordsearch protocol in the geo-distributed cloud environ-ment

The system model of these previous works onlyconsider one data owner which implies that in theirsolutions the data owner and data users can easilycommunicate and exchange secret information Whennumerous data owners are involved in the systemsecret information exchanging will cause considerablecommunication overhead Sun et al [21] and Zheng

et al [22] proposed secure attribute-based keywordsearch schemes in the challenging scenario wheremultiple owners are involved However applying CP-ABE in the cloud system would introduce problemsfor data user revocation ie the cloud has to updatethe large amount of data stored on it for a data userrevocation [32] Additionally they do not supportprivacy preserving ranked multi-keyword search Ourpaper differs from previous studies regarding the em-phasis of multiple data owners in the system modelThis paper seeks a solution scheme to maximally relaxthe requirements for data owners and users so that

the scheme could be suitable for a large number of cloud computing users

93 Order Preserving Encryption

The order preserving encryption is used to preventthe cloud server from knowing the exact relevancescores of keywords to a data file The early work of Agrawal et al proposed an Order Preserving sym-metric Encryption (OPE) scheme where the numericalorder of plain texts are preserved [33] Boldyreva etal further introduced a modular order preserving

encryption in [34] Yi et al [35] proposed an order p-reserving function to encode data in sensor networksPopa et al [36] recently proposed an ideal-secureorder-preserving encryption scheme Kerschbaum et

al [37] further proposed a scheme which is not onlyidea-secure but is also an efficient order-preservingencryption scheme However these schemes are notadditive order preserving As a complementary workto the previous order preserving work we propose anew additive order and privacy preserving functions(AOPPF) Data owners can freely choose any functionfrom an AOPPF family to encode their relevancescores The cloud server computes the sum of encodedrelevance scores and ranks them based on the sum

10 CONCLUSIONS

In this paper we explore the problem of securemulti-keyword search for multiple data owners andmultiple data users in the cloud computing envi-ronment Different from prior works our schemesenable authenticated data users to achieve secureconvenient and efficient searches over multiple dataownersrsquo data To efficiently authenticate data usersand detect attackers who steal the secret key andperform illegal searches we propose a novel dynamicsecret key generation protocol and a new data userauthentication protocol To enable the cloud server toperform secure search among multiple ownersrsquo dataencrypted with different secret keys we systematical-ly construct a novel secure search protocol To rank thesearch results and preserve the privacy of relevancescores between keywords and files we propose anovel Additive Order and Privacy Preserving Func-tion family Moreover we show that our approachis computationally efficient even for large data andkeyword sets As our future work on one hand wewill consider the problem of secure fuzzy keywordsearch in a multi-owner paradigm On the other handwe plan to implement our scheme on the commercialclouds

ACKNOWLEDGMENTS

This work is supported in part by the National Natu-ral Science Foundation of China (Project No 6117303861472125 61300217) and the China Scholarship Coun-cil A preliminary version of this paper appearedin the Proceedings of the IEEEIFIP InternationalConference on Dependable Systems and Networks(DSNrsquo14) [27]

REFERENCES

[1] M Armbrust A Fox R Griffith A D Joseph R KatzA Konwinski G Lee D Patterson A Rabkin I Stoica andM Zaharia ldquoA view of cloud computingrdquo Communication of the ACM vol 53 no 4 pp 50ndash58 2010

[2] C Wang S S Chow Q Wang K Ren and W Lou ldquoPrivacy-preserving public auditing for secure cloud storagerdquo Comput-

ers IEEE Transactions on vol 62 no 2 pp 362ndash375 2013[3] DSong DWagner and APerrig ldquoPractical techniques for

searches on encrypted datardquo in Proc IEEE International Sym- posium on Security and Privacy (SampPrsquo00) Nagoya Japan Jan2000 pp 44ndash55

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 13: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 13140018-9340 (c) 2015 IEEE Personal use is permitted but republicationredistribution requires IEEE permission See

httpwwwieeeorgpublications_standardspublicationsrightsindexhtml for more information

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 13

[4] E Goh (2003) Secure indexes [Online] Availablehttpeprintiacrorg

[5] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proc ACM CCSrsquo06 VA USA Oct 2006 pp79ndash88

[6] D B et al ldquoPublic key encryption with keyword search secureagainst keyword guessing attacks without random oraclerdquoEUROCRYPT vol 43 pp 506ndash522 2004

[7] P Golle J Staddon and B Waters ldquoSecure conjunctive key-word search over encrypted datardquo in Proc Applied Cryptogra-

phy and Network Security (ACNSrsquo04) Yellow Mountain China Jun 2004 pp 31ndash45

[8] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo in ProcInformation and Communications Security (ICICSrsquo05) BeijingChina Dec 2005 pp 414ndash426

[9] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo in Proc IEEEDistributed Computing Systems (ICDCSrsquo10) Genoa Italy Jun2010 pp 253ndash262

[10] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo in Proc IEEE INFOCOMrsquo11 Shanghai China Apr 2011pp 829ndash837

[11] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserving multi-keyword ranked search over encrypted clouddatardquo Parallel and Distributed Systems IEEE Transactions onvol 25 no 1 pp 222ndash233 2014

[12] W Sun B Wang N Cao M Li W Lou Y T Hou and H LildquoVerifiable privacy-preserving multi-keyword text search inthe cloud supporting similarity-based rankingrdquo Parallel andDistributed Systems IEEE Transactions on vol 25 no 11 pp3025ndash3035 2014

[13] Z Xu W Kang R Li K Yow and C Xu ldquoEfficient multi-keyword ranked query on encrypted data in the cloudrdquo inProc IEEE Parallel and Distributed Systems (ICPADSrsquo12) Singa-pore Dec 2012 pp 244ndash251

[14] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo in

Proc IEEE INFOCOMrsquo10 San Diego CA Mar 2010 pp 1ndash5[15] M Chuah and W Hu ldquoPrivacy-aware bedtree based solutionfor fuzzy multi-keyword search over encrypted datardquo in ProcIEEE 31th International Conference on Distributed ComputingSystems (ICDCSrsquo11) Minneapolis MN Jun 2011 pp 383ndash392

[16] P Xu H Jin Q Wu and W Wang ldquoPublic-key encryptionwith fuzzy keyword search A provably secure scheme underkeyword guessing attackrdquo Computers IEEE Transactions onvol 62 no 11 pp 2266ndash2277 2013

[17] B Wang S Yu W Lou and Y T Hou ldquoPrivacy-preservingmulti-keyword fuzzy search over encrypted data in thecloudrdquo in IEEE INFOCOM Toronto Canada May 2014 pp2112ndash2120

[18] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoParallel and Distributed Systems IEEE Transactions on vol 23

no 8 pp 1467ndash1479 2012[19] C Wang K Ren S Yu and K M R Urs ldquoAchieving usableand privacy-assured similarity search over outsourced clouddatardquo in Proc IEEE INFOCOMrsquo12 Orlando FL Mar 2012 pp451ndash459

[20] W Zhang Y Lin S Xiao Q Liu and T Zhou ldquoSecuredistributed keyword search in multiple cloudsrdquo in ProcIEEEACM IWQOSrsquo14 Hongkong May 2014 pp 370ndash379

[21] W Sun S Yu W Lou Y T Hou and H Li ldquoProtectingyour right Attribute-based keyword search with fine-grainedowner-enforced search authorization in the cloudrdquo in ProcIEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 226ndash234

[22] Q Zheng S Xu and G Ateniese ldquoVabks Verifiable attribute- based keyword search over outsourced encrypted datardquo inProc IEEE INFOCOMrsquo14 Toronto Canada May 2014 pp 522ndash530

[23] T Jung X Y Li Z Wan and M Wan ldquoPrivacy preservingcloud data access with multi-authoritiesrdquo in Proc IEEE INFO-COMrsquo13 Turin Italy Apr 2013 pp 2625ndash2633

[24] Q Liu C C Tan J Wu and G Wang ldquoEfficient informationretrieval for ranked queries in cost-effective cloud environ-

mentsrdquo in INFOCOM 2012 Proceedings IEEE IEEE 2012 pp2581ndash2585

[25] B Chor E Kushilevitz O Goldreich and M Sudan ldquoPrivateinformation retrievalrdquo Journal of the ACM vol 45 no 6 pp965ndash981 1998

[26] I H Witten A Moffat and T C Bell Managing gigabytesCompressing and indexing documents and images San FranciscoUSA Morgan Kaufmann 1999

[27] W Zhang S Xiao Y Lin T Zhou and S Zhou ldquoSecure ranked

multi-keyword search for multiple data owners in cloud com-putingrdquo in Proc 44th Annual IEEEIFIP International Conferenceon Dependable Systems and Networks (DSN2014) Atlanta USAIEEE jun 2014 pp 276ndash286

[28] IETF ldquoRequest for comments databaserdquo [Online] Availablehttpwwwietforgrfchtml

[29] H Systems ldquoHermetic word frequency counterrdquo [Online]Available httpwwwhermeticchwfcwfchtm

[30] D Boneh and M Franklin ldquoIdentity-based encryption fromthe weil pairingrdquo SIAM Journal on Computing vol 32 no 3pp 586ndash615 2003

[31] W Sun B Wang N Cao M Li W Lou Y T Hou andH Li ldquoPrivacy-preserving multi-keyword text search in thecloud supporting similarity-based rankingrdquo in Proc IEEE

ASIACCSrsquo13 Hangzhou China May 2013 pp 71ndash81[32] J Hur ldquoImproving security and efficiency in attribute-based

data sharingrdquo IEEE TRANSACTIONS ON KNOWLEDGE ANDDATA ENGINEERING vol 25 no 10 pp 2271ndash2282 2013

[33] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preserv-ing encryption for numeric datardquo in Proc ACM SIGMODrsquo04Paris France Jun 2004 pp 563ndash574

[34] A Boldyreva N Chenette Y Lee and A O ldquoOrder-preserving encryption revisited Improved security analysisand alternative solutionsrdquo in Proc Advances in Cryptology(CRYPTOrsquo11) California USA Aug 2011 pp 578ndash595

[35] Y Yi R Li F Chen A X Liu and Y Lin ldquoA digitalwatermarking approach to secure and precise range queryprocessing in sensor networksrdquo in Proc IEEE INFOCOMrsquo13Turin Italy Apr 2013 pp 1950ndash1958

[36] R A Popa F H Li and N Zeldovich ldquoAn ideal-security pro-tocol for order-preserving encodingrdquo in Security and Privacy

(SP) 2013 IEEE Symposium on IEEE 2013 pp 463ndash477[37] F Kerschbaum and A Schroepfer ldquoOptimal average-complexity ideal-security order-preserving encryptionrdquo inProceedings of the 2014 ACM SIGSAC Conference on Computerand Communications Security ACM 2014 pp 275ndash286

Wei Zhang received his BS degree in Com-puter Science from Hunan University Chinain 2011 Since 2011 He has been a PhDcandidate in College of Computer Scienceand Electronic Engineering Hunan Universi-ty His research interests include cloud com-puting network security and data mining

Yaping Lin received his BS degree in Com-puter Application from Hunan University Chi-na in 1982 and his MS degree in Com-puter Application from National University ofDefense Technology China in 1985 He re-ceived his PhD degree in Control Theoryand Application from Hunan University in2000 He has been a professor and PhD

supervisor in Hunan University since 1996From 2004-2005 he worked as a visitingresearcher at the University of Texas at Ar-

lington His research interests include machine learning networksecurity and wireless sensor networks

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing

Page 14: Privacy Preserving Ranked Multi-Keyword

7232019 Privacy Preserving Ranked Multi-Keyword

httpslidepdfcomreaderfullprivacy-preserving-ranked-multi-keyword 1414

This article has been accepted for publication in a future issue of this journal but has not been fully edited Content may change prior to final publication Citation

information DOI 101109TC20152448099 IEEE Transactions on Computers

JOURNAL OF LATEX CLASS FILES VOL 6 NO 1 JANUARY 2015 14

Sheng Xiao received his BS degree in Ts-inghua University China in 2002 and hisMS degree from the National SingaporeUniversityMIT (Singapore-MIT Alliance) in2003 He received his PhD degree fromthe University of Massachusetts Amherst in2013 He has been an associate professor atthe College of Computer Science and Elec-tronic Engineering Hunan University He re-

ceived INFOCOM Best Paper in 2010 His re-search interests include communication se-

curity cloud computing and big data

Jie Wu is the chair and a Laura H Carnellprofessor with the Department of Computerand Information Sciences Temple Universi-ty Philadelphia PA Prior to joining TempleUniversity he was a program director withthe National Science Foundation and a dis-tinguished professor with Florida Atlantic U-niversity Boca Raton FL He regularly pub-lishes in scholarly journals conference pro-ceedings and books His research interestsinclude mobile computing and wireless net-

works routing protocols cloud and green computing network trustand security and social network applications He serves on severaleditorial boards including IEEE Transactions on Computers IEEETransactions on Service Computing and Journal of Parallel andDistributed Computing He was general cochairchair for IEEE MASS2006 IEEE IPDPS 2008 and IEEE ICDCS 2013 as well as programcochair for IEEE INFOCOM 2011 and CCF CNCC 2013 he servedas a general chair forACM MobiHoc 2014 He was an IEEE Com-puter Society distinguished visitor ACM distinguished speaker andchair for the IEEE Technical Committee on Distributed Processing(TCDP) He is a CCF distinguished speaker and a fellow of the IEEEHe is the recipient of the 2011 China Computer Federation (CCF)Overseas Outstanding Achievement Award

Siwang Zhou received his BS degree inApplied Physics from Fudan University in1995 He received his PhD degree in com-puter application technology from Hunan U-niversity in 2007 He has been an associateprofessor at the College of Computer Sci-ence and Electronic Engineering Hunan U-niversity His research interests include wire-less sensor networks and wavelet compres-

sive sensing