IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 8, AUGUST 2012, p. 1467

Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data

Cong Wang, Student Member, IEEE, Ning Cao, Student Member, IEEE, Kui Ren, Senior Member, IEEE, and Wenjing Lou, Senior Member, IEEE

Abstract—Cloud computing economically enables the paradigm of data service outsourcing. However, to protect data privacy, sensitive cloud data have to be encrypted before outsourced to the commercial public cloud, which makes effective data utilization service a very challenging task. Although traditional searchable encryption techniques allow users to securely search over encrypted data through keywords, they support only Boolean search and are not yet sufficient to meet the effective data utilization need that is inherently demanded by large number of users and huge amount of data files in cloud. In this paper, we define and solve the problem of secure ranked keyword search over encrypted cloud data. Ranked search greatly enhances system usability by enabling search result relevance ranking instead of sending undifferentiated results, and further ensures the file retrieval accuracy. Specifically, we explore the statistical measure approach, i.e., relevance score, from information retrieval to build a secure searchable index, and develop a one-to-many order-preserving mapping technique to properly protect those sensitive score information. The resulting design is able to facilitate efficient server-side ranking without losing keyword privacy. Thorough analysis shows that our proposed solution enjoys “as-strong-as-possible” security guarantee compared to previous searchable encryption schemes, while correctly realizing the goal of ranked keyword search. Extensive experimental results demonstrate the efficiency of the proposed solution.

Index Terms—Ranked search, searchable encryption, order-preserving mapping, confidential data, cloud computing.

• C. Wang is with the Department of Electrical and Computer Engineering, Illinois Institute of Technology, 3301 S Dearborn St., Siegel Hall 221A, Chicago, IL 60616. E-mail: [email protected].
• N. Cao is with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609. E-mail: [email protected].
• K. Ren is with the Department of Electrical and Computer Engineering, Illinois Institute of Technology, 3301 S Dearborn St., Siegel Hall 319, Chicago, IL 60616. E-mail: [email protected].
• W. Lou is with the Department of Computer Science, Virginia Polytechnic Institute and State University, 7054 Haycock Rd, Falls Church, VA 22043. E-mail: [email protected].

Manuscript received 22 Mar. 2011; revised 4 Oct. 2011; accepted 17 Oct. 2011; published online 29 Nov. 2011. Recommended for acceptance by I. Ahmad. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-2011-03-0166. Digital Object Identifier no. 10.1109/TPDS.2011.282.

1045-9219/12/$31.00 © 2012 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION

Cloud Computing is the long dreamed vision of computing as a utility, where cloud customers can remotely store their data into the cloud so as to enjoy the on-demand high-quality applications and services from a shared pool of configurable computing resources [2]. The benefits brought by this new computing model include but are not limited to: relief of the burden for storage management, universal data access with independent geographical locations, and avoidance of capital expenditure on hardware, software, and personnel maintenances, etc., [3].

As Cloud Computing becomes prevalent, more and more sensitive information are being centralized into the cloud, such as e-mails, personal health records, company finance data, and government documents, etc. The fact that data owners and cloud server are no longer in the same trusted domain may put the outsourced unencrypted data at risk [4], [33]: the cloud server may leak data information to unauthorized entities [5] or even be hacked [6]. It follows that sensitive data have to be encrypted prior to outsourcing for data privacy and combating unsolicited accesses. However, data encryption makes effective data utilization a very challenging task given that there could be a large amount of outsourced data files. Besides, in Cloud Computing, data owners may share their outsourced data with a large number of users, who might want to only retrieve certain specific data files they are interested in during a given session. One of the most popular ways to do so is through keyword-based search. Such keyword search technique allows users to selectively retrieve files of interest and has been widely applied in plaintext search scenarios [7]. Unfortunately, data encryption, which restricts user’s ability to perform keyword search and further demands the protection of keyword privacy, makes the traditional plaintext search methods fail for encrypted cloud data.

1. In the existing symmetric key-based searchable encryption schemes, the support of disjunctive Boolean operation (OR) on multiple keywords searches still remains an open problem.

Although traditional searchable encryption schemes (e.g., [8], [9], [10], [11], [12], to list a few) allow a user to securely search over encrypted data through keywords without first decrypting it, these techniques support only conventional Boolean keyword search,¹ without capturing any relevance of the files in the search result. When directly applied in large collaborative data outsourcing cloud environment, they may suffer from the following two main drawbacks. On the one hand, for each search request, users without preknowledge of the encrypted cloud data have to go through every retrieved file in order to find ones most matching their interest, which demands possibly large amount of postprocessing overhead; On the other hand, invariably sending back all files solely based on presence/absence of the keyword further incurs large unnecessary
network traffic, which is absolutely undesirable in today’s pay-as-you-use cloud paradigm. In short, lacking of effective mechanisms to ensure the file retrieval accuracy is a significant drawback of existing searchable encryption schemes in the context of Cloud Computing. Nonetheless, the state of the art in information retrieval (IR) community has already been utilizing various scoring mechanisms [13] to quantify and rank order the relevance of files in response to any given search query. Although the importance of ranked search has received attention for a long history in the context of plaintext searching by IR community, surprisingly, it is still being overlooked and remains to be addressed in the context of encrypted data search.

Therefore, how to enable a searchable encryption system with support of secure ranked search is the problem tackled in this paper. Our work is among the first few ones to explore ranked search over encrypted data in Cloud Computing. Ranked search greatly enhances system usability by returning the matching files in a ranked order regarding to certain relevance criteria (e.g., keyword frequency), thus making one step closer toward practical deployment of privacy-preserving data hosting services in the context of Cloud Computing. To achieve our design goals on both system security and usability, we propose to bring together the advance of both crypto and IR community to design the ranked searchable symmetric encryption (RSSE) scheme, in the spirit of “as-strong-as-possible” security guarantee. Specifically, we explore the statistical measure approach from IR and text mining to embed weight information (i.e., relevance score) of each file during the establishment of searchable index before outsourcing the encrypted file collection. As directly outsourcing relevance scores will leak lots of sensitive frequency information against the keyword privacy, we then integrate a recent crypto primitive [14] order-preserving symmetric encryption (OPSE) and properly modify it to develop a one-to-many order-preserving mapping technique for our purpose to protect those sensitive weight information, while providing efficient ranked search functionalities. Our contribution can be summarized as follows:

1. For the first time, we define the problem of secure ranked keyword search over encrypted cloud data, and provide such an effective protocol, which fulfills the secure ranked search functionality with little relevance score information leakage against keyword privacy.

2. Thorough security analysis shows that our ranked searchable symmetric encryption scheme indeed enjoys “as-strong-as-possible” security guarantee compared to previous searchable symmetric encryption (SSE) schemes.

3. We investigate the practical considerations and enhancements of our ranked search mechanism, including the efficient support of relevance score dynamics, the authentication of ranked search results, and the reversibility of our proposed one-to-many order-preserving mapping technique.

4. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed solution.

The rest of the paper is organized as follows: Section 2 gives the system and threat model, our design goals, notations, and preliminaries. Then, we provide the framework, definitions, and basic scheme in Section 3, followed by Section 4, which gives the detailed description of our ranked searchable symmetric encryption system. Section 5 gives the security analysis. Section 6 studies further enhancements and practical considerations, followed by Section 7 on performance evaluations. Related work for both searchable encryption and secure result ranking is discussed in Section 8. Finally, Section 9 gives the concluding remark of the whole paper.

2 PROBLEM STATEMENT

2.1 The System and Threat Model

We consider an encrypted cloud data hosting service involving three different entities, as illustrated in Fig. 1: data owner, data user, and cloud server. Data owner has a collection of $n$ data files $C = (F_1, F_2, \ldots, F_n)$ that he wants to outsource on the cloud server in encrypted form while still keeping the capability to search through them for effective data utilization reasons. To do so, before outsourcing, data owner will first build a secure searchable index $I$ from a set of $m$ distinct keywords $W = (w_1, w_2, \ldots, w_m)$ extracted² from the file collection $C$, and store both the index $I$ and the encrypted file collection $C$ on the cloud server.

We assume the authorization between the data owner and users is appropriately done. To search the file collection for a given keyword $w$, an authorized user generates and submits a search request in a secret form—a trapdoor $T_w$ of the keyword $w$—to the cloud server. Upon receiving the search request $T_w$, the cloud server is responsible to search the index $I$ and return the corresponding set of files to the user. We consider the secure ranked keyword search problem as follows: the search result should be returned according to certain ranked relevance criteria (e.g., keyword frequency-based scores, as will be introduced shortly), to improve file retrieval accuracy for users without prior knowledge on the file collection $C$. However, cloud server should learn nothing or little about the relevance criteria as they exhibit significant sensitive information against keyword privacy. To reduce bandwidth, the user may send an optional value $k$ along with the trapdoor $T_w$ and cloud server only sends back the top-$k$ most relevant files to the user’s interested keyword $w$.

We primarily consider an “honest-but-curious” server in our model, which is consistent with most of the previous searchable encryption schemes. We assume the cloud server acts in an “honest” fashion and correctly follows the designated protocol specification, but is “curious” to infer and analyze the message flow received during the protocol so as to learn additional information. In other words, the cloud server has no intention to actively modify the message flow or disrupt any other kind of services. However, in some unexpected events, the cloud server may behave beyond the “honest-but-curious” model. We specifically deal with this scenario in Section 6.2.

Fig. 1. Architecture for search over encrypted cloud data.

2. To reduce the size of index, a list of standard IR techniques can be adopted, including case folding, stemming, and stop words, etc. We omit this process of keyword extraction and refinement and refer readers to [7] for more details.

2.2 Design Goals

To enable ranked searchable symmetric encryption for effective utilization of outsourced and encrypted cloud data under the aforementioned model, our system design should achieve the following security and performance guarantee. Specifically, we have the following goals: 1) Ranked keyword search: to explore different mechanisms for designing effective ranked search schemes based on the existing searchable encryption framework; 2) Security guarantee: to prevent cloud server from learning the plaintext of either the data files or the searched keywords, and achieve the “as-strong-as-possible” security strength compared to existing searchable encryption schemes; 3) Efficiency: above goals should be achieved with minimum communication and computation overhead.

2.3 Notation and Preliminaries

• $C$—the file collection to be outsourced, denoted as a set of $n$ data files $C = (F_1, F_2, \ldots, F_n)$.

• $W$—the distinct keywords extracted from file collection $C$, denoted as a set of $m$ words $W = (w_1, w_2, \ldots, w_m)$.

• $id(F_j)$—the identifier of file $F_j$ that can help uniquely locate the actual file.

• $I$—the index built from the file collection, including a set of posting lists $\{I(w_i)\}$, as introduced below.

• $T_{w_i}$—the trapdoor generated by a user as a search request of keyword $w_i$.

• $F(w_i)$—the set of identifiers of files in $C$ that contain keyword $w_i$.

• $N_i$—the number of files containing the keyword $w_i$ and $N_i = |F(w_i)|$.

We now introduce some necessary information retrieval background for our proposed scheme:

Inverted index. In information retrieval, inverted index (also referred to as postings file) is a widely used indexing structure that stores a list of mappings from keywords to the corresponding set of files that contain this keyword, allowing full text search [13]. For ranked search purposes, the task of determining which files are most relevant is typically done by assigning a numerical score, which can be precomputed, to each file based on some ranking function introduced below. One example posting list of an index is shown in Table 1. We will use this inverted index structure to give our basic ranked searchable symmetric encryption construction.

Ranking function. In information retrieval, a ranking function is used to calculate relevance scores of matching files to a given search request. The most widely used statistical measurement for evaluating relevance score in the information retrieval community uses the TF × IDF rule, where term frequency (TF) is simply the number of times a given term or keyword (we will use them interchangeably hereafter) appears within a file (to measure the importance of the term within the particular file), and inverse document frequency (IDF) is obtained by dividing the number of files in the whole collection by the number of files containing the term (to measure the overall importance of the term within the whole collection). Among several hundred variations of the TF × IDF weighting scheme, no single combination of them outperforms any of the others universally [15]. Thus, without loss of generality, we choose an example formula that is commonly used and widely seen in the literature (see [7, Ch. 4]) for the relevance score calculation in the following presentation. Its definition is as follows:

$$\mathrm{Score}(Q, F_d) = \sum_{t \in Q} \frac{1}{|F_d|} \cdot (1 + \ln f_{d,t}) \cdot \ln\left(1 + \frac{N}{f_t}\right). \qquad (1)$$

Here, $Q$ denotes the searched keywords; $f_{d,t}$ denotes the TF of term $t$ in file $F_d$; $f_t$ denotes the number of files that contain term $t$; $N$ denotes the total number of files in the collection; and $|F_d|$ is the length of file $F_d$, obtained by counting the number of indexed terms, functioning as the normalization factor.

3 THE DEFINITIONS AND BASIC SCHEME

In the introduction, we have motivated the ranked keyword search over encrypted data to achieve economies of scale for Cloud Computing. In this section, we start from the review of existing searchable symmetric encryption schemes and provide the definitions and framework for our proposed ranked searchable symmetric encryption. Note that by following the same security guarantee of existing SSE, it would be very inefficient to support ranked search functionality over encrypted data, as demonstrated in our basic scheme. The discussion of its demerits will lead to our proposed scheme.

3.1 Background on Searchable Symmetric Encryption

Searchable encryption allows data owner to outsource his data in an encrypted manner while maintaining the selectively search capability over the encrypted data. Generally, searchable encryption can be achieved in its full functionality using an oblivious RAMs [16]. Although hiding everything during the search from a malicious server (including access pattern), utilizing oblivious RAM usually brings the cost of logarithmic number of interactions between the user and the server for each search request. Thus, in order to achieve more efficient solutions, almost all the existing works on searchable encryption literature resort to the weakened security guarantee, i.e., revealing the access pattern and search pattern but nothing else. Here, access pattern refers to the outcome of the search result, i.e., which files have been retrieved. The search pattern includes the equality pattern among the two search requests (whether two searches were performed for the same keyword), and any information derived thereafter from this statement. We refer readers to [12] for the thorough discussion on SSE definitions.

TABLE 1. An Example Posting List of the Inverted Index

Having a correct intuition on the security guarantee of existing SSE literature is very important for us to define our ranked searchable symmetric encryption problem. As later, we will show that following the exactly same security guarantee of existing SSE scheme, it would be very inefficient to achieve ranked keyword search, which motivates us to further weaken the security guarantee of existing SSE appropriately (leak the relative relevance order but not the relevance score) and realize an “as-strong-as-possible” ranked searchable symmetric encryption. Actually, this notion has been employed by cryptographers in many recent work [14], [17] where efficiency is preferred over security.

3.2 Definitions and Framework of RSSE System

We follow the similar framework of previously proposed searchable symmetric encryption schemes [12] and adapt the framework for our ranked searchable encryption system. A ranked searchable encryption scheme consists of four algorithms (KeyGen, BuildIndex, TrapdoorGen, SearchIndex). Our ranked searchable encryption system can be constructed from these four algorithms in two phases, Setup and Retrieval:

• Setup. The data owner initializes the public and secret parameters of the system by executing KeyGen, and pre-processes the data file collection $C$ by using BuildIndex to generate the searchable index from the unique words extracted from $C$. The owner then encrypts the data file collection $C$, and publishes the index including the keyword frequency-based relevance scores in some encrypted form, together with the encrypted collection $C$ to the Cloud. As part of Setup phase, the data owner also needs to distribute the necessary secret parameters (in our case, the trapdoor generation key) to a group of authorized users by employing off-the-shelf public key cryptography or more efficient primitive such as broadcast encryption.

• Retrieval. The user uses TrapdoorGen to generate a secure trapdoor corresponding to his interested keyword, and submits it to the cloud server. Upon receiving the trapdoor, the cloud server will derive a list of matched file IDs and their corresponding encrypted relevance scores by searching the index via SearchIndex. The matched files should be sent back in a ranked sequence based on the relevance scores. However, the server should learn nothing or little beyond the order of the relevance scores.

Note that as an initial attempt to investigate the secure ranked searchable encryption system, in this paper we focus on single keyword search. In this case, the IDF factor in (1) is always constant with regard to the given searched keyword. Thus, search results can be accurately ranked based only on the term frequency and file length information contained within the single file using

$$\mathrm{Score}(t, F_d) = \frac{1}{|F_d|} \cdot (1 + \ln f_{d,t}). \qquad (2)$$

Data owner can keep a record of these two values and precalculate the relevance score, which introduces little overhead regarding to the index building. We will demonstrate this via experiments in the performance evaluation Section 7.
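The following added example (not code from the paper) builds a plaintext inverted index whose posting lists carry equation (2) scores, then checks that for a single keyword the full equation (1) score differs from (2) only by the constant IDF factor, so both give the same ranking. The four-file collection, the tokenizer and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Constructed sample collection (not from the paper): file id -> plaintext.
FILES = {
    "F1": "cloud storage audit log for the cloud gateway",
    "F2": "network audit report",
    "F3": "cloud cloud cloud backup plan and cloud cost notes for the cloud team",
    "F4": "firewall rules",
}


def tokenize(text):
    """Case-fold and split into indexed terms (no stemming or stop words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(files):
    """Return {keyword: [(file_id, score)]} with scores from equation (2)."""
    index = defaultdict(list)
    for fid, text in files.items():
        terms = tokenize(text)
        length = len(terms)                       # |F_d|
        for term, tf in Counter(terms).items():   # tf is f_{d,t}
            index[term].append((fid, (1.0 / length) * (1.0 + math.log(tf))))
    return dict(index)


def score_eq1(files, query_terms, fid):
    """Equation (1): TF x IDF score of file fid for the query term set Q."""
    n_files = len(files)                          # N
    terms = tokenize(files[fid])
    counts = Counter(terms)
    total = 0.0
    for t in query_terms:
        if counts[t] == 0:
            continue                              # term absent from this file
        f_t = sum(1 for text in files.values() if t in tokenize(text))
        total += ((1.0 / len(terms)) * (1.0 + math.log(counts[t]))
                  * math.log(1.0 + n_files / f_t))
    return total


def rank(posting_list):
    """File ids of one posting list, most relevant first."""
    return [fid for fid, _ in sorted(posting_list, key=lambda p: p[1], reverse=True)]


if __name__ == "__main__":
    idx = build_inverted_index(FILES)
    for keyword in ("cloud", "audit"):
        print(keyword, rank(idx[keyword]), idx[keyword])
```

F3 mentions "cloud" five times against two in F1, yet F1 ranks first because the score is normalized by file length; these precomputed scores are the sensitive values the scheme must hide from the server.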

3.3 The Basic Scheme

Before giving our main result, we first start with a straightforward yet ideal scheme, where the security of our ranked searchable encryption is the same as previous SSE schemes, i.e., the user gets the ranked results without letting cloud server learn any additional information more than the access pattern and search pattern. However, this is achieved with the tradeoff of efficiency: namely, either should the user wait for two round-trip time for each search request, or he may even lose the capability to perform top-$k$ retrieval, resulting the unnecessary communication overhead. We believe the analysis of these demerits will lead to our main result. Note that the basic scheme we discuss here is tightly pertained to recent work [12], though our focus is on secure result ranking. Actually, it can be considered as the most simplified version of searchable symmetric encryption that satisfies the nonadaptive security definition of [12].

Basic scheme. Let $k, \ell, \ell', p$ be security parameters that will be used in KeyGen(·). Let $E$ be a semantically secure symmetric encryption algorithm: $E : \{0,1\}^{\ell} \times \{0,1\}^{r} \to \{0,1\}^{r}$. Let $\nu$ be the maximum number of files containing some keyword $w_i \in W$ for $i = 1, \ldots, m$, i.e., $\nu = \max_{i=1}^{m} N_i$. This value does not need to be known in advance for the instantiation of the scheme. Also, let $f$ be a pseudorandom function and $\pi$ be a collision resistant hash function with the following parameters:

• $f : \{0,1\}^{k} \times \{0,1\}^{*} \to \{0,1\}^{\ell}$

• $\pi : \{0,1\}^{k} \times \{0,1\}^{*} \to \{0,1\}^{p}$ where $p > \log m$.

In practice, $\pi(\cdot)$ will be instantiated by off-the-shelf hash functions like SHA-1, in which case $p$ is 160 bits.

In the Setup phase

1. The data owner initiates the scheme by calling KeyGen($1^{k}, 1^{\ell}, 1^{\ell'}, 1^{p}$), generates random keys $x, y \xleftarrow{R} \{0,1\}^{k}$, $z \xleftarrow{R} \{0,1\}^{\ell}$, and outputs $K = \{x, y, z, 1^{\ell}, 1^{\ell'}, 1^{p}\}$.

2. The data owner then builds a secure inverted index from the file collection $C$ by calling BuildIndex($K, C$). The details are given in Table 2. The $\ell'$ padding 0's indicate the valid posting entry.

In the Retrieval phase

1. For an interested keyword $w$, the user generates a trapdoor $T_w = (\pi_x(w), f_y(w))$ by calling TrapdoorGen($w$).

2. Upon receiving the trapdoor $T_w$, the server calls SearchIndex($I, T_w$): first locates the matching list of the index via $\pi_x(w)$, uses $f_y(w)$ to decrypt the entries, and then sends back the corresponding files according to $F(w)$, together with their associated encrypted relevance scores.

3. User decrypts the relevance scores via key $z$ and gets the ranked search results.

Discussion. The above scheme clearly satisfies the security guarantee of SSE, i.e., only the access pattern and search pattern are leaked. However, the ranking is done on the user side, which may bring in huge post processing overhead. Moreover, sending back all the files consumes large undesirable bandwidth. One possible way to reduce the communication overhead is that server first sends back all the valid entries $\langle id(F_{ij}) \| E_z(S_{ij}) \rangle$, where $1 \le j \le N_i$. User then decrypts the relevance score and sends cloud server another request to retrieve the most relevant files (top-$k$ retrieval) by the rank-ordered decrypted scores. As the size of valid entries $\langle id(F_{ij}) \| E_z(S_{ij}) \rangle$ is far less than the corresponding files, significant amount of bandwidth is expected to be saved, as long as user does not retrieve all the matching files. However, the most obvious disadvantage is the two round-trip time for each search request of every user. Also note that in this way, server still learns nothing about the value of relevance scores, but it knows the requested files are more relevant than the unrequested ones, which inevitably leaks more information than the access pattern and search pattern.
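An added sketch of the basic scheme's four algorithms, not the authors' implementation. Table 2 is not reproduced on this page, so the entry layout (validity marker, fixed-width file id, encrypted score) is assumed from the description above; HMAC-SHA256 stands in for both π and f, an unauthenticated HMAC-keystream XOR stands in for E, and the posting data is constructed.

```python
import hashlib
import hmac
import os
import secrets
import struct

VALID = b"\x00" * 8   # stands in for the l' zero padding marking a valid entry
ID_LEN = 8            # fixed-width file identifier (assumption; ids must fit)


def prf(key, msg):
    """HMAC-SHA256, used here for both pi_x and f_y (assumed instantiation)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def _keystream(key, nonce, n):
    blocks = (prf(key, nonce + i.to_bytes(4, "big")) for i in range((n + 31) // 32))
    return b"".join(blocks)[:n]


def enc(key, pt):
    """Randomized stand-in for E: fresh nonce, HMAC-derived keystream, XOR."""
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(pt, _keystream(key, nonce, len(pt))))


def dec(key, ct):
    nonce, body = ct[:16], ct[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


def keygen():
    """K = {x, y, z}: x keys pi, y keys f, z encrypts the relevance scores."""
    return {"x": os.urandom(32), "y": os.urandom(32), "z": os.urandom(32)}


def build_index(K, postings):
    """postings: {keyword: {file_id: score}} -> {pi_x(w): [encrypted entries]}."""
    nu = max(len(p) for p in postings.values())   # longest posting list
    entry_len = 16 + len(VALID) + ID_LEN + 16 + 8
    index = {}
    for w, files in postings.items():
        list_key = prf(K["y"], w.encode())        # f_y(w)
        entries = []
        for fid, score in files.items():
            enc_score = enc(K["z"], struct.pack(">d", score))   # E_z(S_ij)
            plain = VALID + fid.encode().ljust(ID_LEN, b"\x00") + enc_score
            entries.append(enc(list_key, plain))
        while len(entries) < nu:                  # pad so list length hides N_i
            entries.append(os.urandom(entry_len))
        secrets.SystemRandom().shuffle(entries)
        index[prf(K["x"], w.encode())] = entries  # list is located via pi_x(w)
    return index


def trapdoor_gen(K, w):
    return prf(K["x"], w.encode()), prf(K["y"], w.encode())


def search_index(index, trapdoor):
    """Server side: returns [(file_id, E_z(score))]; scores stay opaque."""
    label, list_key = trapdoor
    hits = []
    for entry in index.get(label, []):
        plain = dec(list_key, entry)
        if plain.startswith(VALID):               # dummy entries fail this check
            fid = plain[len(VALID):len(VALID) + ID_LEN].rstrip(b"\x00").decode()
            hits.append((fid, plain[len(VALID) + ID_LEN:]))
    return hits


def rank_on_user_side(K, hits):
    """User side: decrypt scores with z, then sort by relevance."""
    scored = [(fid, struct.unpack(">d", dec(K["z"], c))[0]) for fid, c in hits]
    return sorted(scored, key=lambda p: p[1], reverse=True)


# Constructed posting data: keyword -> {file id: precomputed relevance score}.
POSTINGS = {
    "cloud": {"F1": 0.2116, "F3": 0.2007, "F7": 0.05},
    "audit": {"F1": 0.125, "F2": 0.3333},
}


def demo(keyword):
    K = keygen()
    index = build_index(K, POSTINGS)
    hits = search_index(index, trapdoor_gen(K, keyword))
    return index, hits, rank_on_user_side(K, hits)


if __name__ == "__main__":
    print(demo("cloud")[2])
```

The server's output from `search_index` is unordered and its score ciphertexts are randomized, so it cannot rank or even compare scores; that is the cost the Discussion identifies and the reason ranking falls to the user.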

4 EFFICIENT RANKED SEARCHABLE SYMMETRIC ENCRYPTION SCHEME

The above straightforward approach demonstrates the core problem that causes the inefficiency of ranked searchable encryption, that is, how to let the server quickly perform the ranking without actually knowing the relevance scores. To effectively support ranked search over encrypted file collection, we now resort to the newly developed cryptographic primitive, order preserving symmetric encryption (OPSE) [14], to achieve more practical performance. Note that by resorting to OPSE, our security guarantee of RSSE is inherently weakened compared to SSE, as we now let server know the relevance order. However, this is the information we want to trade off for efficient RSSE, as discussed in previous Section 3. We will first briefly discuss the primitive of OPSE and its pros and cons. Then, we show how we can adapt it to suit our purpose for ranked searchable encryption with an “as-strong-as-possible” security guarantee. Finally, we demonstrate how to choose different scheme parameters via concrete examples.

4.1 Using Order Preserving Symmetric Encryption

The OPSE is a deterministic encryption scheme where the numerical ordering of the plaintexts gets preserved by the encryption function. Boldyreva et al. [14] give the first cryptographic study of the OPSE primitive and provide a construction that is provably secure under the security framework of pseudorandom function or pseudorandom permutation. Namely, considering that any order-preserving function $g(\cdot)$ from domain $D = \{1, \dots, M\}$ to range $R = \{1, \dots, N\}$ can be uniquely defined by a combination of $M$ out of $N$ ordered items, an OPSE is then said to be secure if and only if an adversary has to perform a brute force search over all the possible combinations of $M$ out of $N$ to break the encryption scheme. If the security level is chosen to be 80 bits, then it is suggested to choose $M = N/2 > 80$ so that the total number of combinations will be greater than $2^{80}$. Their construction is based on an uncovered relationship between a random order-preserving function (which meets the above security notion) and the hypergeometric probability distribution, which will later be denoted as HGD. We refer readers to [14] for more details about OPSE and its security definition.

At the first glance, by changing the relevance score encryption from the standard indistinguishable symmetric encryption scheme to this OPSE, it seems to follow directly that efficient relevance score ranking can be achieved just like in the plaintext domain. However, as pointed out earlier, the OPSE is a deterministic encryption scheme. This inherent deterministic property, if not treated appropriately, will still leak a lot of information as any deterministic encryption scheme will do. One such information leakage is the plaintext distribution. Take Fig. 2, for example, which shows a skewed relevance score distribution of keyword “network,” sampled from 1,000 files of our test collection. For easy exposition, we encode the actual score into 128 levels in domain from 1 to 128. Due to the deterministic property, if we use OPSE directly over these sampled relevance scores, the resulting ciphertext shall share exactly the same distribution as the relevance score in Fig. 2. On the other hand, previous research works [18], [22] have shown that the score distribution can be seen as keyword specific. Specifically, in [22], the authors have shown that the TF distribution of certain keywords from the Enron e-mail corpus$^3$ can be very peaky, and thus result in significant information leak for the corresponding keyword. In [18], the authors further point out that the TF distribution of the keyword in a given file collection usually follows a power law distribution, regardless of the popularity of the keyword. Their results on a few test file collections show that not only different keywords can be differentiated by the slope and value range of their TF distribution, but even the normalized TF distributions, i.e., the original score distributions (see (2)), can be keyword specific. Thus, with certain background information on the file collection, such as knowing it contains only technical research papers, the adversary may be able to reverse engineer the keyword “network” directly from the encrypted score distribution without actually breaking the trapdoor construction, nor does the adversary need to break the OPSE.

TABLE 2. The Details of BuildIndex(·) for Basic Scheme

4.2 One-to-Many Order-Preserving Mapping

Therefore, we have to modify the OPSE to suit our purpose. In order to reduce the amount of information leakage from the deterministic property, a one-to-many OPSE scheme is thus desired, which can flatten or obfuscate the original relevance score distribution, increase its randomness, and still preserve the plaintext order. To do so, we first briefly review the encryption process of original deterministic OPSE, where a plaintext $m$ in domain $D$ is always mapped to the same random-sized nonoverlapping interval bucket in range $R$, determined by a keyed binary search over the range $R$ and the result of a random HGD sampling function. A ciphertext $c$ is then chosen within the bucket by using $m$ as the seed for some random selection function.

Our one-to-many order-preserving mapping employs the random plaintext-to-bucket mapping of OPSE, but incorporates the unique file IDs together with the plaintext $m$ as the random seed in the final ciphertext chosen process. Due to the use of unique file ID as part of random selection seed, the same plaintext $m$ will no longer be deterministically assigned to the same ciphertext $c$, but instead a random value within the randomly assigned bucket in range $R$. The whole process is shown in Algorithm 1, adapted from [14]. Here, TapeGen(·) is a random coin generator and HYGEINV(·) is the efficient function implemented in Matlab as our instance for the HGD(·) sampling function. The correctness of our one-to-many order-preserving mapping follows directly from the Algorithm 1. Note that our rationale is to use the OPSE block cipher as a tool for different application scenarios and achieve better security, which is suggested by and consistent with [14]. Now, if we denote OPM as our one-to-many order-preserving mapping function with parameter: $\mathrm{OPM} : \{0,1\}^{\ell} \times \{0,1\}^{\log |D|} \to \{0,1\}^{\log |R|}$, our proposed RSSE scheme can be described as follows:

In the Setup phase

1. The data owner calls KeyGen($1^k, 1^{\ell}, 1^{\ell'}, 1^p, |D|, |R|$), generates random keys $x, y, z \xleftarrow{R} \{0,1\}^k$, and outputs $K = \{x, y, z, 1^{\ell}, 1^{\ell'}, 1^p, |D|, |R|\}$.

2. The data owner calls BuildIndex($K, C$) to build the inverted index of collection $C$, and uses $\mathrm{OPM}_{f_z(w_i)}(\cdot)$ instead of $E(\cdot)$ to encrypt the scores.

In the Retrieval phase

1. The user generates and sends a trapdoor $T_w = (\pi_x(w), f_y(w))$ for an interested keyword $w$. Upon receiving the trapdoor $T_w$, the cloud server first locates the matching entries of the index via $\pi_x(w)$, and then uses $f_y(w)$ to decrypt the entry. These steps are the same as in the basic approach.

2. The cloud server now sees the file identifiers $\langle id(F_{ij}) \rangle$ (suppose $w = w_i$ and thus $j \in \{1, \dots, N_i\}$) and their associated order-preserved encrypted scores: $\mathrm{OPM}_{f_z(w_i)}(S_{ij})$.

3. The server then fetches the files and sends them back in a ranked sequence according to the encrypted relevance scores $\{\mathrm{OPM}_{f_z(w_i)}(S_{ij})\}$, or sends top-k most relevant files if the optional value $k$ is provided.

Algorithm 1. One-to-Many Order-Preserving Mapping-OPM
1: procedure $\mathrm{OPM}_K(D, R, m, id(F))$
2: while $|D| \ne 1$ do
3: $\{D, R\} \leftarrow$ BinarySearch($K, D, R, m$);
4: end while
5: $coin \xleftarrow{R} \mathrm{TapeGen}(K, (D, R, 1 \| m, id(F)))$;
6: $c \xleftarrow{coin} R$;
7: return $c$;
8: end procedure
9: procedure BinarySearch($K, D, R, m$);
10: $M \leftarrow |D|$; $N \leftarrow |R|$;
11: $d \leftarrow \min(D) - 1$; $r \leftarrow \min(R) - 1$;
12: $y \leftarrow r + \lceil N/2 \rceil$;
13: $coin \xleftarrow{R} \mathrm{TapeGen}(K, (D, R, 0 \| y))$;
14: $x \xleftarrow{R} d + \mathrm{HYGEINV}(coin, M, N, y - r)$;
15: if $m \le x$ then
16: $D \leftarrow \{d + 1, \dots, x\}$;
17: $R \leftarrow \{r + 1, \dots, y\}$;
18: else
19: $D \leftarrow \{x + 1, \dots, d + M\}$;
20: $R \leftarrow \{y + 1, \dots, r + N\}$;
21: end if
22: return $\{D, R\}$;
23: end procedure

Fig. 2. An example of relevance score distribution.

3. http://www.cs.cmu.edu/~enron/.

Discussion. With the help of order-preserving mapping, now the server can accordingly rank the files as efficiently as for the unencrypted score. The reason that we use different keys ($f_z(w_i)$) to encrypt the relevance score for different posting lists is to make the one-to-many mapping more indistinguishable. Therefore, the same relevance score appearing in different lists of the index $I$ will be mapped to different “buckets” in $R$. Combining this with our one-to-many mapping will randomize the encrypted values from an overall point of view. Thus, we can further mitigate the useful information revealed to the cloud server, who may be consistently interested in the statistical analysis on the encrypted numeric value to infer the underlying information.

4.3 Choosing Range Size of R

We have highlighted our idea, but some care is still needed for implementation. Our purpose is to discard the peaky distribution of the plaintext domain as much as possible during the mapping, so as to eliminate the predictability of the keyword specific score distribution on the domain $D$. Clearly, according to our random one-to-many order-preserving mapping (Algorithm 1 line 6), the larger the range $R$ is set, the less of the peaky feature will be preserved. However, the range size $|R|$ cannot be arbitrarily large as it may slow down the efficiency of HGD function. Here, we use the min-entropy as our tool to find the size of range $R$.

In information theory, the min-entropy of a discrete random variable $X$ is defined as:

$$H_\infty(X) = -\log\left(\max_a \Pr[X = a]\right).$$

The higher $H_\infty(X)$ is, the more difficult the $X$ can be predicted. We say $X$ has high min-entropy if $H_\infty(X) \in \omega(\log k)$ [17], where $k$ is the bit length used to denote all the possible states of $X$. Note that one possible choice of $H_\infty(X)$ is $(\log k)^c$ where $c > 1$. Based on this high min-entropy requirement as a guideline, we aim to find appropriate size of the range $R$, which not only helps discard the peaky feature of the plaintext score distribution after score encryption, but also maintains within relatively small size so as to ensure the order-preserving mapping efficiency.

Let $max$ denote the maximum possible number of score duplicates within the index $I$, and let $\lambda$ denote the average number of scores to be mapped within each posting list $I(w_i)$. Without loss of generality, we let $D = \{1, \dots, M\}$ and thus $|D| = M$. Then, based on above high min-entropy requirement, we can find the least possible $|R|$ satisfying the following equation:

$$\frac{max \Big/ \left(|R| \cdot \frac{1}{2^{5\log M + 12}}\right)}{\lambda} \le 2^{-(\log(\log |R|))^c}. \qquad (3)$$

Here, we use the result of [14] that the total recursive calls of HGD sampling during an OPSE operation is a function belonging to $O(\log M)$, and is at most $5\log M + 12$ on average, which is the expected number of times the range $R$ will be cut into half during the function call of BinarySearch(·). We also assume that the one-to-many mapping is truly random (Algorithm 1 line 5-6). Therefore, the numerator of left-hand side of the above equation is indeed the expected largest number of duplicates after mapping. Dividing the numerator by $\lambda$, we have on the left-hand side the expected largest probability of a plaintext score mapped to a given encrypted value in range $R$. If we denote the range size $|R|$ in bits, i.e., $k = \log |R|$, we can rewrite the above inequality as

$$\frac{max \cdot 2^{5\log M + 12}}{2^k \cdot \lambda} = \frac{max \cdot M^5}{2^{k-12} \cdot \lambda} \le 2^{-(\log k)^c}. \qquad (4)$$

With the established index $I$, it is easy to determine the appropriate range size $|R|$.

Following the same example of keyword “network” in Fig. 2, where $max/\lambda = 0.06$ (i.e., the max score duplicates is 60 and the average length of the posting list is 1,000), one can determine the ciphertext range size $|R| = 2^{46}$, when the relevance score domain is encoded as 128 different levels and $c$ is set to be 1.1, as indicated in Fig. 3. Note that smaller size of range $|R|$ is possible, when we replace the upper bound $5\log M + 12$ by other relatively “loose” function of $M$ belonging to $O(\log M)$, e.g., $5\log M$ or $4\log M$. Fig. 3 shows that the range $|R|$ size can be further reduced to $2^{34}$, or $2^{27}$, respectively. In Section 7, we provide detailed experimental results and analysis on the performance and effectiveness of these different parameter selections.

5 SECURITY ANALYSIS

We evaluate the security of the proposed scheme by analyzing its fulfillment of the security guarantee described in Section 2. Namely, the cloud server should not learn the plaintext of either the data files or the searched keywords. We start from the security analysis of our one-to-many order-preserving mapping. Then, we analyze the security strength of the combination of one-to-many order-preserving mapping and SSE.

5.1 Security Analysis for One-to-Many Mapping

Our one-to-many order-preserving mapping is adapted from the original OPSE, by introducing the file ID as the additional seed in the final ciphertext chosen process. Since such adaptation only functions at the final ciphertext selection process, it has nothing to do with the randomized plaintext-to-bucket mapping process in the original OPSE. In other words, the only effect of introducing file ID as the new seed is to make multiple plaintext duplicates $m$'s no longer deterministically mapped to the same ciphertext $c$, but instead mapped to multiple random values within the assigned bucket in range $R$. This helps flatten the ciphertext distribution to some extent after mapping. However, such a generic adaptation alone only works well when the number of plaintext duplicates is not large. In case there are many duplicates of plaintext $m$, its corresponding ciphertext distribution after mapping may still exhibit certain skewness or peaky feature of the plaintext distribution, due to the relatively small size of the assigned bucket selected from range $R$.

Fig. 3. Size selection of range $R$, given $max/\lambda = 0.06$, $M = 128$, and $c = 1.1$. The LHS and RHS denote the corresponding side of (4). Two example choices of $O(\log M)$ to replace $5\log M + 12$ in (4) are also included.

This is why we propose to appropriately enlarge $R$ in Section 4.3. Note that in the original OPSE, size $R$ is determined just to ensure the number of different combinations between $D$ and $R$ is larger than $2^{80}$. But from a practical perspective, properly enlarging $R$ in our one-to-many case further aims to ensure the low duplicates (with high probability) on the ciphertext range after mapping. This inherently increases the difficulty for adversary to tell precisely which points in the range $R$ belong to the same score in the domain $D$, making the order-preserving mapping as strong as possible. Note that one disadvantage of our scheme, compared to the original OPSE, is that fixing the range size $R$ requires preknowledge on the percentage of maximum duplicates among all the plaintexts (i.e., $max/\lambda$ in (3)). However, such extra requirement can be easily met in our scenario when building the searchable index.

5.2 Security Analysis for Ranked Keyword Search

Compared to the original SSE, the new scheme embeds the encrypted relevance scores in the searchable index in addition to file ID. Thus, the encrypted scores are the only additional information that the adversary can utilize against the security guarantee, i.e., keyword privacy and file confidentiality. Due to the security strength of the file encryption scheme, the file content is clearly well protected. Thus, we only need to focus on keyword privacy.

From previous discussion, we know that as long as data owner properly chooses the range size $R$ sufficiently large, the encrypted scores in the searchable index will only be a sequence of order-preserved numeric values with very low duplicates. Though adversary may learn partial information from the duplicates (e.g., ciphertext duplicates may indicate very high corresponding plaintext duplicates), the fully randomized score-to-bucket assignment (inherited from OPSE) and the highly flattened one-to-many mapping still make it difficult for the adversary to predict the original plaintext score distribution, let alone reverse engineer the keywords. Also note that we use different order-preserving encryption keys for different posting lists, which further reduces the information leakage from an overall point of view. Thus, the keyword privacy is also well preserved in our scheme.

6 FURTHER ENHANCEMENTS AND INVESTIGATIONS

Above discussions have shown how to achieve an efficient RSSE system. In this section, we give further study on how to make the RSSE system more readily deployable in practice. We start with some practical considerations on the index update and show how our mechanism can gracefully handle the case of score dynamics without introducing recomputation overhead on data owners. For enhanced quality of service assurance, we next study how the RSSE system can support ranked search result authentication. Finally, we uncover the reversible property of our one-to-many order-preserving mapping, which may find independent use in other interesting application scenarios.

6.1 Supporting Score Dynamics

In Cloud Computing, outsourced file collection might not only be accessed but also updated frequently for various application purposes (see [19], [20], [21], for example). Hence, supporting the score dynamics in the searchable index for an RSSE system, which is reflected from the corresponding file collection updates, is of practical importance. Here, we consider score dynamics as adding newly encrypted scores for newly created files, or modifying old encrypted scores for modification of existing files in the file collection. Ideally, given a posting list in the inverted index, the encryption of all these newly changed scores should be incorporated directly without affecting the order of all other previously encrypted scores, and we show that our proposed one-to-many order-preserving mapping does exactly that. Note that we do not consider file deletion scenarios because it is not hard to infer that deleting any file and its score does not affect the ranking orders of the remaining files in the searchable index.

This graceful property of supporting score dynamics is inherited from the original OPSE scheme, even though we made some adaptations in the mapping process. This can be observed from the BinarySearch(·) procedure in Algorithm 1, where the same score will always be mapped to the same random-sized nonoverlapping bucket, given the same encryption key and the same parameters of the plaintext domain $D$ and ciphertext range $R$. Because the buckets themselves are nonoverlapping, the newly changed scores indeed do not affect previously mapped values. Thus, with this property, the data owner can avoid the recomputation of the whole score encryption for all the file collection, but instead just handle those changed scores whenever necessary. Note that the scores chosen from the same bucket are treated as ties and their order can be set arbitrarily.

Supporting score dynamics is also the reason why we do not use the naive approach for RSSE, where data owner arranges file IDs in the posting list according to relevance score before outsourcing. Whenever the file collection changes, the whole process, including the score calculation, would need to be repeated, rendering it impractical in case of frequent file collection updates. In fact, supporting score dynamics will save quite a lot of computation overhead during the index update, and can be considered as a significant advantage compared to the related work [18], [22], as will be discussed in Section 8.


6.2 Authenticating Ranked Search Result

In practice, cloud servers may sometimes behave beyond the semihonest model. This can happen either because cloud server intentionally wants to do so for saving cost when handling large number of search requests, or there may be software bugs, or internal/external attacks. Thus, enabling a search result authentication mechanism that can detect such unexpected behaviors of cloud server is also of practical interest and worth further investigation.

To authenticate a ranked search result (or Top-k retrieval), one needs to ensure: 1) the retrieved results are the most relevant ones; 2) the relevance sequence among the results is not disrupted. To achieve these two authentication requirements, we propose to utilize the one way hash chain technique, which can be added directly on top of the previous RSSE design. Let $H(\cdot)$ denote some cryptographic one-way hash function, such as SHA-1. Our mechanism requires one more secret value $u$ in the Setup phase to be generated and shared between data owner and users. The details go as follows:

In the Setup phase

1. When data owner calls BuildIndex($K, C$), he picks an initial seed $s^i_0 = f_u(w_i)$ for each posting list of keyword $w_i \in W$. Then, he sorts the posting list based on the encrypted scores.

2. Suppose $id(F_{i1}), id(F_{i2}), \dots, id(F_{iv})$ denotes the ordered sequence of file identifiers based on the encrypted relevance scores. The data owner generates a hash chain

$$H^i_1 = H(id(F_{i1}) \| s^i_0),\quad H^i_2 = H(id(F_{i2}) \| H^i_1),\quad \dots,\quad H^i_v = H(id(F_{iv}) \| H^i_{v-1}).$$

3. For each corresponding entry $\langle 0^{\ell'} \| id(F_{ij}) \| E_z(S_{ij}) \rangle$, $0 \le j \le v$, in the posting list of keyword $w_i$, the data owner inserts the corresponding hash value of the hash chain and gets $\langle 0^{\ell'} \| id(F_{ij}) \| H^i_j \| E_z(S_{ij}) \rangle$. All other operations, like entry encryption and entry permutation, remain the same as previous RSSE scheme.

In the Retrieval phase

1. Whenever the cloud server is transmitting back top-k most relevant files, the corresponding $k$ hash values embedded in the posting list entries should also be sent back as a correctness proof.

2. The user simply generates the initial seed $s^i_0 = f_u(w_i)$ and verifies the received portion of the hash chain accordingly.

Discussion. It is easy to see that the secret seed shared between data owner and user ensures the authentication requirement 1, while the one-way property of hash chain guarantees the authentication requirement 2. The hash chain itself is a lightweight technique, which can be easily incorporated in our RSSE system with negligible computation/performance overhead on both data owner and users. The only tradeoff for achieving this high quality of data search assurance is the storage overhead on cloud, due to the augmented size of posting list. But we believe this should be easily acceptable because of the cheap storage cost today.
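The hash chain of Section 6.2 can be sketched as follows. This is an added example, not the authors' implementation: it assumes SHA-256 in place of SHA-1, HMAC-SHA256 as the keyed function $f_u$, a chain that starts at the most relevant file so that a top-k answer is a prefix of it, and a constructed secret and file list.

```python
import hashlib
import hmac

# Constructed sample data: shared secret u and one posting list already sorted
# by encrypted score, most relevant first (assumption of this example).
U = b"constructed shared secret u"
RANKED_IDS = ["F05", "F04", "F07", "F01", "F03", "F02", "F06"]


def H(data):
    """One-way hash H(.); SHA-256 is used here instead of SHA-1."""
    return hashlib.sha256(data).digest()


def seed(u, keyword):
    """Initial seed s_0 = f_u(w); HMAC-SHA256 plays the keyed function f."""
    return hmac.new(u, keyword.encode(), hashlib.sha256).digest()


def build_chain(u, keyword, ranked_ids):
    """Owner side: H_1 = H(id_1 || s_0), H_j = H(id_j || H_{j-1})."""
    chain, prev = [], seed(u, keyword)
    for fid in ranked_ids:
        prev = H(fid.encode() + prev)   # fixed-length suffix keeps the encoding unambiguous
        chain.append((fid, prev))       # the hash value is stored in the posting entry
    return chain


def server_top_k(chain, k):
    """Honest server: the k most relevant entries with their embedded hashes."""
    return chain[:k]


def verify_top_k(u, keyword, response, k):
    """User side: recompute the chain from s_0 and compare link by link.
    Assumes the posting list holds at least k entries, so a short answer is rejected."""
    if len(response) != k:
        return False
    prev = seed(u, keyword)
    for fid, tag in response:
        prev = H(fid.encode() + prev)
        if not hmac.compare_digest(prev, tag):
            return False                # omitted, reordered or substituted entry
    return True


def demo():
    chain = build_chain(U, "network", RANKED_IDS)
    return {
        "honest": verify_top_k(U, "network", server_top_k(chain, 3), 3),
        "reordered": verify_top_k(U, "network", [chain[1], chain[0], chain[2]], 3),
        "top result withheld": verify_top_k(U, "network", chain[1:4], 3),
        "lower-ranked substitute": verify_top_k(U, "network", [chain[0], chain[1], chain[4]], 3),
    }


if __name__ == "__main__":
    for case, ok in demo().items():
        print(f"{case}: {'accepted' if ok else 'rejected'}")
```

Because every link depends on the secret seed, the server cannot produce a valid chain for any sequence other than the owner's ranked prefix.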

6.3 Reversing One-to-Many Order-Preserving Mapping

For any order-preserving mapping process, being reversible is very useful in many practical situations, especially when the underlying plaintext values need to be modified or utilized for further computation purposes. While OPSE, designed as a block cipher, by default has this property, it is not yet clear whether our one-to-many order-preserving mapping can be reversible too. In the following, we give a positive answer to this question.

Again, the reversibility of the proposed one-to-many order-preserving mapping can be observed from the BinarySearch(·) procedure in Algorithm 1. The intuition is that the plaintext-to-bucket mapping process of OPSE is reversible. Namely, as long as the ciphertext is chosen from a certain bucket, one can always use the BinarySearch(·) procedure to uniquely identify the plaintext value, thus making the mapping reversible. For completeness, we give the details in Algorithm 2, which again we acknowledge is adapted from [14]. The corresponding reversed mapping performance is reported in Section 7.2.

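Algorithm 2. Reversing One-to-many Order-preserving Mapping-ROPM
1: procedure $\mathrm{OPM}_K(D, R, c, id(F))$
2: while $|D| \ne 1$ do
3: $\{D, R\} \leftarrow$ BinarySearch($K, D, R, c$);
4: end while
5: $m \leftarrow \min(D)$;
6: $coin \xleftarrow{R} \mathrm{TapeGen}(K, (D, R, 1 \| m, id(F)))$;
7: $w \xleftarrow{coin} R$;
8: if $w = c$ then return $m$;
9: end if
10: return $\bot$;
11: end procedure
12: procedure BinarySearch($K, D, R, c$);
13: $M \leftarrow |D|$; $N \leftarrow |R|$;
14: $d \leftarrow \min(D) - 1$; $r \leftarrow \min(R) - 1$;
15: $y \leftarrow r + \lceil N/2 \rceil$;
16: $coin \xleftarrow{R} \mathrm{TapeGen}(K, (D, R, 0 \| y))$;
17: $x \xleftarrow{R} d + \mathrm{HYGEINV}(coin, M, N, y - r)$;
18: if $c \le y$ then
19: $D \leftarrow \{d + 1, \dots, x\}$;
20: $R \leftarrow \{r + 1, \dots, y\}$;
21: else
22: $D \leftarrow \{x + 1, \dots, d + M\}$;
23: $R \leftarrow \{y + 1, \dots, r + N\}$;
24: end if
25: return $\{D, R\}$;
26: end procedure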
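Algorithms 1 and 2 can be exercised end to end with a small prototype. The following is an added example, not the authors' C/MATLAB implementation: it assumes HMAC-SHA256 as TapeGen, an exact inverse hypergeometric CDF in place of Matlab's HYGEINV, a reduced range of $2^{12}$ so that it runs quickly, and a constructed key and posting list.

```python
import hashlib
import hmac
from math import comb


def tape_gen(key, *fields):
    """TapeGen stand-in (assumption): HMAC-SHA256 of the fields as a 256-bit coin."""
    msg = "|".join(map(str, fields)).encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def hygeinv(coin, M, N, n):
    """Exact inverse hypergeometric CDF: how many of the M domain points land
    among the first n of the N range points, selected by the coin."""
    target = coin * comb(N, n) >> 256        # coin / 2^256 scaled to comb(N, n)
    x = max(0, n + M - N)
    w = comb(M, x) * comb(N - M, n - x)      # ways to put exactly x points on the left
    acc = w
    while target >= acc:                     # the weights sum to comb(N, n) > target
        w = w * (M - x) * (n - x) // ((x + 1) * (N - M - n + x + 1))
        x += 1
        acc += w
    return x


def binary_search(key, D, R, v, by_ciphertext):
    """One BinarySearch step on inclusive (lo, hi) intervals D and R.
    Algorithm 1 branches on m <= x, Algorithm 2 on c <= y."""
    d, r = D[0] - 1, R[0] - 1
    M, N = D[1] - d, R[1] - r
    y = r + (N + 1) // 2                     # y = r + ceil(N / 2)
    x = d + hygeinv(tape_gen(key, D, R, 0, y), M, N, y - r)
    if v <= (y if by_ciphertext else x):
        return (d + 1, x), (r + 1, y)
    return (x + 1, d + M), (y + 1, r + N)


def find_bucket(key, D, R, v, by_ciphertext=False):
    """Descend until |D| = 1. Returns (m, bucket), or None when the domain
    empties, which means no plaintext maps to the ciphertext v."""
    while D[0] != D[1]:
        if D[0] > D[1]:
            return None
        D, R = binary_search(key, D, R, v, by_ciphertext)
    return D[0], R


def pick(key, m, bucket, file_id):
    """Final selection: the coin is seeded with 1 || m and the file ID."""
    coin = tape_gen(key, (m, m), bucket, 1, m, file_id)
    return bucket[0] + coin % (bucket[1] - bucket[0] + 1)


def opm(key, D, R, m, file_id):
    """Algorithm 1: map score m of file file_id into m's bucket."""
    if not D[0] <= m <= D[1]:
        raise ValueError("score outside the domain")
    return pick(key, m, find_bucket(key, D, R, m)[1], file_id)


def ropm(key, D, R, c, file_id):
    """Algorithm 2: recover the score, or None if c is not valid for file_id."""
    found = find_bucket(key, D, R, c, True) if R[0] <= c <= R[1] else None
    if found is None or pick(key, found[0], found[1], file_id) != c:
        return None
    return found[0]


def max_duplicates(values):
    """Largest number of entries sharing one value (the 'max' of Section 4.3)."""
    values = list(values)
    return max(values.count(v) for v in values)


# Constructed sample: one posting list (file ID -> score level), a constructed
# per-list key in the role of f_z(w_i), and a small range so the demo is fast.
POSTING = {"F01": 17, "F02": 17, "F03": 17, "F04": 42,
           "F05": 90, "F06": 5, "F07": 42, "F08": 128}
DOMAIN, RANGE = (1, 128), (1, 2 ** 12)
LIST_KEY = hmac.new(b"constructed key z", b"network", hashlib.sha256).digest()


def encrypt_list(posting):
    return {fid: opm(LIST_KEY, DOMAIN, RANGE, s, fid) for fid, s in posting.items()}


def server_rank(enc):
    """Server-side ranking: sort file IDs by encrypted score, highest first."""
    return sorted(enc, key=enc.get, reverse=True)


if __name__ == "__main__":
    enc = encrypt_list(POSTING)
    for fid in server_rank(enc):
        print(fid, enc[fid], ropm(LIST_KEY, DOMAIN, RANGE, enc[fid], fid))
    print("max duplicates:", max_duplicates(POSTING.values()), "->",
          max_duplicates(enc.values()))
```

Equal scores always fall into the same bucket, so how far duplicates spread depends on the bucket size, which is why Section 4.3 enlarges the range.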

7 PERFORMANCE ANALYSIS

We conducted a thorough experimental evaluation of the proposed techniques on real data set: Request for comments (RFC) database [23]. At the time of writing, the RFC database contains 5,563 plain text entries and totals about 277 MB. This file set contains a large number of technical keywords, many of which are unique to the files in which they are discussed. Our experiment is conducted using C programming language on a Linux machine with dual Intel Xeon CPU running at 3.0 GHz. Algorithms use both openssl and MATLAB libraries. The performance of our scheme is evaluated regarding the effectiveness and efficiency of our proposed one-to-many order-preserving mapping, as well as the overall performance of our RSSE scheme, including the cost of index construction as well as the time necessary for searches. Note that though we use a single server in the experiment, in practice we can separately store the searchable index and the file collections on different virtualized service nodes in the commercial public cloud, such as the Amazon EC2 and Amazon S3, respectively. In that way, even if data owners choose to store their file collection in different geographic locations for increased availability, the underlying search mechanism, which always takes place based on the searchable index, will not be affected at all.

7.1 Effectiveness of One-to-Many Order Preserving Mapping

As indicated in Section 4.2, applying the proposed one-to-many mapping will further randomize the distribution of the encrypted values, which mitigates the chances of reverse engineering the keywords by adversary. Fig. 4 demonstrates the effectiveness of our proposed scheme, where we choose $|R| = 2^{46}$. The two figures show the value distribution after one-to-many mapping with the same relevance score set of keyword “network” as input, but encrypted with two different random keys. Note that due to our safe choice of $|R|$ (see Section 4.3) and the relatively small number of total scores per posting list (up to 1,000), we do not have any duplicates after one-to-many order-preserving score mapping. However, for easy comparison purposes, the distribution in Fig. 4 is obtained by putting encrypted values into 128 equally spaced containers, as we do for the original score. Compared to previous Fig. 2, where the distribution of raw score is highly skewed, it can be seen that we indeed obtain two differently randomized value distributions. This is due to both the randomized score-to-bucket assignment inherited from the OPSE, and the one-to-many mapping. The former allows the same score to be mapped to different random-sized nonoverlapping buckets, while the latter further obfuscates the score-to-ciphertext mapping accordingly. This confirms our security analysis that the exposure of frequency information to the adversaries (the server in our case), utilized to reverse engineer the keyword, can be further minimized.

7.2 Efficiency of One-to-Many Order-Preserving Mapping

As shown in Section 4.3, the efficiency of our proposed one-to-many order-preserving mapping is determined by both the size of score domain $M$ and the range $R$. $M$ affects how many rounds ($O(\log M)$) the procedure BinarySearch(·) or HGD(·) should be called. Meanwhile, $M$ together with $R$ both impact the time consumption for individual HGD(·) cost. That’s why the time cost of a single one-to-many order-preserving mapping operation goes up faster than logarithmic, as $M$ increases. Fig. 5 gives the efficiency measurement of our proposed scheme. The result represents the mean of 100 trials. Note that even for large range $R$, the time cost of one successful mapping is still finished in 200 milliseconds, when $M$ is set to be our choice 128. Specifically, for $|R| = 2^{40}$, the time cost is less than 70 milliseconds. This is far more efficient than the order-preserving approach used in [18], [22], where [22] needs to keep lots of metadata to prebuild many different buckets on the data owner side, and [18] requires the presampling and training of the relevance scores to be outsourced. However, our approach only needs the pregeneration of random keys.

As shown in Section 6.3, our one-to-many order-preserving mapping is in fact a reversible process. For completeness, the corresponding reverse mapping performance results are reported in Fig. 6. Compared to Fig. 5, the reverse mapping process is almost as fast as the original mapping process. This can be explained from the Algorithms 1 and 2 of the two approaches. Namely, for the same prefixed system parameters, both processes share the same number of recursive calls for the BinarySearch(·) procedure, thus resulting in similar performance.

7.3 Performance of Overall RSSE System

7.3.1 Index Construction

To allow for ranked keyword search, an ordinary inverted index attaches a relevance score to each posting entry. Our approach replaces the original scores with the ones after one-to-many order-preserving mapping. Specifically, it only introduces the mapping operation cost, additional bits to represent the encrypted scores, and overall entry encryption cost, compared to the original inverted index construction. Thus, we only list in Table 3 our index construction performance for a collection of 1,000 RFC files. The index size and construction time listed were both per keyword, meaning the posting list construction varies from one keyword to another. This was chosen as it removes the differences of various keyword set construction choices, allowing for a clean analysis of just the overall performance of the system. Note that the additional bits of encrypted scores are not a main issue due to the cheap storage cost on today's cloud servers.

Fig. 4. Demonstration of effectiveness for one-to-many order-preserving mapping. The mapping is derived with the same relevance score set of keyword “network,” but encrypted with two different random keys.

Fig. 5. The time cost of single one-to-many order-preserving mapping operation, with regard to different choices of parameters: the domain size of relevance score $M$ and the range size of encrypted score $|R|$.

Our experiment shows the total per list building time is 5.44 s, while the raw-index only consumes 2.31 s on average. Here, the raw-index construction corresponds to the steps 1 and 2 of the BuildIndex Algorithm in Table 2, which includes the plaintext score calculations and the inverted index construction but without considering security. To have a better understanding of the extra overhead introduced by RSSE, we also conducted an experiment for the basic searchable encryption scheme that supports only single keyword-based Boolean search. The implementation is based on the algorithm in Table 2, excluding the score calculations and the corresponding score encryptions. Building such a searchable index for secure Boolean search costs 1.88 s per posting list. In both comparisons, we conclude that the score encryption via proposed one-to-many order-preserving mapping is the dominant factor for index construction time, which costs about 70 ms per valid entry in the posting list. However, given that the index construction is the one-time cost before outsourcing and the enabled secure server side ranking functionality significantly improves subsequent file retrieval accuracy and efficiency, we consider the overhead introduced is reasonably acceptable.

Please note that our current implementation is not fully optimized. Further improvement on the implementation efficiency can be expected and is one of our important future works.

7.3.2 Efficiency of Search

The search time includes fetching the posting list in the index, decrypting, and rank ordering each entry. Our focus is on top-k retrieval. As the encrypted scores are order preserved, the server can process the top-k retrieval almost as fast as in the plaintext domain. Note that the server does not have to traverse every posting list for each given trapdoor, but instead uses a tree-based data structure to fetch the corresponding list. Therefore, the overall search time cost is almost as efficient as on unencrypted data. Fig. 7 lists our search time cost as the value of k increases, for the same index constructed above.

Fig. 6. The time cost of single reverse one-to-many order-preserving mapping operation, with regard to different choices of parameters: the domain size of relevance score M and the range size of encrypted score |R|.

Fig. 7. The time cost for top-k retrieval.

TABLE 3. Per Keyword Index Construction Overhead for 1,000 RFC Files

8 RELATED WORK

Searchable encryption. Traditional searchable encryption [8], [9], [10], [11], [12], [24], [25] has been widely studied as a cryptographic primitive, with a focus on security definition formalizations and efficiency improvements. Song et al. [8] first introduced the notion of searchable encryption. They proposed a scheme in the symmetric key setting, where each word in the file is encrypted independently under a special two-layered encryption construction. Thus, a searching overhead is linear to the whole file collection length. Goh [9] developed a Bloom filter-based per-file index, reducing the workload for each search request proportional to the number of files in the collection. Chang and Mitzenmacher [11] also developed a similar per-file index scheme. To further enhance search efficiency, Curtmola et al. [12] proposed a per-keyword-based approach, where a single encrypted hash table index is built for the entire file collection, with each entry consisting of the trapdoor of a keyword and an encrypted set of related file identifiers. Searchable encryption has also been considered in the public-key setting. Boneh et al. [10] presented the first public-key-based searchable encryption scheme, with an analogous scenario to that of [8]. In their construction, anyone with the public key can write to the data stored on the server but only authorized users with the private key can search. As an attempt to enrich query predicates, conjunctive keyword search over encrypted data have also been proposed in [26], [27], [28]. Aiming at tolerance of both minor typos and format inconsistencies in the user search input, fuzzy keyword search over encrypted cloud data has been proposed by Li et al. in [29]. Very recently, a privacy-assured similarity search mechanism over outsourced cloud data has been explored by Wang et al. in [34]. Note that all these schemes support only Boolean keyword search, and none of them support the ranked search problem which we are focusing on in this paper.

Following our research on secure ranked search over encrypted data, very recently, Cao et al. [30] propose a privacy-preserving multikeyword ranked search scheme, which extends our previous work in [1] with support of multikeyword query. They choose the principle of “coordinate matching,” i.e., as many matches as possible, to capture the similarity between a multikeyword search query and data documents, and later quantitatively formalize the principle by a secure inner product computation mechanism. One disadvantage of the scheme is that cloud server has to linearly traverse the whole index of all the documents for each search request, while ours is as efficient as existing SSE schemes with only constant search cost on cloud server.

Secure top-k retrieval from Database Community. The works [18], [22] from the database community are the most related to our proposed RSSE. The idea of uniformly distributing posting elements using an order-preserving cryptographic function was first discussed in [22]. However, the order-preserving mapping function proposed in [22] does not support score dynamics, i.e., any insertion and updates of the scores in the index will result in the posting list being completely rebuilt. Zerr et al. [18] use a different order-preserving mapping based on presampling and training of the relevance scores to be outsourced, which is not as efficient as our proposed schemes. Besides, when scores following different distributions need to be inserted, their score transformation function still needs to be rebuilt. On the contrary, in our scheme the score dynamics can be gracefully handled, which is an important benefit inherited from the original OPSE. This can be observed from the BinarySearch(·) procedure in Algorithm 1, where the same score will always be mapped to the same random-sized nonoverlapping bucket, given the same encryption key. In other words, the newly changed scores will not affect previous mapped values. We note that supporting score dynamics, which can save quite a lot of computation overhead when file collection changes, is a significant advantage in our scheme. Moreover, both works above do not exhibit thorough security analysis which we do in the paper.
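The server-side top-k step of Section 7.3.2 and the score-dynamics property described above, as an added Python example that is not the authors' implementation. It replaces the OPSE sampling of Algorithm 1 with a uniform HMAC-keyed range split (a stand-in that does not carry the paper's security analysis), and all function names, the key, M, R and the sample scores are constructed for illustration.

```python
import heapq
import hmac
import hashlib

M = 128          # assumed score domain 1..M (constructed for this example)
R = 2 ** 40      # assumed encrypted-score range 1..R, with R >= M

def _prf(key: bytes, *parts) -> int:
    """Keyed pseudo-random integer derived from the given labels."""
    msg = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")

def bucket(key: bytes, score: int) -> tuple:
    """Binary-search the domain, splitting the range with keyed coins.

    Returns the (lo, hi) range bucket owned by `score`. The path depends only
    on the key and the score, never on which other scores are in the index.
    """
    if not 1 <= score <= M:
        raise ValueError("score outside domain")
    dlo, dhi, rlo, rhi = 1, M, 1, R
    while dlo < dhi:
        mid = (dlo + dhi) // 2
        n_left, n_right = mid - dlo + 1, dhi - mid
        # Keep at least one range value per domain value on each side.
        lo_split, hi_split = rlo + n_left - 1, rhi - n_right
        coin = _prf(key, "split", dlo, dhi, rlo, rhi)
        split = lo_split + coin % (hi_split - lo_split + 1)
        if score <= mid:
            dhi, rhi = mid, split
        else:
            dlo, rlo = mid + 1, split + 1
    return rlo, rhi

def encrypt_score(key: bytes, score: int, file_id: str) -> int:
    """One-to-many: the file id selects the point inside the score's bucket."""
    lo, hi = bucket(key, score)
    return lo + _prf(key, "pick", score, file_id) % (hi - lo + 1)

def build_posting_list(key: bytes, scores: dict) -> list:
    """Data owner side: (file id, encrypted score) entries for one keyword."""
    return [(fid, encrypt_score(key, s, fid)) for fid, s in scores.items()]

def top_k(posting_list: list, k: int) -> list:
    """Server side: rank on encrypted scores only, no key needed."""
    return [fid for fid, _ in heapq.nlargest(k, posting_list, key=lambda e: e[1])]

KEY = b"constructed-demo-key"                      # constructed sample key
SCORES = {"rfc-a": 12, "rfc-b": 87, "rfc-c": 12, "rfc-d": 45, "rfc-e": 3}

if __name__ == "__main__":
    plist = build_posting_list(KEY, SCORES)
    print(top_k(plist, 2))
    # Score dynamics: a new file is appended; existing entries are untouched.
    grown = build_posting_list(KEY, {**SCORES, "rfc-f": 60})
    print(grown[:len(plist)] == plist)
```

Files rfc-a and rfc-c share the plaintext score 12, so their encrypted scores fall in the same bucket but are selected by different file ids.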

Other related techniques. Allowing range queries over encrypted data in the public key settings has been studied in [31], [32], where advanced privacy-preserving schemes were proposed to allow more sophisticated multiattribute search over encrypted files while preserving the attributes’ privacy. Though these two schemes provide provably strong security, they are generally not efficient in our settings, as for a single search request, a full scan and expensive computation over the whole encrypted scores corresponding to the keyword posting list are required. Moreover, the two schemes do not support the ordered result listing on the server side. Thus, they cannot be effectively utilized in our scheme since the user still does not know which retrieved files would be the most relevant.

Difference from conference version. Portions of the work presented in this paper have previously appeared as an extended abstract in [1]. We have revised the paper a lot and improved many technical details as compared to [1]. The primary improvements are as follows: First, we provide new Section 6.1 to study and address some practical considerations of the RSSE design. Second, we provide Section 6.2 to thoroughly study the result completeness authentication. The mechanism design incurs negligible overhead on data users, and further enhances the quality of data search service. Third, we extend our previous result on order-preserving one-to-many mapping and show in Section 6.3 that this mapping process is indeed reversible, which can be very useful in many practical applications. For completeness, we provide the corresponding algorithm for the reverse mapping and also include its performance results. Finally, the related work has been substantially improved, which now faithfully reflects many recent advancements on privacy-preserving search over encrypted data.

9 CONCLUDING REMARKS

In this paper, as an initial attempt, we motivate and solve the problem of supporting efficient ranked keyword search for achieving effective utilization of remotely stored encrypted data in Cloud Computing. We first give a basic scheme and show that by following the same existing searchable encryption framework, it is very inefficient to achieve ranked search. We then appropriately weaken the security guarantee, resort to the newly developed crypto primitive OPSE, and derive an efficient one-to-many order-preserving mapping function, which allows the effective RSSE to be designed. We also investigate some further enhancements of our ranked search mechanism, including the efficient support of relevance score dynamics, the authentication of ranked search results, and the reversibility of our proposed one-to-many order-preserving mapping technique. Through thorough security analysis, we show that our proposed solution is secure and privacy preserving, while correctly realizing the goal of ranked keyword search. Extensive experimental results demonstrate the efficiency of our solution.

ACKNOWLEDGMENTS

This work was supported in part by the US National Science Foundation (NSF) under grants CNS-1054317, CNS-1116939, CNS-1156318, and CNS-1117111, and by Amazon web service research grant. The authors would like to thank Nathan Chenette for helpful discussions. Thanks also to Reza Curtmola for valuable suggestions in preparing the earlier version of the manuscript. A preliminary version [1] of this paper was presented at the 30th International Conference on Distributed Computing Systems (ICDCS ’10).

REFERENCES

[1] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure Ranked Keyword Search over Encrypted Cloud Data,” Proc. IEEE 30th Int’l Conf. Distributed Computing Systems (ICDCS ’10), 2010.

[2] P. Mell and T. Grance, “Draft Nist Working Definition of Cloud Computing,” http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, Jan. 2010.

1478 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 8, AUGUST 2012


[3] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A. Konwinski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the Clouds: A Berkeley View of Cloud Computing,” Technical Report UCB-EECS-2009-28, Univ. of California, Berkeley, Feb. 2009.

[4] Cloud Security Alliance, “Security Guidance for Critical Areas of Focus in Cloud Computing,” http://www.cloudsecurityalliance.org, 2009.

[5] Z. Slocum, “Your Google Docs: Soon in Search Results?” http://news.cnet.com/8301-17939_109-10357137-2.html, 2009.

[6] B. Krebs, “Payment Processor Breach May Be Largest Ever,” http://voices.washingtonpost.com/securityfix/2009/01/payment_processor_breach_may_b.html, Jan. 2009.

[7] I.H. Witten, A. Moffat, and T.C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, May 1999.

[8] D. Song, D. Wagner, and A. Perrig, “Practical Techniques for Searches on Encrypted Data,” Proc. IEEE Symp. Security and Privacy, 2000.

[9] E.-J. Goh, “Secure Indexes,” Technical Report 2003/216, Cryptology ePrint Archive, http://eprint.iacr.org/, 2003.

[10] D. Boneh, G.D. Crescenzo, R. Ostrovsky, and G. Persiano, “Public Key Encryption with Keyword Search,” Proc. Int’l Conf. Advances in Cryptology (EUROCRYPT ’04), 2004.

[11] Y.-C. Chang and M. Mitzenmacher, “Privacy Preserving Keyword Searches on Remote Encrypted Data,” Proc. Int’l Conf. Applied Cryptography and Network Security (ACNS ’05), 2005.

[12] R. Curtmola, J.A. Garay, S. Kamara, and R. Ostrovsky, “Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions,” Proc. ACM Conf. Computer and Comm. Security (CCS ’06), 2006.

[13] A. Singhal, “Modern Information Retrieval: A Brief Overview,” IEEE Data Eng. Bull., vol. 24, no. 4, pp. 35-43, 2001.

[14] A. Boldyreva, N. Chenette, Y. Lee, and A. O’Neill, “Order-Preserving Symmetric Encryption,” Proc. Int’l Conf. Advances in Cryptology (Eurocrypt ’09), 2009.

[15] J. Zobel and A. Moffat, “Exploring the Similarity Space,” SIGIR Forum, vol. 32, no. 1, pp. 18-34, 1998.

[16] O. Goldreich and R. Ostrovsky, “Software Protection and Simulation on Oblivious Rams,” J. ACM, vol. 43, no. 3, pp. 431-473, 1996.

[17] M. Bellare, A. Boldyreva, and A. O’Neill, “Deterministic and Efficiently Searchable Encryption,” Proc. Ann. Int’l Cryptology Conf. Advances in Cryptology (Crypto ’07), 2007.

[18] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+r: Top-k Retrieval from a Confidential Index,” Proc. Int’l Conf. Extending Database Technology: Advances in Database Technology (EDBT ’09), 2009.

[19] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, “Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud Computing,” IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 5, pp. 847-859, May 2011.

[20] C. Wang, Q. Wang, K. Ren, and W. Lou, “Towards Secure and Dependable Storage Services in Cloud Computing,” IEEE Trans. Service Computing, to appear.

[21] C. Wang, S. Chow, Q. Wang, K. Ren, and W. Lou, “Privacy-Preserving Public Auditing for Secure Cloud Storage,” IEEE Trans. Computers, to appear.

[22] A. Swaminathan, Y. Mao, G.-M. Su, H. Gou, A.L. Varna, S. He, M. Wu, and D.W. Oard, “Confidentiality-Preserving Rank-Ordered Search,” Proc. Workshop Storage Security and Survivability, 2007.

[23] RFC, “Request for Comments Database,” http://www.ietf.org/rfc.html, 2012.

[24] B. Waters, D. Balfanz, G. Durfee, and D. Smetters, “Building an Encrypted and Searchable Audit Log,” Proc. Ann. Network and Distributed Security Symp. (NDSS ’04), 2004.

[25] F. Bao, R. Deng, X. Ding, and Y. Yang, “Private Query on Encrypted Data in Multi-User Settings,” Proc. Int’l Conf. Information Security Practice and Experience (ISPEC ’08), 2008.

[26] P. Golle, J. Staddon, and B.R. Waters, “Secure Conjunctive Keyword Search over Encrypted Data,” Proc. Second Int’l Conf. Applied Cryptography and Network Security (ACNS ’04), pp. 31-45, 2004.

[27] L. Ballard, S. Kamara, and F. Monrose, “Achieving Efficient Conjunctive Keyword Searches over Encrypted Data,” Proc. Int’l Conf. Information and Comm. Security (ICICS ’05), 2005.

[28] Y.H. Hwang and P.J. Lee, “Public Key Encryption with Conjunctive Keyword Search and Its Extension to a Multi-User System,” Proc. Int’l Conf. Pairing-Based Cryptography (Pairing ’07), pp. 31-45, 2007.

[29] J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou, “Fuzzy Keyword Search over Encrypted Data in Cloud Computing,” Proc. IEEE INFOCOM ’10, 2010.

[30] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data,” Proc. IEEE INFOCOM ’11, 2011.

[31] D. Boneh and B. Waters, “Conjunctive, Subset, and Range Queries on Encrypted Data,” Proc. Fourth Conf. Theory of Cryptography (TCC ’07), pp. 535-554, 2007.

[32] E. Shi, J. Bethencourt, H. Chan, D. Song, and A. Perrig, “Multi-Dimensional Range Query over Encrypted Data,” Proc. IEEE Symp. Security and Privacy, 2007.

[33] K. Ren, C. Wang, and Q. Wang, “Security Challenges for the Public Cloud,” IEEE Internet Computing, vol. 16, no. 1, pp. 69-73, 2012.

[34] C. Wang, K. Ren, S. Yu, K. Mahendra, and R. Urs, “Achieving Usable and Privacy-Assured Similarity Search over Outsourced Cloud Data,” Proc. IEEE INFOCOM, 2012.

Cong Wang received the BE and ME degrees from Wuhan University, China, in 2004 and 2007, respectively. He is currently working toward PhD degree in the Electrical and Computer Engineering Department at Illinois Institute of Technology. He has been a summer intern at Palo Alto Research Centre in 2011. His research interests are in the areas of applied cryptography and network security, with current focus on secure data service outsourcing in Cloud Computing. He is a student member of the IEEE.

Ning Cao received the BE and ME degrees from Xi’an Jiaotong University, China, in 2002 and 2008, respectively. He is currently working toward the PhD degree in the Electrical and Computer Engineering Department at Worcester Polytechnic Institute. His research interests are in the areas of security, privacy, and reliability in Cloud Computing. He is a student member of the IEEE.

Kui Ren received the PhD degree from Worcester Polytechnic Institute. He is currently an assistant professor in the Electrical and Computer Engineering Department at the Illinois Institute of Technology. His research expertise includes Cloud Computing and Security, Wireless Security, and Smart Grid Security. His research is supported by US National Science Foundation (NSF) (TC, NeTS, CSR, NeTS-Neco), US Department of Energy (DoE), AFRL, and Amazon. He is a recipient of National Science Foundation Faculty Early Career Development (CAREER) Award in 2011. He received the Best Paper Award from IEEE ICNP 2011. He serves as an associate editor for IEEE Wireless Communications and IEEE Transactions on Smart Grid. He is a senior member of the IEEE and a member of the ACM.

Wenjing Lou received the BE and ME degrees in computer science and engineering from Xi’an Jiaotong University, China, in 1993 and 1996, respectively. She received the MASc degree from Nanyang Technological University, Singapore, in 1998 and the PhD degree in electrical and computer engineering from University of Florida in 2003. She joined the Computer Science Department at Virginia Polytechnic Institute and State University in 2011 and has been an associate professor with tenure since then. Prior to that, she had been on the faculty of the Department of Electrical and Computer Engineering at Worcester Polytechnic Institute, where she had been an assistant professor since 2003 and was promoted to associate professor with tenure in 2009. She is currently serving on the editorial board of five journals: IEEE Transactions on Wireless Communications, IEEE Transactions on Smart Grid, IEEE Wireless Communications Letter, Elsevier Computer Networks, and Springer Wireless Networks. She has served as TPC cochair for the security symposium of several leading IEEE conferences. She was named Joseph Samuel Satin Distinguished fellow in 2006 by WPI. She was the recipient of the US National Science Foundation (NSF) Faculty Early Career Development (CAREER) award in 2008. She received the Sigma Xi Junior Faculty Research Award at WPI in 2009. She is a senior member of the IEEE.
