ieee transactions on cloud computing 1 …1croreprojects.com/basepapers/2017/tees an...

2168-7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Seehttp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI10.1109/TCC.2015.2398426, IEEE Transactions on Cloud Computing

IEEE TRANSACTIONS ON CLOUD COMPUTING 1

TEES: An Efficient Search Scheme overEncrypted Data on Mobile Cloud

Jian Li, Member IEEE/ACM, Ruhui Ma, Haibing Guan, Member IEEE/ACM

Abstract—Cloud storage provides a convenient, massive, and scalable storage at low cost, but data privacy is a major concern thatprevents users from storing files on the cloud trustingly. One way of enhancing privacy from data owner point of view is to encrypt thefiles before outsourcing them onto the cloud and decrypt the files after downloading them. However, data encryption is a heavy overheadfor the mobile devices, and data retrieval process incurs a complicated communication between the data user and cloud. Normally withlimited bandwidth capacity and limited battery life, these issues introduce heavy overhead to computing and communication as wellas a higher power consumption for mobile device users, which makes the encrypted search over mobile cloud very challenging.In this paper, we propose TEES (Traffic and Energy saving Encrypted Search), a bandwidth and energy efficient encrypted searcharchitecture over mobile cloud. The proposed architecture offloads the computation from mobile devices to the cloud, and we furtheroptimize the communication between the mobile clients and the cloud. It is demonstrated that the data privacy does not degrade whenthe performance enhancement methods are applied. Our experiments show that TEES reduces the computation time by 23% to 46%and save the energy consumption by 35% to 55% per file retrieval, meanwhile the network traffics during the file retrievals are alsosignificantly reduced.

Index Terms—Mobile Cloud Storage, Searchable Data Encryption, Energy Effiiency, Traffic Efficiency.

F

1 INTRODUCTION

Cloud storage system is a service model in which dataare maintained, managed and backuped remotely onthe cloud side, and meanwhile data keeps available tothe users over a network. Mobile Cloud Storage (MCS)[1], [2] denotes a family of increasingly popular on-lineservices, and even acts as the primary file storage forthe mobile devices [3]. MCS enables the mobile deviceusers to store and retrieve files or data on the cloudthrough wireless communication, which improves thedata availability and facilitates the file sharing processwithout draining the local mobile device resources [4].

The data privacy issue is paramount in cloud storagesystem, so the sensitive data is encrypted by the ownerbefore outsourcing onto the cloud, and data users re-trieve the interested data by encrypted search scheme.In MCS, the modern mobile devices are confronted withmany of the same security threats as PCs, and varioustraditional data encryption methods are imported inMCS [5], [6]. However, mobile cloud storage systemincurs new challenges over the traditional encryptedsearch schemes, in consideration of the limited comput-ing and battery capacities of mobile device, as well asdata sharing and accessing approaches through wireless

• J. Li is with the School of Software, Shanghai Jiao Tong University,Shanghai, China, 200240. E-mail: [email protected]

• H. Guan is with the Department of Computer Science and Technol-ogy, Shanghai Jiao Tong University, Shanghai, China, 200240. E-mail:[email protected]

• R. Ma is with the Department of Computer Science and Technol-ogy, Shanghai Jiao Tong University, Shanghai, China, 200240. E-mail:[email protected]

H.B. Guan is the corresponding author.

communication. Therefore, a suitable and efficient en-crypted search scheme is necessary for MCS.

Generally speaking, the mobile cloud storage is ingreat need of the bandwidth and energy efficiency fordata encrypted search scheme, due to the limited batterylife and payable traffic fee. Therefore, we focus on the de-sign of a mobile cloud scheme that is efficient in terms ofboth energy consumption and the network traffic, whilekeep meeting the data security requirements throughwireless communication channels.

To this end, we introduce TEES (Traffic and En-ergy saving Encrypted Search) architecture for mobilecloud storage applications. TEES achieves the efficienciesthrough employing and modifying the ranked keywordsearch as the encrypted search platform basis, which hasbeen widely employed in cloud storage systems. Tradi-tionally, two categories of encrypted search methods exit,that can enable the cloud server to perform the searchover the encrypted data: ranked keyword search and booleankeyword search. The ranked keyword search adopts therelevance scores [7] to represent the relevance of a fileto the searched keyword and sends the top-k relevantfiles to the client. It is more suitable for cloud storagethan the boolean keyword search approaches (e.g. [8],[9], [10], [11]), since boolean keyword search approachesneed to send all the matching files to the clients, andtherefore incur a larger amount of network traffic and aheavier post-processing overhead for the mobile devices.

By careful redesign of ranked keyword search pro-cedure, TEES offloads the security calculation to thecloud side to save the energy consumption of mobiledevices, and TEES also simplify the encrypted searchprocedure to reduce the traffic amount for retrieving data




from encrypted cloud storage. Besides the energy andtraffic efficiencies, TEES is implemented with securityenhancement in consideration of the modified encryptedsearch procedure in order to mitigate statistics informationleak and keywords-files association leak [12] [13] for MCS,by adding noise in Term Frequency distribution functionand keeping the Order Preserving Encryption attributes.

Note that TEES is implemented with security en-hancement based on popular TF-IDF [13][14][15][16],but the essential security defects of this encryption ap-proach cannot be completely resolved. To the best of ourknowledge, there is no unbreakable security scheme, butTEES architecture is general enough to host and enhancevarious encrypted search schemes as we will discuss inSection 4.3. Moreover, we suggest that a cloud storageservice provider is semi-honest and will not colludewith attacker in TEES, as most of the related works.TEES employs the architecture redesign over traditionalencrypted search procedure, and our comprehensiveexperiments prove the TEES has following advantagesin comparison with the traditional complex encryptedsearch procedure:

1) TEES reduces the energy consumption by35%∼55% by offloading the computation ofthe relevance scores to the cloud server. Thisreduces the computing workload on the mobiledevice side while at the same time significantlyspeeding up the mobile file access speed (e.g. itdoubles the speed for accessing a 100KB file).

2) With a simplified search and retrieval process,TEES reduces the network traffic for the commu-nication of the selected index, and reduces the fileretrieval time by 23%∼46% in our experiments.

3) In implementing the redesigned encrypted searchprocedure, TEES redistributes the encrypted indexto avoid statistics information leak, and wrapskeywords adding noise in order to render themindistinguishable to the attackers. Security analysisshow that the security level of TEES is guaranteedand enhanced for MCS wireless communicationchannels.

The rest of the paper is organised as follows: weintroduce the traditional encrypted search architecture,related work, new opportunities and challenges in Sec-tion 2. Then we describe the design of TEES and discussits efficiencies in terms of energy, traffic and executiontime in Section 3. Then, Section 4 describes the moduleimplementations and a security enhancement to themodel is presented in Section 5. The performance isevaluated in Section 6. Conclusions and future work arediscussed in Section 7.

2 FILE RETRIEVAL IN CLOUD STORAGE

2.1 Traditional Encrypted Search over Cloud DataTraditional cloud storage system architecture and gen-eral procedures are shown in Figure 1, which include:file/index encryption by the data owner, outsourcing

Data OwnerFiles

Encrypted Index

Encrypted Files

Storage Nodes Cloud ServerOutsourcing

Outsourcing

Authentication

1.Encrypted K

eywords

2.Selected index

4. Files R

etrieval

5.Relevant Files

3.Ranking Function

Fig. 1. Traditional Encrypted Search Architecture

the data to the cloud storage, and encrypted datasearch/retrieval procedure of the data users in cloudcomputing.

2.1.1 File/Index EncryptionThe data owner first executes the preprocessing andindexing work as shown on Figure 2. He should invertfiles, that are selected to store on the cloud, for textsearch engines [17]. Every word in these files undergoesstemming to retain the word stem. After this step, thedata owner encrypts and hashes every term (word stem)to fix its entry in the index. The index is then createdby the data owner. Finally, the data owner encrypts theindex and stores it into the cloud server, together withthe encrypted file set. Most of the previous schemesunder this architecture use Order Preserving Encryption(OPE) [18] to encrypt the file index. This file indexis often a TF (Term Frequency) table composed of TFvalues [19]. The TF-IDF table could be used to determineword relevance in documents [20], [21], [22].

StemEncrypt

&Hash

Build TF

table

Encrypt

TF table

Store to

cloud

File Set

Encrypt

Data Owner

Fig. 2. Process of Preprocessing and Indexing

2.1.2 Data Search and Retrieval after AuthenticationA data user can only access a file after being authenti-cated by the data owner. In the process of authentica-tion, the data user sends his identity to the data owner.The data owner sends the encrypted keys back if theuser is a legal user.

In the process of search and retrieval, the cloud serverhelps the users to find the top-k relevant files for a givenkeyword without decrypting it. Searches incur followingthe steps, as illustrated in Figure 1:

1) An authenticated user stems the keyword to bequeried, encrypts it with the keys and hashes it to get its




entry in the index. Then the encrypted keyword is sentto the cloud server.

2) On receiving the encrypted keyword, the cloudserver first searches for it in the index. Then the indexrelated to this keyword is sent back to the data user.

3) The data user calculates the relevance scores withthe selected index to find the top-k relevant files andsends a follow-up request to the cloud server in orderto retrieve the files.

4) The position of these files is selected and they aresent back to the data user from the cloud server.

5) The data user decrypts the files and recovers theoriginal data.

The related computational components for these stepsare illustrated in Figure 3, which indicate the traditionaltwo-round-trip scheme for a file search and retrievalprocess invoked by an authenticated user. We call thisfile retrieval scheme abbreviated as TRS (Two Roundtrip Search). This scheme provides privacy protectionthrough a complicated file retrieval process compared toa simple PlainText Search scheme (PTS) where searchingand retrieving a file is done in only one round withoutsecurity service.

Stem Encrypt Hash

Send files back

(top-k)

Search&Send back

selected index

Calculate

and rank

Retrieve

Cloud ServerDecryptData User

Fig. 3. TRS: Two-Round-Trip Encrypted Search

This TRS scheme may be suitable for the users withpersonal computer, but when it comes to a mobile de-vice, the mobile client needs to decrypt the index andcalculate the relevance scores, which incur a heavy bur-den. Additionally, more communication between clientand server will introduce more latency and cost morepower, at the same time mobile device users normallycare about the traffic consumption because of payabletraffic fee.

2.2 MCS Challenges and Design PrincipleEfficiency Challenges: Note that traditional file searchand retrieval schemes, such as TRS, can provide datasecurity but at the cost of more complicated proceduresthan Plain Text Search (PTS). TRS has been widelyemployed in cloud storage systems, but the encryptionand the ranking incur the heavy calculation cost on amobile device, and thus introduce the new challenges inefficiency for MCS traffic and energy consumption. It is

necessary to rethink the design of the whole procedurewith a careful consideration of the energy consumptionand of the traffic efficiency. We analyse the model andindicate several possible optimizations.

First, it is obvious that in traditional schemes, themobile client has a heavy workload for decrypting theselected index, calculating and ranking the relevancescores. It will take more time when comes to the mobileclient since the computing capacity of a mobile deviceis limited. This is also clearly inappropriate when thebattery of a mobile device is taken into account. Second,the two round-trips for each file search and retrievalrequest, as shown in Figure 3, is a heavy burden formobile devices with limited bandwidth and traffic fees.In a bad network environment, more latency will beintroduced in two round-trips than in one round-trip.

Thus, we offload the calculation of the relevance scoresto the cloud server; this reduces the burden on themobile clients while also shortening the retrieval process.As a result, the data user can receive the most relevantfiles within only one communication.

Security Challenges: According to the efficiency chal-lenges in cloud storage mentioned before, we shouldthen address the security challenges introduced by of-floading part of the calculation onto the cloud. We con-sider the scenario where an authorized data user wantsto search for files stored on the cloud server. This datauser needs to retrieve the most relevant files through theencrypted data without downloading all the files. So theindex should be stored in the cloud, leading to potentialthreats for MCS in the following cases:

1) Statistics Information Leak. Attackers could get theterms by analysing the TF table, since an OrderPreserving Encryption (OPE) method encrypted TFtable produces a peaky histogram of TF values.In other words, term frequency should be evenlydistributed to avoid statistic information leak, oth-erwise a broken index can be introduced withserious information leaks.

2) Keywords-files Association Leak. An attacker coulddetermine query terms by observing queries andresults through a wireless channel: as the resultof the retrieval is keyword specific, attackers mayguess the queried keyword by only observing thekeyword and the result of the retrieval. Thus, itshould avoid the relation of this keywords-filesassociation in data encryption, which will be elab-orated in Section 4.

3) Server Information Acquisition. The cloud servermaybe honest-but-curious [23], and my try to learnthe underlying plaintext of users data. The cloudserver can infer and analyze the encrypted indexand get additional information, and we need tominimize the information acquisition of the curiouscloud server.

Design Principle: Our design goal is to achieve an ef-ficient encrypted search architecture, while both consid-ering these security threats in the modified implementa-




tion. Our scheme will offload most of the computationalload to the cloud server as described in Section 3. Nextly,Section 4 presents the details of its implementation withsecurity enhancement.

Note that the novel TEES architecture design mustbase on an encrypted search method [24][25], which tra-ditionally include single-keyword search and multiple-keyword search. Multiple-keyword search [26] [27] [28]enables conjunctive or disjunctive search formulas, but itusually incurs high computational complexity to realizemulti-dimensional range query over encrypted data dueto the heavy reliance on public-key cryptography. In con-sideration of the limited data volume for MCS users andlimited computing capacity of mobile devices, single-keyword search is more suitable. Furthermore, relevancescore calculation in multiple-keyword is hard to insurethe order preservation for search accuracy, which willbe discussed in Section 4.3. Therefore, we dedicate in thesingle-keyword search algorithm to provide secured andlightweight searchable encryption solution for mobilecloud storage and file sharing system design.

2.3 Related Work

2.3.1 Encrypted Search Schemes

Over the past recent years, encrypted search has evolvedtoward the ability data sharing with protection of users’privacies. Song et al. [8] raised the question how to dokeyword searches on encrypted data efficiently. Theyproposed a scheme which encrypted each word of adocument separately. So it is not compatible with ex-isting file encryption schemes and it cannot deal withcompressing data. After that many methods of keywordsearch showed up such as [29]. In Information Retrieval,TF-IDF (term frequency-inverse document frequency)[30] is a statistic which reflects how important a wordis to a document in a collection or corpus. It is oftenused as a weighting factor in keyword-based retrievaland text mining [19]. The TF-IDF algorithm proposed bySalton and McGill’s book [31] is one of the most popularschemes, among other schemes as [32], [33].

Up to now, encrypted search includes Boolean key-word search and ranked keyword search. In Booleankeyword search [8], [9], [10], the server sends back filesonly based on the existence or absence of the keywords,without looking at their relevance.

Ranked keyword Search: Chang et al. [11] provideda scheme of keyword search, but it does not send backthe most relevant files. In ranked encrypted search, theserver sends back the top-k ranked files. Most of theprevious schemes used OPE [34] to encrypt the index ofthe file set, although the fully homomorphic encryptionmethod could also be used [35], [36]. In previous work,Agrawal et al. [18] proposed a one-to-one mapping OPEwhich will lead to Statistics Information Leak Control.Wang et al. [13] proposed a one-to-many mapping OPE;They implemented a complicate algorithm for security

protection. However, their performance and energy con-sumption would a problem since their algorithm wascomplicate and need much computing resource. Swami-nathan et al. [37] proposed a confidentiality-preservingrank-ordered search. This scheme displays low perfor-mances as the relevance scores are computed on theclient side, increasing its workload. Zerr et al.[12] intro-duced the Zerber+r model featuring a novel techniquethat renders the relevance scores and number of follow-up requests for different terms indistinguishable for theserver while preserving the retrieval accuracy of theserver-side top-k processing. The client then decrypts theelements returned by the server and filters them fornon-queried terms. Orencik et al.[38] proposed a schemewith an Trapdoor process. Bowers et al. [39] introduceda distributed cryptographic system that allows a set ofservers to prove to a client that a stored file is intact andretrievable. Wang et al. [14] presented a secure rankedkeyword search over encrypted cloud data. However, intheir work the terms are closely related to the files, whichcould lead to potential information leak.

Currently, many researches focus on improving theencrypted search accuracy with multi-keywords ranking.Wang et al. [13] proposed a one round trip search schemewhich could search the encrypted data. It worths notic-ing that multi-keyword ranked search may incur moreserious Keywords-files Association Leak problem (men-tioned in Section 2) if attackers observed the keywordsand the return files to learn some relationships betweenkeywords and files, especially through wireless commu-nication channels for mobile cloud. Cao et al. [27], [15]proposed privacy preserving method for multi-keywordencrypted search with a way to control the ‘doublekey leak”. In [16], a fuzzy multi-keyword fuzzy searchscheme was presented, but it suffers from inefficientsearch time with two round-trip communications. Notethat multi-keyword is potentially the future main streamencrypted search scheme with higher searching accuracy,but current on-going research cannot provide an author-itative method. Therefore, we will employ the single-keyword with OPE TF-IDF encryption method as a basisto establish a more power and traffic efficient encrypteddata search architecture.

2.3.2 Power and Traffic Efficiency ImprovementsSchemes

The previous schemes cannot directly apply to mobilecloud, for achieving efficient energy consumption toaddress the important issue for mobile cloud. In recentyears many OPE [18], [34], [40] or fully homomor-phic encryption [41], [42] methods have been proposed.They proved themselves secure and accurate enoughfor searching encrypted data purpose. However, theycost many computing resources. As energy consumptionbecoming important, a complicated algorithm is notsuitable in mobile devices. Therefore we choose a simpleorder preserving encryption method in our TEES. Kumar




et al. [43] raised the question of the importance of the en-ergy and performance in mobile cloud computing. Theyconcluded that four basic approaches to saving energyand extending battery lifetime in mobile devices can beconsidered. Miettinen et al. [44] provided an analysis ofthe critical factors affecting the energy consumption ofa mobile client in cloud computing. They also presentsome measurements related to the central characteristicsof contemporary mobile devices that define the basicbalance between local and remote computing. Carroll etal. [45] also presented a detailed analysis of the powerconsumption of mobile phone, in which the energyusage and battery lifetime were tested under a number ofusage patterns. They identified the most promising areasto focus on for further improvements of power manage-ment. These views underscore the fact that offloadingthe workload to the server is a good strategy to modifythe previous encrypted search and render them suitableto the mobile cloud context.

3 TEES SYSTEM DESIGN

To effectively support an encrypted search scheme witha high security level over cloud data, we introduce anew architecture that we name TEES. According to thethreats introduced in Section 2, our aim is to designa practical solution for secure encrypted search over amobile cloud storage. We first introduce the design ideain Section 3.1, and then introduce development of ourown protocol with the change of the traditional processof file search and retrieval for the cloud data in Section3.2. Our scheme achieves the security and efficiencygoals mentioned above. Thereafter, we discuss the rea-sons why TEES can achieve performance efficiency inSection 3.3.

3.1 The Basic Idea of TEES

The basic idea behind TEES is to offload the calculationand the ranking load of the relevance scores to thecloud. It has been highlighted that offloading somecomputation intensive applications onto the cloud canbe an efficient low power design philosophy [43]. Cloudproviders can provide computing cycles, and users canuse these cycles to reduce the amounts of computationon mobile systems and save energy. However, at thesame time, offloaded applications intend to increasethe transmission amount and thus increase the energyconsumption from another aspect. This double effectsmotivates us to carefully redesign the traditional fileencrypted search and retrieval process. We first take anoverview of major processes for all file encrypted searchand retrieval schemes. There are normally three mainprocesses:• The process of authentication is used by the data

owner to authenticate the data users.• The file set and its index are stored in the cloud

after being encrypted by the data owner during thepreprocessing and indexing stages.

• The data user searches the files corresponding to akeyword by sending a request to the cloud serverin the search and retrieval processes.

We now introduce the detailed design how TEES ad-dresses the power efficiency and the security challengesin modifying these processes.

3.2 Modified Process of Search and RetrievalDuring the preprocessing and indexing stages, the dataowner gets a TF table as index and uses Order PreservingEncryption (OPE) to encrypt it. As a result, the cloudserver is able to calculate the relevance scores and rankthem without decrypting the index. This renders theoffloading of the computational load secure and possible.Thus, the modified search and retrieval processes ofTEES shown in Figure 4 follow the steps:

Stem Encrypt Hash WrapUnwrap&

Search

relevance

score

calculation

Decrypt

Data UserCloud Server

Send files

Back(top-k)

Fig. 4. ORS: Novel Process of Search and Retrieval

i) If a data user wants to retrieve the top-k relevant filesbased on a keyword, he first obtains authentication fromthe data owner and then receives the keys to encrypt thekeyword.

ii) The data user stems the keyword to be queried andencrypts it using the keys.

iii) The data user wraps the encrypted keyword into atuple, adding some noise to avoid statistic informationleak; this tuple is used to perform the retrieval. Then, it issent to the cloud server together with the number k. Thewrap method renders the keywords indistinguishablefor an attacker, which will be introduced in Section 5in details.

iv) On receiving the wrapped keyword, the cloudserver first makes sure that it is accessed by a legaluser. If the server is notified by the data owner that thisuser is to become invalid in a near future, the searchis performed but a warning is also issued. If this is alegal user, the server unwraps the tuple to recover theentry of the keyword and searches for it in the index.After calculating the relevance scores, the position of thefiles corresponding to the keyword is picked and the top-k relevant files are sent back to the data user’s mobileclients without performing any decryption on these files.

v) The data user decrypts these files in the mobileclient and recovers the original data.

Comparing Figure 3 and Figure 4, we conclude thatthe search and retrieval processes in TEES are indeed




simplified to a single access than TRS. We call it ORS(One Round trip Search), which offload the computationload of “relevance score calculation” from mobile usersto the cloud and can intuitively reduce the communica-tion process between the users and cloud server. More-over, since the relevance score calculation is offloadedto the cloud server, it directly sends the top-k relevantfiles back to the data user after it receives the retrievalrequest, which can also reduce the traffic amount for fileretrievals at the same time.

Note that this offloading will not jeopardize the datasecurity in MCS due to the careful TEES redesign andimplementation in encryption enhancement, which willbe detailed in Section 4. Moreover, the security issueswill be further discussed and evaluated in Section 5.Performance comparative experiments on ORS, TRS andPTS schemes will be described in Section 6. We now firstdiscuss the efficiency of TEES.

3.3 Discussion: Performance Efficiency of TEES

The overall architecture of TEES is shown in Figure 5,in which the relevance scores calculation is offloadedto the cloud, which eases the heavy burden on mobileclients. Moreover, TEES features only one round-tripcommunication for each search as described in Figure 4rather than TRS as in the previous schemes as in Figure3. In this scheme, the file search and retrieval steps areas follows:

i) The data user sends his identity to the data ownerand get the secret keys if authenticated.

ii) An authenticated user stems the keyword to bequeried, encrypts it with the keys and hashes it to get itsentry in the index. Then the encrypted keyword is sentto the cloud server.

iii) On receiving the encrypted keyword, the cloudserver will use the function of relevance score calculation(implementation will be detailed in Section 4) to find thetop-k relevant files and sent back to the data user wherethe top-k is configured by the users.

iv) The data user decrypts the files and recovers theoriginal data.

Thus, the benefits of TEES are easily observed, andwe elaborate and quantify them in the following sub-sections.

3.3.1 Reducing the Energy Consumption

Energy is a precious resource on mobile phones. Todefine the energy efficiency, we define that the energyconsumption of TEES is η of the traditional encryptedsearch strategies (energy consumption of ORS over thatof TRS). η is described by Equation (1), where Ecodenotes the energy consumption per request/responsebetween the user and the server; Eretr stands for theenergy consumption during a file retrieval and Ew de-notes the energy consumption of a mobile device for therelevance score calculations.

Data OwnerFiles

Encrypted Index

Encrypted Files

Storage Nodes Cloud ServerOutsourcing

Outsourcing

Authentication

1.Encrypted Keyw

ords

3.Relevant Files

2.Ranking Function

Fig. 5. Encrypted Search Architecture of TEES

η =Eco + Eretr

2Eco + Eretr + Ew< 100% (1)

Since relevance score calculation is offloaded to thecloud due to the ORS design of TEES (Figure 4), Ew iseliminated. ORS compacts the file search and retrievalprocess into a singe round (only one Eco). Eretr isidentical for both TRS and ORS. Note that η is smallerthan 1, and power consumption can be reduced byTEES. Moreover, since Ew is very large, we can predictthat the energy consumption of the mobile device issignificantly reduced by TEES, which is clearly reflectedin our experiments.

3.3.2 Reducing File Search and Retrieval Time

Reducing the execution time of a file search and retrievaltransaction is important for the user experience. Notethat beyond the “one round-trip-time” saved by TEES,the computational time is also reduced by the offloadingof the computational workload from the user side to thecloud side.

Assume that the computing ability of the cloud serverand of the mobile clients are denoted by Ccs and Cm(Cm � Ccs), respectively. Then, the searching workloadis Σ, and Tretr denotes the file retrieving time, while RTTrepresents one round trip time. The reduction ratio of thefile search to the retrieval time, ρ, is given by Equation(2), which is smaller than 1 and proves the computingefficiency by offloading of TEES.

ρ =RTT + Σ

Ccs+ Tretr

2RTT + ΣCm

+ Tretr< 100%. (2)

3.3.3 Reducing Traffic Overhead

It is vital to minimize the communication overheadin order to ensure that it does not cancel any otherperformance improvement. In TEES, there is only oneround-trip communication for each keyword search, andthe selected index is not transferred between the cloudand the user as depicted in the TRS case (Figure 3). Thisattribute significantly reduces any possible communica-tion overload.




In fact, only two steps require communications. Dur-ing the authentication process, the data user sends hisidentity information to the data owner, then the keys andhash table are sent back to him. It is nearly the same inboth TRS and ORS. During the file retrieval process, theauthenticated data user sends the encrypted keywordsto the cloud server and gets top-k ranked files back. LetCkw be the size of the keywords, Crt be the networktraffic of one communication between the cloud and thedata user, Cf be the total size of the top-k files, Cpr bethe selected index transmitted in TRS.

We can predict that the communication overhead ofORS is reduced by ζ compared to TRS. ζ can be calcu-lated using Equation (3), which is smaller than 1 andproves the reduced traffic amount of TEES.

ζ =Ckw + Crt + Cf

Ckw + 2Crt + Cpr + Cf< 100% (3)

Overall, the modification of the file search and re-trieval processes in TEES design can lead to a more effi-cient traffic and a decreased energy consumption. TEESmust also redesign and implement the correspondingmodules in order to match the simplified ORS such thatthe security level does not decrease. Next section willintroduce the detailed implementation technologies inTEES.

4 TEES IMPLEMENTATION FOR SECURITYENHANCEMENT FOR MOBILE CLOUD

In order to achieve security enhancement with energyand traffic efficiency, we implement the modules inTEES using modified routines and new algorithms. Oursystem will be introduced in three parts. As previouslymentioned, the data owner should build a TF table asindex and encrypt it using OPE in order to offload thecalculation and ranking load of the relevance scores tothe cloud. So as to control the statistics information leak,we implement our one-to-many OPE in the data ownermodule (Section 4.1). We also wrap the keywords to besearched by adding some noise in the data user moduleto help controlling the keywords-files association leak.In order to get top-k relevant files, we implement aranking function to calculate the relevant score on thecloud (Section 4.2). Given a keyword in ORS, the cloudserver is in charge of calculating the relevance scoresfor the data user to get the corresponding top-k relevantfiles. Therefore, we implement both the unwrap andrank functions in the cloud server module (Section 4.3).Hence these modules are modified compared with thetraditional ones.

4.1 Redesign of the Data Owner ModuleWe modify the way of building the index to supportthe ORS scheme by our one-to-many OPE and imple-ment it to control the statistics information leak. Theauthentication between the data owner and the datauser is also redesigned in order to ensure the security

of TEES. We now elaborate the implementation of theindex construction, the encryption functions and detailthe authentication process.

TF-IDF: TF-IDF [46] is the product of two statistics,term frequency and inverse document frequency. Variousways for determining the exact values of both statisticsexist. In the case of the term frequency (TF) tf(t, d), thesimplest choice is to use the raw frequency of a term ina document, i.e. the number of times that term t occursin document d. If we denote the raw frequency of t byf(t, d), then the simple TF scheme is tf(t, d) = f(t, d).

The inverse document frequency (IDF) is a measureof whether the term t is common or rare across alldocuments. It is obtained by dividing the total numberof documents by the number of documents containingthe term, and then taking the logarithm of that quotient.Then TF-IDF is calculated as:

tf–idf(t, d,D) = tf(t, d)× idf(t,D), (4)

where D denotes the total number of documents inthe data set. A high weight in TF-IDF is reached bya high term frequency (in the given document) anda low document frequency of the term in the wholecollection of documents; the weights thus tend to filterout common terms. Since the ratio inside the IDF’s logfunction is always greater than or equal to 1, the valueof IDF (and TF-IDF) is greater than or equal to 0. Asa term appears in more documents, the ratio inside thelogarithm approaches 1, bringing the IDF and TF-IDFcloser to 0.

Build Index: In TEES, the data owner starts by collect-ing the files he wants to store into the cloud. Considera file set F= (F1, F2, ..., Fn) containing the number of|F| files, in which a term set T = (t1, t2, ..., tm) and thenumber of |T | terms appear. We create a table of size|F|×|T | for all the files and all the terms, where the valueat the ith row and jth column denotes the number ofoccurrences of the ith term in the jth file. This numericalvalue is the TF value. Then a constant S is chosen as acofactor to standardize these occurrences with the sizeof the files. In TEES, we use this TF table as our indexand the cloud server calculates the relevance scores usingthe encrypted TF values. The ranking function of therelevance scores will be introduced in the subsectiondedicated to the cloud server.

Let GenKey() be the function that generates the keys.Let π() be a hash function that encrypts the terms. InTEES, π() is instantiated by a hash function such as MD5.Let ψ() be the hash of the encrypted terms, ψ(ti) be theentry of the term ti in the index. Let ε() be an encryptionalgorithm for the TF values. When building an index, itexecutes following two steps:

1) First, the data owner starts by calling GenKey(),to generate a key α to encrypt the terms, a key β toencrypt the index, and the noise λ > 0, µ > 0 to wrap thekeywords. He then outputs K = {α, β} and N = {λ, µ}.

2) Second, the data owner builds a secure index by




calling BuildIndex(K,F) (encrypted N is sent to cloudserver) as described in Algorithm 1:

Algorithm 1 BuildIndexInput: K,FOutput: I

1: Extract the terms T = (t1, t2, ..., tm) from the file set F .2: for ti ∈ T do3: Get the encrypted term πα(ti) and hash it to get its entry

ψ(πα(ti)) in the TF table.4: end for5: for ti ∈ T and 1 ≤ j ≤ |F| do6: Calculate the term frequency tfij and get tf ij =

|(S/|Fi|)× tfij |.7: end for8: Compute εβ(tf ij), and store it in the index I.9: return I;

Encrypt Function: An aforementioned OPE approachemploying a one-to-one mapping [18] can be used toεβ(). However such an OPE strategy cannot be directlyemployed to secure the TF values in consideration ofsecurity issues in MCS. In Figure 7 (a) we show thehistogram of the TF values for one of our data sets.Note how sharp the TF histogram is, meaning that anattacker can collect statistical informations from the TFtable if he retrieves it from the cloud server. In orderto flatten the distribution, we use a one-to-many OPE,i.e., we map every TF value to a random number in acertain range. For example, a TF value tf will be mappedto a range [tfx, tfy], where 0 ≤ tfx ≤ tfy < 2B are thelower and upper bound of the random mapping range(B is set to 16 in our performance evaluation and to 8in our security evaluation). For two adjacent TF valuestf1 and tf2, the random mapping ranges [tfx1 , tf

y1 ] and

[tfx2 , tfy2 ] are chosen to be non-overlapping but close to

one another, that is if tf1 < tf2 then tfy1 < tfx2 . Therefore,the Order Preserving Encryption (OPE) is maintained.

In order to increase the entropy of the ciphertext, weadaptively generate the mapping range according to thedistribution of the raw TF values and encrypt themas described in Algorithms 2 and 3. Here G(TF ) ={G(tf1), ..., G(tfi)} is a lower bound set for all the TFvalues in a certain TF table, while H(TF ) is the upperbound set.

Our algorithm is order-preserving since a given TFvalue tf1 is encrypted to an integer E(tf1) which is lessthan tfy1 , and tf2 = tf1 + 1 is mapped into a new rangewhose lower boundary is larger than tfy1 . Our one-to-many OPE flattens the TF values histogram over a fileset so as to ensure the system security, which will bediscussed and evaluated in Section 5.

OPE is a secure algorithm for encryption which isproved by Boldyreva et al. [34], so that our OPE algo-rithm is secure enough for daily use. Meanwhile, our al-gorithm is a simple implementation of order-preservingencryption which will consume less energy than othercomplicated ones. And if we are desiring to updateour files or index with our mobile device, this energyefficiency algorithm will become a good choice for data

Algorithm 2 Key GenerationInput: TF TableOutput: G(TF ), H(TF )

1: Get the distribution histogram Ψ of the TF table and getTFx as all TF values occur in Ψ.

2: for tfi ∈ TFx do3: Get the occurrence ci.4: end for5: Get C =

∑|TFx|i=1 ci.

6: for tfi ∈ TFx do7: Calculate pi = ci/C.8: end for9: for tfi ∈ TFx do

10: if i==1 then11: Get G(tfi)=1 and H(tfi) = floor(2B × pi).12: else13: Get G(tfi)=H(tfi−1) + 1 and H(tfi) = H(tfi−1) +

floor(2B × pi).14: end if15: end for16: return G(TF ), H(TF ).

Algorithm 3 Order Preserving EncryptionInput: tfOutput: E(tf)

1: for ti ∈ T and 1 ≤ j ≤ |F| do2: Get E(tfij), E(tfij)

R← {G(tfij), G(tfij)+1, ..., H(tfij)}.3: end for4: return E(tf).

owners. The data owner can also use AES (AdvancedEncryption Standard) to encrypt the files.

Authentication: In TEES, the data owner maintains aset of legal users (“legal set”) and a set of users that willbecome invalid in after a defined delay (“overdue set”).The process of authentication is shown on Figure 6.

Data Owner Data User

Send

identity

infomation

Send back

hash table

and keys

Fig. 6. Process of Authentication

When a user intends to access the file, he first sendshis information to be authenticated by the data owner. Inour design, we use our unified school authentication inTEES and transfer it through https for safety concern.The data owner sends the keys along with the hashtable back if the user belongs to the legal set. Thishash table will be used in the hash process in Figure 6.Then the data owner records the ”International MobileEquipment Identity” of the user’s mobile device andstores its encrypted version into the cloud. When theuser’s authority is overdue, his identity information ismoved to the “overdue” set. The data owner will alsonotify the cloud of the changes. Note that the data ownershould regularly update the hash table and the keys such




that only users in the “legal set” will be notified. At thesame time, this authentication process needs the dataowner be online, but mature notification methods canbe involved to push the authentication requests to theofflined data owner.

4.2 Redesign of the Data User Module

The data user module is executed on the mobile clientsside. The wrap function of the keywords is implementedto solve the keywords-files association leak. In the wrapfunction, the stem, the encryption and the hash operationare exactly the same as in the index building algorithm.The function decrypting the files corresponds to theencryption done by the data owner. The authenticationfunction is used for authentication. We now detail thewrap function of this module.

Wrap Function with Noise: When an authorized datauser wants to retrieve files, he needs to encrypt thecorresponding query keyword w, and get the hash valueh from the hash table. This hash value is then sent to thecloud server and used to compute the relevance scores.In order to render this hash value indistinguishable foran attacker, the cloud client should wrap it, adding somenoise before sending it to the cloud server. The wrapfunction Wrap() will, first of all create a random numberr, and then build a tuple (h1, h2) based on Algorithm 4(λ > 0, µ > 0):

Algorithm 4 Wrap Function with NoiseInput: w, λ, µOutput: Wrap(w)

1: Stem w and get w.2: Get encrypted term πα(w) and hash it to get an entry h =ψ(πα(w)).

3: Create a random number r.4: if r < h then5: Get Wrap(w) = ((λh− µr)2, λh2 + µr).6: else7: Wrap(w) = (µh2 + λr, (λr − µh)2).8: end if9: return Wrap(w).

Given a keyword w, the data user searches Wrap(w) =(h1, h2). This largely enhances the security of the key-words search which will be detailed in Section 5.

4.3 Redesign of the Cloud Server Module

We will describe the functions that unwraps the key-words and rank the relevance scores for the cloud servermodule. These functions are used to get the top-k relevantfiles according to a given search keyword.

Unwrap Function: Note that the cloud server is semi-trusted, and the unwrap function can be processed bythe server. Upon receiving the tuple Wrap(w) = (h1, h2),the server calls Unwrap((h1, h2)) to get ψ(πα(w)) = h,searches into the TF table, and then sends back thecorresponding files. The equation is:

{h2 + h−

√h1+h2

λ = 0 if h1 < h2

h2 + h− h1−√h2

µ = 0 if h1 ≥ h2

(5)

Assuming that the random number (noise) createdby the wrap function is positive, the unwrap functionbehave as expected. Since h is a positive integer, wecould recover h using Unwrap():

h =

√

1+4(√h1+h2)

λ −1

2 (h1 < h2)√1+

4(h1−√h2)

µ −1

2 (h1 ≥ h2)

(6)

Ranking Function: Cloud server calculates the rele-vance scores and return top-k relevant files accordingto the searching query from data user. The calculationscheme in [31] is used in our scheme. Note that due tothe order preserving index, any other relevance scorescalculation method [31], [32], [33], [47] can also be em-ployed. TEES calculates the relevance score as Equation(7):

Score(Ws,Fc) =∑w∈Ws

1

|Fc|× (1 + ln fc,w)× ln(1 +

D

fw)

(7)Here Ws is the keyword set to be searched; Fc is a

certain file in the file set; fc,w denotes the TF of thekeyword w in the file Fc; |Fc| is the total length of Fc;fw is the number of files containing the keyword w andD is the total number of files.

When performing a single keyword search, the IDFfactor in Equation (7) is constant. Thus, we simplify theequation as follows:

Score(w,Fc) =1

|Fc|× (1 + ln fc,w) (8)

The cloud server sends back the top-k relevant filesafter ranking the scores using this relevance score calcu-lation algorithm, as depicted in Algorithm 5.

Algorithm 5 Top-k Ranking FunctionInput: w, kOutput: topF iles

if this request is sent by a ”legal” user thenfor each file Fc ∈ F do

Calculate Score(w,Fc).end for

end ifif this request is sent by a overdue user then

for each file Fc ∈ F doCalculate Score(w,Fc) but with a warning.

end forelse

Return ”No Permission”.end ifRank the scores to get top-k files topF iles ={topF1, topF2, ..., topFk}.return topF iles.




TEES can also support other document modelingmethod such as Latent Dirichlet allocation [19]. We canalso find the probability for a term to appear in a fileusing a middle tier “topic”, and then store its encryptedvalue in the index.

Note that TEES employs single-keyword search, butthe basic ideas, such as design efficiency by offloadingand security enhancement by adding noise, can be ex-tended to all other encrypted search schemes. Moreover,it is known that multiple-keywords search can providemore accurate the search results, but makes the searchmore complicated at the same time (as discussed in Sec-tion 2.2). In mobile cloud, the single-keyword is enoughto distinguish the documents that users need since ourdocuments are classified clearly. Moreover, if we searchencrypted data with multi-keywords, we should sacrificethe search accuracy because most popular OPE (OrderPreserving Encryption) does not support multi-keywordwell. Most OPE could guarantee the order betweentwo values, but when these two values multiplied byseveral factors and then get the arithmetic sum (asmulti-keyword search process), we cannot insure theorder between the arithmetic sums. Another importantrespect is that, decreasing the number of keywords coulddecrease the consumption of energy for mobile device.Overall, developing a single keyword search schemeis a proper solution of encrypted search data sharingfor mobile cloud storage. Moreover, TEES is a generalarchitecture, where the OPE method proposed here canbe substituted by other novel schemes.

5 SECURITY ANALYSIS AND EVALUATION

In this section, we analyze the security of TEES basedon important security threats mentioned in Section 2.2.Concretely, the most important principle of the designis to prevent the attacker from obtaining any plaintextinformation regarding our data file set or the searchedkeyword. Then we should let the trusted but curious mo-bile cloud server learn as little information as possible.Last but not least, an unauthenticated data users shouldnot be able to perform any file retrieval. We ran experi-ments to test the security of TEES. Our experiments aredivided into three parts as mentioned in Section 2.

5.1 Statistics Information Leak Control

TEES protects the terms from being determined by ana-lyzing the distribution of the TF values through mobilecloud communication channels. Figure 7(a) shows thehistogram of the TF values over a data set, and we cansee that the TF histogram are very sharp originally. Itmeans that an attacker may get statistical informationsfrom the TF table as previously explained. In TEES,we encrypt the TF table with one-to-many order pre-serving encryption. Every TF value is mapped to amapping range. As shown in Figure 7(b), we indeedobtain an approximately uniform distribution once theorder preserving encryption has been applied, and every

0 5 10 15 20 250

0.5

1

1.5

2

2.5

3

3.5

4x 10

4

Term Frequency Value

Num

ber o

f Occ

uren

ces

(a) Original Distribution

0 50 100 150 200 250 3000

50

100

150

200

250

Term Frequency Value

Num

ber o

f Occ

uren

ces

(b) One-to-many OPE

Fig. 7. TF Distribution

term occurrence frequency is significantly reduced. Incomparing Figure 7(a) and (b), the OPE of TEES reducesthe Y-axis range from 37K to 250, and enlarge the X-axis from about 25 to 250. This means that even if anattacker accesses the TF table, it is hard to gain any extrainformation on our data set in a short time.

5.2 Keywords-files Association Leak Control

TEES also enhances the security of the keywords bypreventing the attacker from observing the keywordsto be searched as well as the results of a search. Oursolution is to wrap the words to be queried into thewrapping function proposed in Section 4.

When an authorized user searches files from a mobileclient, he gets a hash value ψ(πα(w)) corresponding tothe keyword w to be queried. Then this hash value isembedded into a tuple (h1, h2) and sent to the server.Even if the attacker knows the content to be queried,he does not know anything about the keywords to bequeried because of the noise (depicted in Section 4.2),so he is also unable to determine the terms. The tuple(h1, h2) corresponding to the word “Bund” of our dataset in 200 retrievals is distributed as displayed on Figure8 after being wrapped. For example if λ = 1 and µ = 1(the definitions of λ and µ have been also provided inSection 4.2), then the distribution is as in Figure 8 (a). If




(a) λ=1, µ=1 (b) λ=2, µ=3

(c) λ=1, µ=1 (curve fitting) (d) λ=2, µ=3 (curve fitting)

Fig. 8. The Distribution of the Term “Bund”

we change λ and µ to λ = 2 and µ = 3, the distributionchanges as shown in Figure 8 (b). It is then hard forattackers to determine the keywords. It is also hard tofit a curve as shown on Figure 8 (c) and (d). In addition,the data owner can also regularly update λ and µ inorder to enhance the security.

5.3 Server Information Acquisition ControlWe assume that the cloud storage system provider willnot collude with malicious users or intrude users’ dataintentionally. The cloud server can infer and analyze theencrypted index and get additional information, but ithas no intention to modify any important data. We usethe private cloud server from our school and assumed itas honest and perform important calculations here. Thisassumption is also used in most of the previous work[13].

In TEES, the cloud server calculates the relevancescore and finds the most relevant files correspondingto a given keyword. In order to find the TF value ofa queried keyword, the cloud server should also knowthe unwrap function of the user-supplied wrapped tuple.Therefore, the cloud server gets more information thanany potential attacker. Therefore, a curious cloud server

may determine the terms queried by the users only bycomparing the queries and the results. As most of theprevious schemes, we assumed that our test cloud serversemi-trusted, and therefore we only need to minimizethe amount of information it acquires. Moreover in termsof performance improvements, information leakage doesnot seem to be a very serious problem, and updating theTF table periodically also protects the index from beinginferred by the server.

Note that TEES is established with widely used TF-IDF encryption approach, and all nature defects of thisencrypted search scheme cannot be completely resolvedeven TEES In addition, when a data user performs asearch, the keyword to be queried will be encrypted bythe data owner’s key. The user receives a hash tableafter the first time it is authorized by the data owner.In order to prevent these users from running secretsearches maliciously, we also periodically change the keyas detailed in Section 4. Thus, TEES also tolerates userchurn.

6 RUNTIME PERFORMANCE EVALUATION

In addition to the system security analysis and eval-uation, we now evaluate TEES performance in terms




of energy, traffic and file access. We will compare itsperformances to those of TRS and PTS schemes.

6.1 Experimental EnvironmentIn our experiments, we use a data set of 1000 files withdifferent sizes and a VM in the cloud with Dual vCPUs at2.27GHz. An android smart phone with a CPU at 1GHzsends the queries as the mobile client of TEES throughan about 8M wireless network.

An Android program receives the user’s input andencrypts it before getting the hash value and then wrapit into a tuple which is sent to the mobile cloud server.Another feature of this program is to retrieve the filesback from the mobile cloud server and decrypt them.In addition, we implemented both TRS and PTS for acomparative purpose.

6.2 Energy ConsumptionAs energy consumption is critical for mobile devices, weevaluate TEES energy efficiency in this subsection. Weuse Battor [48], a phone power monitor to accuratelymeasure the system energy consumption.

The energy consumption of TRS and ORS is shown inFigure 9 (a) and (b). Although slight changes dependingupon the environment might occur, the comparison isquite accurate as controlled trials were performed.

Observe that the energy consumption is reduced from0.08mAh to 0.036mAh when searching and retrievingfiles of size 100KB, which means that ORS saves 55%energy compared to TRS. When searching and retrievingfiles of 1MB size, the energy consumption is reducedfrom 0.164mAh to 0.106mAh, that means a 35% energysaving. So, TEES provides a very efficient power con-sumption. For example, to exhaust our 1650mAh battery,ORS (of TEES) can perform ∼22000 retrievals while TRScould only retrieve ∼13000 files of size 600KB.

6.3 File Search and Retrieval Time

100 200 300 400 500 600 700 800 900 10000

200

400

600

800

1000

1200

1400

1600

1800

2000

File size (KB)

Tim

e (m

s)

PTSTRSORS

Fig. 10. FSRT of PTS, TRS and ORS

We compare the File Searching and Retrieval Time(FSRT) for the three schemes in this subsection as illus-trated in Figure 10. We test the FSRT for different fileswith size ranging from 100KB to 1MB. We observe thatthe FSRT of PTS is the shortest since it does not have to

perform any security computation. The FSRT of ORS iseffectively reduced when compared to the one of TRS.This difference is due to the advantages of the TEESdesign in terms of relevance score calculation offloading,and thus leads to reduction of file search and retrievalprocess. The FSRT value of ORS is very near to the one ofPTS, implying a very low cost to security on the mobiledevice. For example, TEES saves FSRT by 46% comparedto TRS for files of size 100KB, and by 23% for 1MB filesas shown in Figure 10.

The file retrieval time only depends on the file size andnetwork bandwidth. When offered a greater bandwidth,TEES becomes more efficient since downloading timeof files becomes a bottleneck of other schemes. Thedecryption time of the files is equal in all schemes andit is therefore pointless to measure it.

TABLE 1FSRT analyse of PTS, TRS and ORS

PTS TRS ORS

Request/Response 190ms 370ms 190msStemming and Encryption 0 10ms 10msHash and Wrap 0 145ms 150msServer file search 80ms 70ms 75msClient file search 0 260ms 0

Sum 270ms 855ms 425ms

The efficient FSRT of TEES is achieved by improv-ing the process efficiency, since only a single round ofcommunication and relevance score calculation offloadare used. The searching process is analysed in Table 1.Without any security service, PTS (Plain Text search)does not spend any time on stemming and encryption;neither does it on hash and wrap. On the other hand,ORS and TRS provide encrypted search schemes withrelated overhead. As shown in Table 1, ORS can improvethe “request/response” time significantly than TRS from370ms to 190ms (saving 180ms), and eliminate the “clientfile search” time by offloading it onto the server (saving260ms). Notice that the “sever file search” calculationworkload of ORS is 75ms, which is 5ms longer than thatof TRS. This is explained by the fact that the server takesthe offloaded search calculation of the mobile user. In theother words, TEES eliminates the “client file search” timeat the cost of a little heavier “server file search” time.This proves that the offloading is highly efficient (5ms vs.260ms). Moreover, ORS spends more 5ms on wrappingthe hash value than TRS for enhancing the security. Notethat the “server file search” time of PTS is higher thanthe other two schemes, since the server should executestem and hash function for plaintext file search, while thehash functions are executed by the mobile data user inboth TRS and ORS. Overall, ORS is secure and effective.

6.4 ThroughputThe calculation offload from the mobile device to thecloud data center reduces the execution time of the rele-




100 200 300 400 500 600 700 800 900 10000

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

File Size (KB)

Ene

rgy

Co

nsu

mp

tion

(m

Ah

)

(a) Scheme TRS

100 200 300 400 500 600 700 800 900 10000

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

File Size (KB)

Ene

rgy

Co

nsu

mp

tion

(m

Ah

)

(b) Scheme ORS

Fig. 9. Energy Consumption

vance score calculation due to the higher server capacity.Therefore, this also greatly increases the system through-put besides FSRT improvement; Figure 11 highlightsthe difference of file throughput of TRS and ORS. Weobserve that the file access acceleration is very effectivewhen dealing with small files as the relevance scorecalculation is executed more frequently. For example,on a 100KB file, the access speed is increased from104KB/s to 194KB/s, almost doubling the throughput.The acceleration is still effective when accessing fileswith size 1MB (29.6% acceleration). The throughput ofORS is not much less than that of PTS.

100 200 300 400 500 600 700 800 900 10000

100

200

300

400

500

600

700

800

File size (KB)

Thr

ough

tput

(K

B/s

)

PTSTRSORS

Fig. 11. The Throughput of PTS, TRS and ORS

7 CONCLUSION AND FUTURE WORK

In this paper, we developed a new architecture, TEES asan initial attempt to create a traffic and energy efficientencrypted keyword search tool over mobile cloud stor-ages. We started with the introduction of a basic schemethat we compared to previous encrypted search toolsfor cloud computing and showed their inefficiency in amobile cloud context. Then we developed an efficient

implementation to achieve an encrypted search in amobile cloud.

The security study of TEES showed that it is secureenough for mobile cloud computing, while a series ofexperiments highlighted its efficiency. TEES is slightlymore time and energy consuming than keyword searchover plain-text, but at the same time it saves significantenergy compared to traditional strategies featuring asimilar security level. Based on TEES, this work can beextended to more other novel implementations.

We have proposed a single keyword search schemeto make encrypted data search efficient. However, thereare still some possible extensions of our current workremaining. We would like to propose a multi-keywordsearch scheme to perform encrypted data search overmobile cloud in future. As our OPE algorithm is a simpleone, another extension is to find a powerful algorithmwhich will not harm the efficiency.

REFERENCES[1] L. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A

break in the clouds: towards a cloud definition,” ACM SIGCOMMComputer Communication Review, vol. 39, no. 1, pp. 50–55, 2008.

[2] X. Yu and Q. Wen, “Design of security solution to mobile cloudstorage,” in Knowledge Discovery and Data Mining. Springer, 2012,pp. 255–263.

[3] D. Huang, “Mobile cloud computing,” IEEE COMSOC MultimediaCommunications Technical Committee (MMTC) E-Letter, 2011.

[4] O. Mazhelis, G. Fazekas, and P. Tyrvainen, “Impact of storageacquisition intervals on the cost-efficiency of the private vs. publicstorage,” in Cloud Computing (CLOUD), 2012 IEEE 5th InternationalConference on. IEEE, 2012, pp. 646–653.

[5] J. Oberheide, K. Veeraraghavan, E. Cooke, J. Flinn, and F. Jaha-nian, “Virtualized in-cloud security services for mobile devices,”in Proceedings of the First Workshop on Virtualization in MobileComputing. ACM, 2008, pp. 31–35.

[6] J. Oberheide and F. Jahanian, “When mobile is harder thanfixed (and vice versa): demystifying security challenges in mobileenvironments,” in Proceedings of the Eleventh Workshop on MobileComputing Systems & Applications. ACM, 2010, pp. 43–48.

[7] A. A. Moffat, T. C. Bell et al., Managing gigabytes: compressing andindexing documents and images. Morgan Kaufmann Pub, 1999.

[8] D. Song, D. Wagner, and A. Perrig, “Practical techniques forsearches on encrypted data,” in Security and Privacy, 2000. S&P2000. Proceedings. 2000 IEEE Symposium on. IEEE, 2000, pp. 44–55.




[9] D. Boneh, G. Di Crescenzo, R. Ostrovsky, and G. Persiano, “Publickey encryption with keyword search,” in Advances in Cryptology-Eurocrypt 2004. Springer, 2004, pp. 506–522.

[10] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchablesymmetric encryption: improved definitions and efficient con-structions,” in Proceedings of the 13th ACM conference on Computerand communications security. ACM, 2006, pp. 79–88.

[11] Y. Chang and M. Mitzenmacher, “Privacy preserving keywordsearches on remote encrypted data,” in Applied Cryptography andNetwork Security. Springer, 2005, pp. 391–421.

[12] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+ r: Top-k retrieval from a confidential index,” in Proceedings of the 12thInternational Conference on Extending Database Technology: Advancesin Database Technology. ACM, 2009, pp. 439–449.

[13] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure andefficient ranked keyword search over outsourced cloud data,”Parallel and Distributed Systems, IEEE Transactions on, vol. 23, no. 8,pp. 1467–1479, 2012.

[14] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure ranked key-word search over encrypted cloud data,” in Distributed ComputingSystems (ICDCS), 2010 IEEE 30th International Conference on. IEEE,2010, pp. 253–262.

[15] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-preservingmulti-keyword ranked search over encrypted cloud data,” Paralleland Distributed Systems, IEEE Transactions on, vol. 25, no. 1, pp.222–233, 2014.

[16] B. Wang, S. Yu, W. Lou, and Y. T. Hou, “Privacy-preservingmulti-keyword fuzzy search over encrypted data in the cloud,”in INFOCOM, 2014 Proceedings IEEE.

[17] J. Zobel and A. Moffat, “Inverted files for text search engines,”ACM Computing Surveys (CSUR), vol. 38, no. 2, p. 6, 2006.

[18] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, “Order preservingencryption for numeric data,” in Proceedings of the 2004 ACMSIGMOD international conference on Management of data. ACM,2004, pp. 563–574.

[19] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,”the Journal of machine Learning research, vol. 3, pp. 993–1022, 2003.

[20] J. Ramos, “Using tf-idf to determine word relevance in documentqueries,” Technical report,Department of Computer Science, RutgersUniversity, 2003.

[21] D. Hiemstra, “A probabilistic justification for using tf× idf termweighting in information retrieval,” International Journal on DigitalLibraries, vol. 3, no. 2, pp. 131–139, 2000.

[22] K. Jones, “Index term weighting,” Information storage and retrieval,vol. 9, no. 11, pp. 619–633, 1973.

[23] Q. Chai and G. Gong, “Verifiable symmetric searchable encryptionfor semi-honest-but-curious cloud servers,” in Communications(ICC), 2012 IEEE International Conference on. IEEE, 2012, pp. 917–922.

[24] S. Kamara and K. Lauter, “Cryptographic cloud storage,” inFinancial Cryptography and Data Security. Springer, 2010, pp. 136–149.

[25] M. Li, S. Yu, K. Ren, W. Lou, and Y. T. Hou, “Toward privacy-assured and searchable cloud data storage services,” Network,IEEE, vol. 27, no. 4, pp. 56–62, 2013.

[26] W. Sun, B. Wang, N. Cao, M. Li, W. Lou, Y. T. Hou, and H. Li,“Privacy-preserving multi-keyword text search in the cloud sup-porting similarity-based ranking,” in Proceedings of the 8th ACMSIGSAC Symposium on Information, Computer and CommunicationsSecurity, ser. ASIA CCS ’13. New York, NY, USA: ACM, 2013,pp. 71–82.

[27] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-preservingmulti-keyword ranked search over encrypted cloud data,” inINFOCOM, 2011 Proceedings IEEE. IEEE, 2011, pp. 829–837.

[28] S. Hou, T. Uehara, S. Yiu, L. C. Hui, and K. Chow, “Privacypreserving multiple keyword search for confidential investigationof remote forensics,” in Multimedia Information Networking andSecurity (MINES), 2011 Third International Conference on. IEEE,2011, pp. 595–599.

[29] P. Golle, J. Staddon, and B. Waters, “Secure conjunctive keywordsearch over encrypted data,” in Applied Cryptography and NetworkSecurity. Springer, 2004, pp. 31–45.

[30] A. Aizawa, “An information-theoretic perspective of tf-idf mea-sures,” Information Processing and Management, vol. 39, pp. 45–65,2003.

[31] G. Salton and M. J. McGill, “Introduction to modern informationretrieval,” McGraw-Hill, Inc.,New York, NY, 1986.

[32] E. Han and G. Karypis, “Centroid-based document classification:Analysis and experimental results,” Principles of Data Mining andKnowledge Discovery, pp. 116–123, 2000.

[33] L. Baker and A. McCallum, “Distributional clustering of wordsfor text classification,” in Proceedings of the 21st annual internationalACM SIGIR conference on Research and development in informationretrieval. ACM, 1998, pp. 96–103.

[34] A. Boldyreva, N. Chenette, Y. Lee, and A. O´ neill, “Order-preserving symmetric encryption,” Advances in Cryptology-EUROCRYPT 2009, pp. 224–241, 2009.

[35] M. Van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan,“Fully homomorphic encryption over the integers,” Advances inCryptology–EUROCRYPT 2010, pp. 24–43, 2010.

[36] C. Gentry and S. Halevi, “Implementing gentrys fully-homomorphic encryption scheme,” Advances in Cryptology–EUROCRYPT 2011, pp. 129–148, 2011.

[37] A. Swaminathan, Y. Mao, G. Su, H. Gou, A. Varna, S. He, M. Wu,and D. Oard, “Confidentiality-preserving rank-ordered search,”in Proceedings of the 2007 ACM workshop on Storage security andsurvivability. ACM, 2007, pp. 7–12.

[38] C. Orencik and E. Savas, “Efficient and secure ranked multi-keyword search on encrypted cloud data,” in Proceedings of the2012 Joint EDBT/ICDT Workshops. ACM, 2012, pp. 186–195.

[39] K. Bowers, A. Juels, and A. Oprea, “Hail: a high-availability andintegrity layer for cloud storage,” in Proceedings of the 16th ACMconference on Computer and communications security. ACM, 2009,pp. 187–198.

[40] J. Zhang, B. Deng, and X. Li, “Additive order preserving encryp-tion based encrypted documents ranking in secure cloud storage,”Advances in Swarm Intelligence, pp. 58–65, 2012.

[41] C. Gentry, “A fully homomorphic encryption scheme,” Ph.D.dissertation, Stanford University, 2009.

[42] D. Stehle and R. Steinfeld, “Faster fully homomorphic encryp-tion,” Advances in Cryptology-ASIACRYPT 2010, pp. 377–394, 2010.

[43] K. Kumar and Y. Lu, “Cloud computing for mobile users: Canoffloading computation save energy?” Computer, vol. 43, no. 4,pp. 51–56, 2010.

[44] A. Miettinen and J. Nurminen, “Energy efficiency of mobile clientsin cloud computing,” in Proceedings of the 2nd USENIX conferenceon Hot topics in cloud computing, 2010, pp. 21–28.

[45] A. Carroll and G. Heiser, “An analysis of power consumption ina smartphone,” in Proceedings of the 2010 USENIX conference onUSENIX annual technical conference. USENIX Association, 2010,pp. 271–284.

[46] Wikipedia, “http://en.wikipedia.org/wiki/tf-idf.”[47] J. Zobel and A. Moffat, “Exploring the similarity space,” in ACM

SIGIR Forum, vol. 32, no. 1. ACM, 1998, pp. 18–34.[48] A. Schulman, T. Schmid, P. Dutta, and N. Spring, “Demo: Phone

power monitoring with battor.” MobiCom, 2011.

Jian Li is an Associate Professor in the School of Software at SJTU.He obtained his Ph.D. in Computer Science from the Institute NationalPolytechnique de Lorraine (INPL) - Nancy, France in 2007. He Hisresearch interests include real-time scheduling theory, network protocoldesign and embedded system. He is a member of IEEE/ACM.

RuHui Ma received his Ph.D. degree in computer science from Shang-hai Jiao Tong University, China, in 2011. Now he works as a postdocat SJTU. His main research interests are in virtual machines, computerarchitecture and compiling.

HaiBing Guan received his Ph.D. degree in computer science from theTongji University (China), in 1999. He is currently a professor with theFaculty of Computer Science, Shanghai Jiao Tong University (SJTU),Shanghai, China. His research interests include computer architecture,compiling, virtualization and hardware/software co-design.

ieee transactions on cloud computing 1 …1croreprojects.com/basepapers/2017/tees an...

Documents