
Towards Secure and Dependable Storage Services in Cloud Computing

Cong Wang, Student Member, IEEE, Qian Wang, Student Member, IEEE, Kui Ren, Member, IEEE, Ning Cao, Student Member, IEEE, and Wenjing Lou, Senior Member, IEEE

Abstract—Cloud storage enables users to remotely store their data and enjoy on-demand high-quality cloud applications without the burden of local hardware and software management. Though the benefits are clear, such a service also relinquishes users’ physical possession of their outsourced data, which inevitably poses new security risks to the correctness of the data in the cloud. In order to address this new problem and further achieve a secure and dependable cloud storage service, we propose in this paper a flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data. The proposed design allows users to audit the cloud storage with very lightweight communication and computation cost. The auditing result not only ensures strong cloud storage correctness guarantees, but also simultaneously achieves fast data error localization, i.e., the identification of misbehaving servers. Considering that cloud data are dynamic in nature, the proposed design further supports secure and efficient dynamic operations on outsourced data, including block modification, deletion, and append. Analysis shows the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attacks, and even server colluding attacks.

Index Terms—Data integrity, dependable distributed storage, error localization, data dynamics, Cloud Computing

1 INTRODUCTION

Several trends are opening up the era of Cloud Computing, which is an Internet-based development and use of computer technology. The ever cheaper and more powerful processors, together with the software as a service (SaaS) computing architecture, are transforming data centers into pools of computing services on a huge scale. The increasing network bandwidth and reliable yet flexible network connections make it even possible that users can now subscribe to high-quality services from data and software that reside solely on remote data centers.

Moving data into the cloud offers great convenience to users since they do not have to care about the complexities of direct hardware management. Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) [2], offered by a pioneer among Cloud Computing vendors, are both well-known examples. While these Internet-based online services do provide huge amounts of storage space and customizable computing resources, this computing platform shift is also eliminating the responsibility of local machines for data maintenance. As a result, users are at the mercy of their cloud service providers for the availability and integrity of their data [3]. On the one hand, although the cloud infrastructures are much more powerful and reliable than personal computing devices, a broad range of both internal and external threats to data integrity still exists.

• Cong Wang, Qian Wang, and Kui Ren are with the Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616. E-mail: {cong,qian,kren}@ece.iit.edu.

• Ning Cao and Wenjing Lou are with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609. E-mail: {ncao,wjlou}@ece.wpi.edu.

A preliminary version [1] of this paper was presented at the 17th IEEE International Workshop on Quality of Service (IWQoS’09).

Examples of outages and data loss incidents of noteworthy cloud storage services appear from time to time [4]–[8]. On the other hand, since users may not retain a local copy of outsourced data, there exist various incentives for cloud service providers (CSPs) to behave unfaithfully towards the cloud users regarding the status of their outsourced data. For example, to increase the profit margin by reducing cost, it is possible for a CSP to discard rarely accessed data without being detected in a timely fashion [9]. Similarly, a CSP may even attempt to hide data loss incidents so as to maintain a reputation [10]–[12]. Therefore, although outsourcing data into the cloud is economically attractive for the cost and complexity of long-term large-scale data storage, the lack of strong assurances of data integrity and availability may impede its wide adoption by both enterprise and individual cloud users.

In order to achieve the assurances of cloud data integrity and availability and enforce the quality of cloud storage service, efficient methods that enable on-demand data correctness verification on behalf of cloud users have to be designed. However, the fact that users no longer have physical possession of data in the cloud prohibits the direct adoption of traditional cryptographic primitives for the purpose of data integrity protection. Hence, the verification of cloud storage correctness must be conducted without explicit knowledge of the whole data files [9]–[12]. Meanwhile, cloud storage is not just a third-party data warehouse. The data stored in the cloud may not only be accessed but also be frequently updated by the users [13]–[15], through insertion, deletion, modification, appending, etc. Thus, it is also imperative to support the integration of this dynamic feature into the cloud storage correctness assurance, which makes the system design even more challenging.


Last but not least, the deployment of Cloud Computing is powered by data centers running in a simultaneous, cooperated and distributed manner [3]. It is more advantageous for individual users to store their data redundantly across multiple physical servers so as to reduce the data integrity and availability threats. Thus, distributed protocols for storage correctness assurance will be of paramount importance in achieving robust and secure cloud storage systems. However, this important area remains to be fully explored in the literature.

Recently, the importance of ensuring remote data integrity has been highlighted by many research works under different system and security models [9]–[19]. These techniques, while useful for ensuring storage correctness without having users possess local data, all focus on the single-server scenario. They may be useful for quality-of-service testing [20], but do not guarantee data availability in case of server failures. Although applying these techniques directly to distributed storage (multiple servers) could be straightforward, the resulting storage verification overhead would be linear in the number of servers. As a complementary approach, researchers have also proposed distributed protocols [20]–[22] for ensuring storage correctness across multiple servers or peers. However, while providing efficient cross-server storage verification and data availability insurance, these schemes all focus on static or archival data. As a result, their capability of handling dynamic data remains unclear, which inevitably limits their full applicability in cloud storage scenarios.

In this paper, we propose an effective and flexible distributed storage verification scheme with explicit dynamic data support to ensure the correctness and availability of users’ data in the cloud. We rely on erasure-correcting code in the file distribution preparation to provide redundancy and guarantee data dependability against Byzantine servers [23], where a storage server may fail in arbitrary ways. This construction drastically reduces the communication and storage overhead as compared to the traditional replication-based file distribution techniques. By utilizing the homomorphic token with distributed verification of erasure-coded data, our scheme achieves storage correctness insurance as well as data error localization: whenever data corruption has been detected during the storage correctness verification, our scheme can almost guarantee the simultaneous localization of data errors, i.e., the identification of the misbehaving server(s). In order to strike a good balance between error resilience and data dynamics, we further explore the algebraic property of our token computation and erasure-coded data, and demonstrate how to efficiently support dynamic operations on data blocks while maintaining the same level of storage correctness assurance.

In order to save the time, computation resources, and even the related online burden of users, we also provide an extension of the proposed main scheme to support third-party auditing, where users can safely delegate the integrity checking tasks to third-party auditors and be worry-free about using the cloud storage services. Our work is among the first in this field to consider distributed data storage security in Cloud Computing. Our contribution can be summarized as the following three aspects:

1) Compared to many of its predecessors, which only provide binary results about the storage status across the distributed servers, the proposed scheme achieves the integration of storage correctness insurance and data error localization, i.e., the identification of misbehaving server(s).

2) Unlike most prior works for ensuring remote data integrity, the new scheme further supports secure and efficient dynamic operations on data blocks, including update, delete and append.

3) The experiment results demonstrate that the proposed scheme is highly efficient. Extensive security analysis shows our scheme is resilient against Byzantine failure, malicious data modification attacks, and even server colluding attacks.

The rest of the paper is organized as follows. Section II introduces the system model, the adversary model, our design goals and notations. Then we provide the detailed description of our scheme in Sections III and IV. Section V gives the security analysis and performance evaluations, followed by Section VI, which overviews the related work. Finally, Section VII concludes the whole paper.

2 PROBLEM STATEMENT

2.1 System Model

A representative network architecture for cloud storage service is illustrated in Figure 1. Three different network entities can be identified as follows:

• User: an entity, who has data to be stored in the cloud and relies on the cloud for data storage and computation, can be either an enterprise or an individual customer.

• Cloud Server (CS): an entity, which is managed by the cloud service provider (CSP) to provide data storage service and has significant storage space and computation resources (we will not differentiate CS and CSP hereafter).

• Third Party Auditor (TPA): an optional TPA, who has expertise and capabilities that users may not have, is trusted to assess and expose risks of cloud storage services on behalf of the users upon request.

In cloud data storage, a user stores his data through a CSP into a set of cloud servers, which are running in a simultaneous, cooperated and distributed manner. Data redundancy can be employed with the technique of erasure-correcting code to further tolerate faults or server crashes as the user’s data grow in size and importance. Thereafter, for application purposes, the user interacts with the cloud servers via the CSP to access or retrieve his data.


In some cases, the user may need to perform block-level operations on his data. The most general forms of these operations we are considering are block update, delete, insert and append. Note that in this paper, we put more focus on the support of file-oriented cloud applications rather than non-file application data, such as social networking data. In other words, the cloud data we are considering are not expected to change rapidly over a relatively short period.

As users no longer possess their data locally, it is of critical importance to assure users that their data are being correctly stored and maintained. That is, users should be equipped with security means so that they can make continuous correctness assurance (to enforce the cloud storage service-level agreement) of their stored data even without the existence of local copies. In case users do not have the time, feasibility or resources to monitor their data online, they can delegate the data auditing tasks to an optional trusted TPA of their respective choices. However, to securely introduce such a TPA, any possible leakage of users’ outsourced data towards the TPA through the auditing protocol should be prohibited.

In our model, we assume that the point-to-point communication channels between each cloud server and the user are authenticated and reliable, which can be achieved in practice with little overhead. These authentication handshakes are omitted in the following presentation.

2.2 Adversary Model

From the user’s perspective, the adversary model has to capture all kinds of threats towards his cloud data integrity. Because cloud data do not reside at the user’s local site but at the CSP’s address domain, these threats can come from two different sources: internal and external attacks. For internal attacks, a CSP can be self-interested, untrusted and possibly malicious. Not only may it desire to move data that has not been or is rarely accessed to a lower tier of storage than agreed for monetary reasons, but it may also attempt to hide a data loss incident due to management errors, Byzantine failures and so on. For external attacks, data integrity threats may come from outsiders who are beyond the control domain of the CSP, for example, economically motivated attackers. They may compromise a number of cloud data storage servers in different time intervals and subsequently be able to modify or delete users’ data while remaining undetected by the CSP.

Therefore, we consider that the adversary in our model has the following capabilities, which capture both external and internal threats towards the cloud data integrity. Specifically, the adversary is interested in continuously corrupting the user’s data files stored on individual servers. Once a server is compromised, an adversary can pollute the original data files by modifying or introducing its own fraudulent data to prevent the original data from being retrieved by the user.

Fig. 1: Cloud storage service architecture

This corresponds to the threats from external attacks. In the worst case scenario, the adversary can compromise all the storage servers so that he can intentionally modify the data files as long as they are internally consistent. In fact, this is equivalent to the internal attack case where all servers are assumed to be colluding together from the early stages of application or service deployment to hide a data loss or corruption incident.

2.3 Design Goals

To ensure the security and dependability of cloud data storage under the aforementioned adversary model, we aim to design efficient mechanisms for dynamic data verification and operation and achieve the following goals: (1) Storage correctness: to ensure users that their data are indeed stored appropriately and kept intact all the time in the cloud. (2) Fast localization of data error: to effectively locate the malfunctioning server when data corruption has been detected. (3) Dynamic data support: to maintain the same level of storage correctness assurance even if users modify, delete or append their data files in the cloud. (4) Dependability: to enhance data availability against Byzantine failures, malicious data modification and server colluding attacks, i.e., minimizing the effect brought by data errors or server failures. (5) Lightweight: to enable users to perform storage correctness checks with minimum overhead.

2.4 Notation and Preliminaries

• F – the data file to be stored. We assume that F can be denoted as a matrix of m equal-sized data vectors, each consisting of l blocks. Data blocks are all well represented as elements in the Galois field GF(2^p) for p = 8 or 16.

• A – the dispersal matrix used for Reed-Solomon coding.

• G – the encoded file matrix, which includes a set of n = m + k vectors, each consisting of l blocks.

• f_key(·) – a pseudorandom function (PRF), defined as f : {0,1}* × key → GF(2^p).

• φ_key(·) – a pseudorandom permutation (PRP), defined as φ : {0,1}^{log2(l)} × key → {0,1}^{log2(l)}.


• ver – a version number bound with the index for individual blocks, which records the number of times the block has been modified. Initially we assume ver is 0 for all data blocks.

• s_ij^ver – the seed for the PRF, which depends on the file name, the block index i, the server position j, and the optional block version number ver.

3 ENSURING CLOUD DATA STORAGE

In a cloud data storage system, users store their data in the cloud and no longer possess the data locally. Thus, the correctness and availability of the data files being stored on the distributed cloud servers must be guaranteed. One of the key issues is to effectively detect any unauthorized data modification and corruption, possibly due to server compromise and/or random Byzantine failures. Besides, in the distributed case, when such inconsistencies are successfully detected, finding which server the data error lies in is also of great significance, since it is always the first step towards fast recovery of storage errors and/or identification of potential threats from external attacks.

To address these problems, our main scheme for ensuring cloud data storage is presented in this section. The first part of the section is devoted to a review of basic tools from coding theory that are needed in our scheme for file distribution across cloud servers. Then, the homomorphic token is introduced. The token computation function we are considering belongs to a family of universal hash functions [24], chosen to preserve the homomorphic properties, which can be perfectly integrated with the verification of erasure-coded data [21], [25]. Subsequently, it is shown how to derive a challenge-response protocol for verifying the storage correctness as well as identifying misbehaving servers. The procedure for file retrieval and error recovery based on erasure-correcting code is also outlined. Finally, we describe how to extend our scheme to third party auditing with only slight modification of the main design.

3.1 File Distribution Preparation

It is well known that erasure-correcting codes may be used to tolerate multiple failures in distributed storage systems. In cloud data storage, we rely on this technique to disperse the data file F redundantly across a set of n = m + k distributed servers. An (m, k) Reed-Solomon erasure-correcting code is used to create k redundancy parity vectors from m data vectors in such a way that the original m data vectors can be reconstructed from any m out of the m + k data and parity vectors. By placing each of the m + k vectors on a different server, the original data file can survive the failure of any k of the m + k servers without any data loss, with a space overhead of k/m. To support efficient sequential I/O to the original file, our file layout is systematic, i.e., the unmodified m data file vectors together with the k parity vectors are distributed across m + k different servers.

Let F = (F_1, F_2, ..., F_m) and F_i = (f_{1i}, f_{2i}, ..., f_{li})^T (i ∈ {1,...,m}). Here T (shorthand for transpose) denotes that each F_i is represented as a column vector, and l denotes the data vector size in blocks. All these blocks are elements of GF(2^p). The systematic layout with parity vectors is achieved with the information dispersal matrix A, derived from an m × (m + k) Vandermonde matrix [26]:

$$
\begin{pmatrix}
1 & 1 & \dots & 1 & 1 & \dots & 1 \\
\beta_1 & \beta_2 & \dots & \beta_m & \beta_{m+1} & \dots & \beta_n \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\beta_1^{m-1} & \beta_2^{m-1} & \dots & \beta_m^{m-1} & \beta_{m+1}^{m-1} & \dots & \beta_n^{m-1}
\end{pmatrix},
$$

where the β_j (j ∈ {1,...,n}) are distinct elements randomly picked from GF(2^p).

After a sequence of elementary row transformations, the desired matrix A can be written as

$$
A = (I \mid P) = \begin{pmatrix}
1 & 0 & \dots & 0 & p_{11} & p_{12} & \dots & p_{1k} \\
0 & 1 & \dots & 0 & p_{21} & p_{22} & \dots & p_{2k} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & 1 & p_{m1} & p_{m2} & \dots & p_{mk}
\end{pmatrix},
$$

where I is an m × m identity matrix and P is the secret parity generation matrix of size m × k. Note that A is derived from a Vandermonde matrix, and thus it has the property that any m out of the m + k columns form an invertible matrix.

By multiplying F by A, the user obtains the encoded file:

$$
G = F \cdot A = (G^{(1)}, G^{(2)}, \ldots, G^{(m)}, G^{(m+1)}, \ldots, G^{(n)}) = (F_1, F_2, \ldots, F_m, G^{(m+1)}, \ldots, G^{(n)}),
$$

where G^(j) = (g_1^(j), g_2^(j), ..., g_l^(j))^T (j ∈ {1,...,n}). As noticed, the multiplication reproduces the original data file vectors of F, and the remaining part (G^(m+1),...,G^(n)) consists of the k parity vectors generated based on F.
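To make the encoding step concrete, the Python sketch below builds a systematic dispersal matrix by row-reducing a Vandermonde matrix and then encodes a file. It is an illustrative sketch only: it substitutes the prime field GF(65537) for the paper's GF(2^p) to keep the arithmetic readable, and the helper names (vandermonde, systematic_dispersal, encode_file) are ours, not from the paper.

```python
import random

q = 65537  # prime modulus standing in for GF(2^16); the paper works in GF(2^p)

def vandermonde(m, n, betas):
    """m x n Vandermonde matrix whose j-th column is (1, beta_j, ..., beta_j^(m-1))^T."""
    return [[pow(b, i, q) for b in betas] for i in range(m)]

def systematic_dispersal(m, k):
    """Row-reduce an m x (m+k) Vandermonde matrix into the dispersal matrix A = (I | P)."""
    n = m + k
    betas = random.sample(range(1, q), n)     # distinct field elements
    A = vandermonde(m, n, betas)
    for col in range(m):                      # Gauss-Jordan on the first m columns
        piv = next(row for row in range(col, m) if A[row][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], q - 2, q)      # modular inverse (Fermat)
        A[col] = [(x * inv) % q for x in A[col]]
        for row in range(m):
            if row != col and A[row][col]:
                f = A[row][col]
                A[row] = [(A[row][c] - f * A[col][c]) % q for c in range(n)]
    return A                                  # the last k columns form the secret P

def encode_file(F, A):
    """G = F . A; F is l x m (row = block index). Returns the n encoded vectors G^(j),
    each a list of l blocks, ready to be placed on n different servers."""
    l, m, n = len(F), len(F[0]), len(A[0])
    return [[sum(F[i][t] * A[t][j] for t in range(m)) % q for i in range(l)]
            for j in range(n)]
```

With A = systematic_dispersal(m, k), the first m vectors returned by encode_file(F, A) are exactly the data vectors F_1, ..., F_m, and the last k are the parity vectors, mirroring the systematic layout above.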

3.2 Challenge Token Pre-computation

In order to achieve assurance of data storage correctness and data error localization simultaneously, our scheme entirely relies on pre-computed verification tokens. The main idea is as follows: before file distribution, the user pre-computes a certain number of short verification tokens on each individual vector G^(j) (j ∈ {1,...,n}), each token covering a random subset of data blocks. Later, when the user wants to make sure of the storage correctness for the data in the cloud, he challenges the cloud servers with a set of randomly generated block indices. Upon receiving the challenge, each cloud server computes a short “signature” over the specified blocks and returns it to the user. The values of these signatures should match the corresponding tokens pre-computed by the user. Meanwhile, as all servers operate over the same subset of indices, the requested response values for the integrity check must also be a valid codeword determined by the secret matrix P.


Algorithm 1 Token Pre-computation
1: procedure
2:   Choose parameters l, n and functions f, φ;
3:   Choose the number t of tokens;
4:   Choose the number r of indices per verification;
5:   Generate master key K_PRP and challenge key k_chal;
6:   for vector G^(j), j ← 1, n do
7:     for round i ← 1, t do
8:       Derive α_i = f_{k_chal}(i) and k_prp^(i) from K_PRP.
9:       Compute v_i^(j) = Σ_{q=1}^{r} α_i^q ∗ G^(j)[φ_{k_prp^(i)}(q)]
10:    end for
11:  end for
12:  Store all the v_i's locally.
13: end procedure

Suppose the user wants to challenge the cloud servers t times to ensure the correctness of data storage. Then, he must pre-compute t verification tokens for each G^(j) (j ∈ {1,...,n}), using a PRF f(·), a PRP φ(·), a challenge key k_chal and a master permutation key K_PRP. Specifically, to generate the i-th token for server j, the user acts as follows:

1) Derive a random challenge value α_i of GF(2^p) by α_i = f_{k_chal}(i) and a permutation key k_prp^(i) based on K_PRP.

2) Compute the set of r randomly-chosen indices {I_q ∈ [1,...,l] | 1 ≤ q ≤ r}, where I_q = φ_{k_prp^(i)}(q).

3) Calculate the token as:

$$
v_i^{(j)} = \sum_{q=1}^{r} \alpha_i^q \ast G^{(j)}[I_q], \quad \text{where } G^{(j)}[I_q] = g_{I_q}^{(j)}.
$$

Note that v_i^(j), which is an element of GF(2^p) of small size, is the response the user expects to receive from server j when he challenges it on the specified data blocks.

After token generation, the user has the choice of either keeping the pre-computed tokens locally or storing them in encrypted form on the cloud servers. In our case, the user stores them locally to obviate the need for encryption and to lower the bandwidth overhead during dynamic data operations, which will be discussed shortly. The details of token generation are shown in Algorithm 1.
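A minimal sketch of Algorithm 1 is shown below, with HMAC-SHA256 standing in for the PRF f, a keyed shuffle standing in for the PRP φ, and the field modulus q reused from the encoding sketch above; all helper names are illustrative, not part of the paper's construction.

```python
import hmac, hashlib, random

def prf(key, msg):
    """PRF f_key(msg) mapped into GF(q); HMAC-SHA256 is used here as a stand-in."""
    digest = hmac.new(key, str(msg).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % q

def prp_first_r(key, r, l):
    """First r outputs of a keyed permutation phi over the block indices 1..l
    (a seeded shuffle stands in for a real small-domain PRP)."""
    rng = random.Random(key)
    indices = list(range(1, l + 1))
    rng.shuffle(indices)
    return indices[:r]

def precompute_tokens(G, t, r, k_chal, K_prp):
    """Algorithm 1 sketch: G[j] is the j-th encoded vector (l blocks); the t*n tokens
    are returned for the user to keep locally."""
    l = len(G[0])
    tokens = [[0] * t for _ in G]
    for j, vec in enumerate(G):
        for i in range(1, t + 1):
            alpha = prf(k_chal, i)                                  # alpha_i = f_kchal(i)
            k_i = hmac.new(K_prp, str(i).encode(), hashlib.sha256).digest()
            v = 0
            for pos, I_q in enumerate(prp_first_r(k_i, r, l), start=1):
                v = (v + pow(alpha, pos, q) * vec[I_q - 1]) % q     # + alpha_i^q * G^(j)[I_q]
            tokens[j][i - 1] = v
    return tokens
```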

Once all tokens are computed, the final step before file distribution is to blind each parity block g_i^(j) in (G^(m+1),...,G^(n)) by

$$
g_i^{(j)} \leftarrow g_i^{(j)} + f_{k_j}(s_{ij}), \quad i \in \{1, \ldots, l\},
$$

where k_j is the secret key for parity vector G^(j) (j ∈ {m+1,...,n}). This is for protection of the secret matrix P. We will discuss the necessity of using blinded parities in detail in Section 5.2. After blinding the parity information, the user disperses all the n encoded vectors G^(j) (j ∈ {1,...,n}) across the cloud servers S_1, S_2, ..., S_n.
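Continuing the sketch, parity blinding can be expressed as a small helper that reuses prf and q from above; the seed string below is an illustrative stand-in for s_ij (the paper's seed also includes the file name).

```python
def blind_parity(G, m, parity_keys):
    """Blind each parity block: g_i^(j) <- g_i^(j) + f_kj(s_ij), for the parity vectors
    only. parity_keys[j] holds the secret key k_j for 0-based vector index j >= m."""
    for j in range(m, len(G)):
        for i in range(len(G[j])):
            s_ij = f"block{i + 1}|server{j + 1}|ver0"   # illustrative seed (no file name)
            G[j][i] = (G[j][i] + prf(parity_keys[j], s_ij)) % q
    return G
```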

3.3 Correctness Verification and Error Localization

Error localization is a key prerequisite for eliminating errors in storage systems. It is also of critical importance for identifying potential threats from external attacks. However, many previous schemes [20], [21] do not explicitly consider the problem of data error localization, and thus only provide binary results for the storage verification. Our scheme outperforms those by integrating correctness verification and error localization (misbehaving server identification) in our challenge-response protocol: the response values from the servers for each challenge not only determine the correctness of the distributed storage, but also contain information to locate potential data error(s).

Specifically, the procedure of the i-th challenge-response for a cross-check over the n servers is described as follows:

1) The user reveals α_i as well as the i-th permutation key k_prp^(i) to each server.

2) The server storing vector G^(j) (j ∈ {1,...,n}) aggregates those r rows specified by the permutation key k_prp^(i) into a linear combination

$$
R_i^{(j)} = \sum_{q=1}^{r} \alpha_i^q \ast G^{(j)}[\phi_{k_{prp}^{(i)}}(q)],
$$

and sends back R_i^(j) (j ∈ {1,...,n}).

3) Upon receiving the R_i^(j)'s from all the servers, the user takes away the blind values in R_i^(j) (j ∈ {m+1,...,n}) by

$$
R_i^{(j)} \leftarrow R_i^{(j)} - \sum_{q=1}^{r} f_{k_j}(s_{I_q,j}) \cdot \alpha_i^q, \quad \text{where } I_q = \phi_{k_{prp}^{(i)}}(q).
$$

4) Then the user verifies whether the received values remain a valid codeword determined by the secret matrix P:

$$
(R_i^{(1)}, \ldots, R_i^{(m)}) \cdot P \stackrel{?}{=} (R_i^{(m+1)}, \ldots, R_i^{(n)}).
$$

Because all the servers operate over the same subset of indices, the linear aggregation of these r specified rows (R_i^(1),...,R_i^(n)) has to be a codeword in the encoded file matrix (see Section 5.1 for the correctness analysis). If the above equation holds, the challenge is passed. Otherwise, it indicates that among those specified rows, there exist file block corruptions.

Once the inconsistency among the storage has been successfully detected, we can rely on the pre-computed verification tokens to further determine where the potential data error(s) lies. Note that each response R_i^(j) is computed exactly in the same way as the token v_i^(j); thus the user can simply find which server is misbehaving by verifying the following n equations:

$$
R_i^{(j)} \stackrel{?}{=} v_i^{(j)}, \quad j \in \{1, \ldots, n\}.
$$


Algorithm 2 Correctness Verification and Error Localization
1: procedure CHALLENGE(i)
2:   Recompute α_i = f_{k_chal}(i) and k_prp^(i) from K_PRP;
3:   Send {α_i, k_prp^(i)} to all the cloud servers;
4:   Receive from servers: {R_i^(j) = Σ_{q=1}^{r} α_i^q ∗ G^(j)[φ_{k_prp^(i)}(q)] | 1 ≤ j ≤ n}
5:   for (j ← m + 1, n) do
6:     R_i^(j) ← R_i^(j) − Σ_{q=1}^{r} f_{k_j}(s_{I_q,j}) · α_i^q, I_q = φ_{k_prp^(i)}(q)
7:   end for
8:   if ((R_i^(1),...,R_i^(m)) · P == (R_i^(m+1),...,R_i^(n))) then
9:     Accept and ready for the next challenge.
10:  else
11:    for (j ← 1, n) do
12:      if (R_i^(j) != v_i^(j)) then
13:        return server j is misbehaving.
14:      end if
15:    end for
16:  end if
17: end procedure

Algorithm 2 gives the details of correctness verification and error localization.

Discussion. Previous work [20], [21] has suggested using the decoding capability of error-correcting codes to treat data errors. But such an approach imposes a bound b ≤ ⌊k/2⌋ on the number of misbehaving servers b; namely, it cannot identify misbehaving servers when b > ⌊k/2⌋. (In [20], the authors also suggest using brute-force decoding when their dispersal code is an erasure code; however, such a brute-force method is asymptotically inefficient and still cannot guarantee identification of all misbehaving servers.) Our token-based approach, while allowing efficient storage correctness validation, does not have this limitation on the number of misbehaving servers; that is, it can identify any number of misbehaving servers for b ≤ (m + k). Also note that, for every challenge, each server only needs to send back an aggregated value over the specified set of blocks. Thus the bandwidth cost of our approach is much less than that of the straightforward approaches that require downloading all the challenged data.
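The following Python sketch puts the challenge-response check and the token comparison of Algorithm 2 together, reusing prf, prp_first_r, the field modulus q, and the illustrative seed format from the earlier sketches; server_respond shows the aggregation a well-behaved server would perform. It assumes the tokens were pre-computed before parity blinding, as in Section 3.2, and the function names are ours.

```python
def server_respond(vec, alpha, k_i, r):
    """Server side of the i-th challenge: aggregate its r specified blocks."""
    l = len(vec)
    R = 0
    for pos, I_q in enumerate(prp_first_r(k_i, r, l), start=1):
        R = (R + pow(alpha, pos, q) * vec[I_q - 1]) % q
    return R

def verify_and_localize(responses, P, tokens_i, m, parity_keys, alpha, k_i, r, l):
    """User side of Algorithm 2: remove the parity blinding, test the codeword relation
    against the secret matrix P, and on failure compare against the stored tokens."""
    R = list(responses)
    for j in range(m, len(R)):                                   # unblind parity responses
        for pos, I_q in enumerate(prp_first_r(k_i, r, l), start=1):
            seed = f"block{I_q}|server{j + 1}|ver0"              # must match blind_parity
            R[j] = (R[j] - prf(parity_keys[j], seed) * pow(alpha, pos, q)) % q
    k = len(R) - m
    parity_check = [sum(R[t] * P[t][c] for t in range(m)) % q for c in range(k)]
    if parity_check == R[m:]:
        return "accepted", []
    return "corrupted", [j + 1 for j in range(len(R)) if R[j] != tokens_i[j]]
```

Here P = [row[m:] for row in A] is the secret parity part of the dispersal matrix, tokens_i holds the i-th pre-computed token of every server, and misbehaving servers are reported by their 1-based positions.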

3.4 File Retrieval and Error Recovery

Since our layout of the file matrix is systematic, the user can reconstruct the original file by downloading the data vectors from the first m servers, assuming that they return the correct response values. Notice that our verification scheme is based on random spot-checking, so the storage correctness assurance is probabilistic. However, by choosing the system parameters (e.g., r, l, t) appropriately and conducting enough rounds of verification, we can guarantee successful file retrieval with high probability. On the other hand, whenever data corruption is detected, the comparison of pre-computed tokens and received response values can guarantee the identification of the misbehaving server(s) (again with high probability), which will be discussed shortly. Therefore, the user can always ask the servers to send back blocks of the r rows specified in the challenge and regenerate the correct blocks by erasure correction, as shown in Algorithm 3, as long as the number of identified misbehaving servers is less than k (otherwise, there is no way to recover the corrupted blocks due to lack of redundancy, even if we know the positions of the misbehaving servers). The newly recovered blocks can then be redistributed to the misbehaving servers to maintain the correctness of storage.

Algorithm 3 Error Recovery
1: procedure
   % Assume the block corruptions have been detected among the specified r rows;
   % Assume s ≤ k servers have been identified as misbehaving
2:   Download r rows of blocks from the servers;
3:   Treat the s servers as erasures and recover the blocks.
4:   Resend the recovered blocks to the corresponding servers.
5: end procedure

3.5 Towards Third Party Auditing

As discussed in our architecture, in case the user does not have the time, feasibility or resources to perform the storage correctness verification, he can optionally delegate this task to an independent third party auditor, making the cloud storage publicly verifiable. However, as pointed out by recent work [27], [28], to securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities towards user data privacy. Namely, the TPA should not learn the user’s data content through the delegated data auditing. Now we show that with only slight modification, our protocol can support privacy-preserving third party auditing.

The new design is based on the observation that the parity vector blinding process is linear. Recall that the reason for the blinding process is to protect the secret matrix P against the cloud servers. However, this can be achieved either by blinding the parity vectors or by blinding the data vectors (we assume k < m). Thus, if we blind the data vectors before the file distribution encoding, then the storage verification task can be successfully delegated to third party auditing in a privacy-preserving manner. Specifically, the new protocol is described as follows:

1) Before file distribution, the user blinds each file block g_i^(j) in (G^(1),...,G^(m)) by g_i^(j) ← g_i^(j) + f_{k_j}(s_{ij}), i ∈ {1,...,l}, where k_j is the secret key for data vector G^(j) (j ∈ {1,...,m}).

2) Based on the blinded data vectors (G^(1),...,G^(m)), the user generates the k parity vectors (G^(m+1),...,G^(n)) via the secret matrix P.

3) The user calculates the i-th token for server j as in the previous scheme: v_i^(j) = Σ_{q=1}^{r} α_i^q ∗ G^(j)[I_q], where G^(j)[I_q] = g_{I_q}^(j) and α_i = f_{k_chal}(i) ∈ GF(2^p).

4) The user sends the token set {v_i^(j)}_{1≤i≤t, 1≤j≤n}, the secret matrix P, and the permutation and challenge keys K_PRP and k_chal to the TPA for auditing delegation.

The correctness validation and misbehaving server identification for the TPA are similar to the previous scheme. The only difference is that the TPA does not have to take away the blinding values in the servers’ responses R^(j) (j ∈ {1,...,n}) but verifies them directly. As the TPA does not know the secret blinding keys k_j (j ∈ {1,...,m}), there is no way for the TPA to learn the data content during the auditing process. Therefore, privacy-preserving third party auditing is achieved. Note that compared to the previous scheme, we only change the sequence of file encoding, token pre-computation, and blinding. Thus, the overall computation overhead and communication overhead remain roughly the same.

4 PROVIDING DYNAMIC DATA OPERATION SUPPORT

So far, we have assumed that F represents static or archived data. This model may fit some application scenarios, such as libraries and scientific datasets. However, in cloud data storage, there are many potential scenarios where data stored in the cloud are dynamic, such as electronic documents, photos, or log files. Therefore, it is crucial to consider the dynamic case, where a user may wish to perform various block-level operations of update, delete and append to modify the data file while maintaining the storage correctness assurance.

Since data do not reside at the users’ local site but at the cloud service provider’s address domain, supporting dynamic data operations can be quite challenging. On the one hand, the CSP needs to process the data dynamics requests without knowing the secret keying material. On the other hand, users need to ensure that all the dynamic data operation requests have been faithfully processed by the CSP. To address this problem, we briefly explain our approach here and provide the details later. For any dynamic data operation, the user must first generate the corresponding resulting file blocks and parities. This part of the operation has to be carried out by the user, since only he knows the secret matrix P. Besides, to ensure that the changes to data blocks are correctly reflected in the cloud address domain, the user also needs to modify the corresponding storage verification tokens to accommodate the changes on data blocks. Only with the accordingly changed storage verification tokens can the previously discussed challenge-response protocol be carried on successfully even after data dynamics. In other words, these verification tokens help ensure that the CSP correctly executes the processing of any dynamic data operation request. Otherwise, the CSP would be caught cheating with high probability in the later protocol execution. Given this design methodology, the straightforward and trivial way to support these operations is for the user to download all the data from the cloud servers and re-compute the whole parity blocks as well as the verification tokens. This would clearly be highly inefficient. In this section, we will show how our scheme can explicitly and efficiently handle dynamic data operations for cloud data storage, by utilizing the linear property of the Reed-Solomon code and the verification token construction.

4.1 Update Operation

In cloud data storage, a user may need to modify some data block(s) stored in the cloud, from their current value f_ij to a new one, f_ij + Δf_ij. We refer to this operation as data update. Figure 2 gives a high-level logical representation of a data block update. Due to the linear property of the Reed-Solomon code, a user can perform the update operation and generate the updated parity blocks by using Δf_ij only, without involving any other unchanged blocks. Specifically, the user can construct a general update matrix ΔF as

$$
\Delta F = \begin{pmatrix}
\Delta f_{11} & \Delta f_{12} & \dots & \Delta f_{1m} \\
\Delta f_{21} & \Delta f_{22} & \dots & \Delta f_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\Delta f_{l1} & \Delta f_{l2} & \dots & \Delta f_{lm}
\end{pmatrix} = (\Delta F_1, \Delta F_2, \ldots, \Delta F_m).
$$

Note that we use zero elements in ΔF to denote the unchanged blocks, and thus ΔF should be a sparse matrix most of the time (we assume that, for a certain time epoch, the user only updates a relatively small part of file F). To maintain the corresponding parity vectors as well as be consistent with the original file layout, the user can multiply ΔF by A and thus generate the update information for both the data vectors and parity vectors as follows:

$$
\Delta F \cdot A = (\Delta G^{(1)}, \ldots, \Delta G^{(m)}, \Delta G^{(m+1)}, \ldots, \Delta G^{(n)}) = (\Delta F_1, \ldots, \Delta F_m, \Delta G^{(m+1)}, \ldots, \Delta G^{(n)}),
$$

where ΔG^(j) (j ∈ {m+1,...,n}) denotes the update information for the parity vector G^(j).

Because the data update operation inevitably affects some or all of the remaining verification tokens, after preparation of the update information, the user has to amend those unused tokens for each vector G^(j) to maintain the same storage correctness assurance. In other words, for all the unused tokens, the user needs to exclude every occurrence of the old data block and replace it with the new one. Thanks to the homomorphic construction of our verification tokens, the user can perform the token update efficiently. To give more details, suppose a block G^(j)[I_s], which is covered by a specific token v_i^(j), has been changed to G^(j)[I_s] + ΔG^(j)[I_s], where I_s = φ_{k_prp^(i)}(s). To maintain the usability of token v_i^(j), it is not hard to verify that the user can simply update it by

$$
v_i^{(j)} \leftarrow v_i^{(j)} + \alpha_i^s \ast \Delta G^{(j)}[I_s],
$$

without retrieving the other r − 1 blocks required in the pre-computation of v_i^(j).

Fig. 2: Logical representation of data dynamics, including block update, append and delete.

After the amendment to the affected tokens², the user needs to blind the update information Δg_i^(j) for each parity block in (ΔG^(m+1),...,ΔG^(n)) to hide the secret matrix P by

$$
\Delta g_i^{(j)} \leftarrow \Delta g_i^{(j)} + f_{k_j}(s_{ij}^{ver}), \quad i \in \{1, \ldots, l\}.
$$

Here we use a new seed s_ij^ver for the PRF. The version number ver functions like a counter which helps the user keep track of the blinding information on the specific parity blocks. After blinding, the user sends the update information to the cloud servers, which perform the update operation as G^(j) ← G^(j) + ΔG^(j) (j ∈ {1,...,n}).

Discussion. Note that by using the new seed s_ij^ver for the PRF every time (for a block update operation), we can ensure the freshness of the random value embedded into the parity blocks. In other words, the cloud servers cannot simply abstract away the random blinding information on the parity blocks by subtracting the old and newly updated parity blocks. As a result, the secret matrix P is still well protected, and the guarantee of storage correctness remains.
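The update flow can be sketched as follows, reusing prf, prp_first_r and the field modulus q from the earlier sketches; the function name and the sparse-delta representation are illustrative assumptions, not the paper's notation.

```python
import hmac, hashlib

def apply_block_update(A, tokens, delta, l, t, r, k_chal, K_prp):
    """Sketch of Section 4.1: delta maps a 1-based (block index i, data vector j) pair
    to Delta f_ij; all other entries of Delta F are zero because the update is sparse."""
    n = len(A[0])
    # 1) Expand the sparse update into Delta G = Delta F . A, one vector per server.
    delta_G = [[0] * l for _ in range(n)]
    for (i, j), d in delta.items():
        for col in range(n):
            delta_G[col][i - 1] = (delta_G[col][i - 1] + d * A[j - 1][col]) % q
    # 2) Amend every token covering an updated row: v_i^(j) += alpha_i^s * Delta G^(j)[I_s].
    for tok in range(1, t + 1):
        alpha = prf(k_chal, tok)
        k_i = hmac.new(K_prp, str(tok).encode(), hashlib.sha256).digest()
        for s, I_s in enumerate(prp_first_r(k_i, r, l), start=1):
            for j in range(n):
                if delta_G[j][I_s - 1]:
                    tokens[j][tok - 1] = (tokens[j][tok - 1]
                                          + pow(alpha, s, q) * delta_G[j][I_s - 1]) % q
    # 3) The parity part of delta_G would now be blinded with fresh version seeds and
    #    sent out; each server then applies G^(j) <- G^(j) + Delta G^(j).
    return delta_G, tokens
```

For example, delta = {(5, 2): 7} would change block 5 of data vector 2 by 7 and propagate that change to every parity vector and every affected token.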

4.2 Delete Operation

Sometimes, after being stored in the cloud, certain data blocks may need to be deleted. The delete operation we are considering is a general one, in which the user replaces the data block with zero or some special reserved data symbol. From this point of view, the delete operation is actually a special case of the data update operation, where the original data blocks are replaced with zeros or some predetermined special blocks. Therefore, we can rely on the update procedure to support the delete operation, i.e., by setting Δf_ij in ΔF to −f_ij. Also, all the affected tokens have to be modified and the updated parity information has to be blinded using the same method specified in the update operation.

2. In practice, it is possible that only a fraction of the tokens need amendment, since the updated blocks may not be covered by all the tokens.

4.3 Append Operation

In some cases, the user may want to increase the size of his stored data by adding blocks at the end of the data file, which we refer to as data append. We anticipate that the most frequent append operation in cloud data storage is bulk append, in which the user needs to upload a large number of blocks (not a single block) at one time.

Given the file matrix F illustrated in the file distribution preparation, appending blocks towards the end of a data file is equivalent to concatenating corresponding rows at the bottom of the matrix layout for file F (see Figure 2). In the beginning, there are only l rows in the file matrix. To simplify the presentation, we suppose the user wants to append m blocks at the end of file F, denoted as (f_{l+1,1}, f_{l+1,2}, ..., f_{l+1,m}) (we can always use zero-padding to make a row of m elements). With the secret matrix P, the user can directly calculate the append blocks for each parity server as

$$
(f_{l+1,1}, \ldots, f_{l+1,m}) \cdot P = (g_{l+1}^{(m+1)}, \ldots, g_{l+1}^{(n)}).
$$

To ensure the newly appended blocks are covered by our challenge tokens, we need a slight modification to our token pre-computation. Specifically, we require the user to expect the maximum size in blocks, denoted as l_max, for each of his data vectors. This idea of supporting block append was first suggested by [13] in a single-server setting, and it relies on both the initial budget for the maximum anticipated data size l_max in each encoded data vector and the system parameter r_max = r ∗ (l_max/l) for each pre-computed challenge-response token. The pre-computation of the i-th token on server j is modified as follows:

$$
v_i = \sum_{q=1}^{r_{max}} \alpha_i^q \ast G^{(j)}[I_q], \quad \text{where } G^{(j)}[I_q] = \begin{cases} G^{(j)}[\phi_{k_{prp}^{(i)}}(q)], & \phi_{k_{prp}^{(i)}}(q) \le l \\ 0, & \phi_{k_{prp}^{(i)}}(q) > l, \end{cases}
$$

and the PRP φ(·) is modified as φ(·) : {0,1}^{log2(l_max)} × key → {0,1}^{log2(l_max)}. This formula guarantees that, on average, there will be r indices falling into the range of the existing l blocks. Because the cloud servers and the user have an agreement on the number of existing blocks in each vector G^(j), the servers will follow exactly the above procedure when re-computing the token values upon receiving the user’s challenge request.


Now when the user is ready to append new blocks, i.e., both the file blocks and the corresponding parity blocks are generated, the total length of each vector G^(j) will increase and fall into the range [l, l_max]. Therefore, the user will update the affected tokens by adding α_i^s ∗ G^(j)[I_s] to the old v_i whenever G^(j)[I_s] ≠ 0 for I_s > l, where I_s = φ_{k_prp^(i)}(s). The parity blinding is similar to that introduced in the update operation, and thus is omitted here.
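A sketch of the append-aware token pre-computation is given below, again reusing prf, prp_first_r and q from the earlier sketches; the function name is illustrative and the PRP stand-in simply ranges over 1..l_max so that not-yet-appended indices contribute zero.

```python
import hmac, hashlib

def precompute_tokens_append_aware(G, t, r, l, l_max, k_chal, K_prp):
    """Section 4.3 sketch: tokens are budgeted for up to l_max blocks per vector."""
    r_max = r * (l_max // l)                  # budget of indices per token
    tokens = [[0] * t for _ in G]
    for j, vec in enumerate(G):
        for i in range(1, t + 1):
            alpha = prf(k_chal, i)
            k_i = hmac.new(K_prp, str(i).encode(), hashlib.sha256).digest()
            v = 0
            for pos, I_q in enumerate(prp_first_r(k_i, r_max, l_max), start=1):
                block = vec[I_q - 1] if I_q <= l else 0   # rows not yet appended count as 0
                v = (v + pow(alpha, pos, q) * block) % q
            tokens[j][i - 1] = v
    return tokens
```

After an actual append, the user adds α_i^s ∗ G^(j)[I_s] to each stored token for every permuted index I_s that now falls into the appended range, exactly as described above.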

4.4 Insert Operation

An insert operation on the data file refers to an append operation at a desired index position while maintaining the same data block structure for the whole data file, i.e., inserting a block F[j] corresponds to shifting all blocks starting with index j + 1 by one slot. Thus, an insert operation may affect many rows in the logical data file matrix F, and a substantial number of computations are required to renumber all the subsequent blocks as well as re-compute the challenge-response tokens. Hence, a direct insert operation is difficult to support.

In order to fully support the block insertion operation, recent work [14], [15] suggests utilizing an additional data structure (for example, a Merkle hash tree [29]) to maintain and enforce the block index information. Following this line of research, we can circumvent the dilemma of block insertion by viewing each insertion as a logical append operation at the end of file F. Specifically, if we also use an additional data structure to maintain such logical-to-physical block index mapping information, then all block insertions can be treated via the logical append operation, which can be efficiently supported. On the other hand, using the block index mapping information, the user can still access or retrieve the file as it is. Note that, as a tradeoff, the extra data structure information has to be maintained locally on the user side.

5 SECURITY ANALYSIS AND PERFORMANCE EVALUATION

In this section, we analyze our proposed scheme in terms of correctness, security and efficiency. Our security analysis focuses on the adversary model defined in Section II. We also evaluate the efficiency of our scheme via implementation of both the file distribution preparation and the verification token pre-computation.

5.1 Correctness Analysis

First, we analyze the correctness of the verification procedure. Upon obtaining all the responses R_i^(j) from the servers and taking away the random blind values from R_i^(j) (j ∈ {m+1,...,n}), the user relies on the equation (R_i^(1),...,R_i^(m)) · P ?= (R_i^(m+1),...,R_i^(n)) to ensure the storage correctness. To see why this is true, we can rewrite the equation according to the token computation:

$$
\Big(\sum_{q=1}^{r} \alpha_i^q \ast g_{I_q}^{(1)}, \ldots, \sum_{q=1}^{r} \alpha_i^q \ast g_{I_q}^{(m)}\Big) \cdot P = \Big(\sum_{q=1}^{r} \alpha_i^q \ast g_{I_q}^{(m+1)}, \ldots, \sum_{q=1}^{r} \alpha_i^q \ast g_{I_q}^{(n)}\Big),
$$

and hence the left hand side (LHS) of the equation expands as:

$$
LHS = (\alpha_i, \alpha_i^2, \ldots, \alpha_i^r)
\begin{pmatrix}
g_{I_1}^{(1)} & g_{I_1}^{(2)} & \dots & g_{I_1}^{(m)} \\
g_{I_2}^{(1)} & g_{I_2}^{(2)} & \dots & g_{I_2}^{(m)} \\
\vdots & \vdots & \ddots & \vdots \\
g_{I_r}^{(1)} & g_{I_r}^{(2)} & \dots & g_{I_r}^{(m)}
\end{pmatrix} \cdot P
= (\alpha_i, \alpha_i^2, \ldots, \alpha_i^r)
\begin{pmatrix}
g_{I_1}^{(m+1)} & g_{I_1}^{(m+2)} & \dots & g_{I_1}^{(n)} \\
g_{I_2}^{(m+1)} & g_{I_2}^{(m+2)} & \dots & g_{I_2}^{(n)} \\
\vdots & \vdots & \ddots & \vdots \\
g_{I_r}^{(m+1)} & g_{I_r}^{(m+2)} & \dots & g_{I_r}^{(n)}
\end{pmatrix},
$$

which equals the right hand side as required. Thus, as long as each server operates on the same specified subset of rows, the above checking equation will always hold.
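Under the same prime-field simplification, this identity can be checked numerically with the sketches from Section 3.1 (systematic_dispersal, encode_file and q); the snippet below generates random test data, aggregates per-server responses over a common index set, and asserts the codeword relation. Blinding is omitted because the check is applied after the blind values are removed.

```python
import random

m, k, l, r = 4, 2, 8, 3
A = systematic_dispersal(m, k)                  # from the Section 3.1 sketch
P = [row[m:] for row in A]                      # secret parity part of A
F = [[random.randrange(q) for _ in range(m)] for _ in range(l)]
G = encode_file(F, A)                           # n = m + k vectors of l blocks each

alpha = random.randrange(1, q)
I = random.sample(range(l), r)                  # the common challenged rows
R = [sum(pow(alpha, s + 1, q) * G[j][i] for s, i in enumerate(I)) % q
     for j in range(m + k)]

lhs = [sum(R[t] * P[t][c] for t in range(m)) % q for c in range(k)]
assert lhs == R[m:], "aggregated responses must remain a codeword under P"
```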

5.2 Security Strength

5.2.1 Detection Probability against Data Modification

In our scheme, servers are required to operate only on specified rows in each challenge-response protocol execution. We will show that this “sampling” strategy on selected rows instead of all rows can greatly reduce the computational overhead on the server, while maintaining a high detection probability for data corruption.

Suppose n_c servers are misbehaving due to possible compromise or Byzantine failure. In the following analysis, we do not limit the value of n_c, i.e., n_c ≤ n. Thus, all the analysis results hold even if all the servers are compromised. We will leave the explanation of the collusion resistance of our scheme against this worst case scenario to a later subsection. Assume the adversary modifies the data blocks in z rows out of the l rows in the encoded file matrix. Let r be the number of different rows for which the user asks for checking in a challenge. Let X be a discrete random variable defined as the number of rows chosen by the user that match the rows modified by the adversary. We first analyze the matching probability that at least one of the rows picked by the user matches one of the rows modified by the adversary:

$$
P_m^r = 1 - P\{X = 0\} = 1 - \prod_{i=0}^{r-1} \Big(1 - \min\Big\{\frac{z}{l-i}, 1\Big\}\Big) \ge 1 - \Big(\frac{l-z}{l}\Big)^r.
$$

If none of the specified r rows in the i-th verification process are deleted or modified, the adversary avoids detection.

Next, we study the probability of a false negative result, i.e., that there exists at least one invalid response calculated from those specified r rows, but the checking equation still holds. Consider the responses R_i^(1),...,R_i^(n) returned from the data storage servers for the i-th challenge; each response value R_i^(j), calculated within GF(2^p), is based on r blocks on server j. The number of responses R^(m+1),...,R^(n) from parity servers is k = n − m. Thus, according to Proposition 2 of our previous work in [30], the false negative probability is

$$
P_f^r = Pr_1 + Pr_2, \quad \text{where } Pr_1 = \frac{(1+2^{-p})^{n_c}-1}{2^{n_c}-1} \ \text{and} \ Pr_2 = (1 - Pr_1)\,(2^{-p})^{k}.
$$

Based on the above discussion, it follows that the probability of data modification detection across all storage servers is P_d = P_m^r · (1 − P_f^r). Figure 3 plots P_d for different values of l, r, z while we set p = 16, n_c = 10 and k = 5.³ From the figure we can see that if a fraction of the data file is corrupted, then it suffices to challenge a small constant number of rows in order to achieve detection with high probability. For example, if z = 1% of l, every token only needs to cover 460 indices in order to achieve a detection probability of at least 99%.

Fig. 3: The detection probability P_d against data modification, shown as a function of l (the number of blocks on each cloud storage server) and r (the number of rows queried by the user, as a percentage of l) for two values of z (the number of rows modified by the adversary): (a) z = 1% of l and (b) z = 10% of l. Both graphs are plotted under p = 16, n_c = 10 and k = 5, but with different scales.
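The detection probability P_d can be reproduced with a few lines of plain floating-point Python; the function name is illustrative and the parameters mirror the symbols in the analysis above.

```python
def detection_probability(l, r, z, n_c, k, p=16):
    """P_d = P_m^r * (1 - P_f^r) as derived above; plain floats, names mirror the text."""
    P_m = 1.0
    for i in range(r):                     # probability that no queried row is a modified row
        P_m *= 1 - min(z / (l - i), 1.0)
    P_m = 1 - P_m
    Pr1 = ((1 + 2 ** -p) ** n_c - 1) / (2 ** n_c - 1)
    Pr2 = (1 - Pr1) * (2 ** -p) ** k
    return P_m * (1 - (Pr1 + Pr2))

# e.g. l = 1000 blocks, z = 1% of l modified, r = 460 queried indices -> P_d > 0.99
print(detection_probability(l=1000, r=460, z=10, n_c=10, k=5))
```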

5.2.2 Identification Probability for Misbehaving Servers

We have shown that, if the adversary modifies data blocks on any of the data storage servers, our sampling-based checking scheme can successfully detect the attack with high probability. As long as the data modification is caught, the user will further determine which server is malfunctioning. This can be achieved by comparing the response values R_i^(j) with the pre-stored tokens v_i^(j), where j ∈ {1, ..., n}. The probability of error localization, i.e., of identifying the misbehaving server(s), can be computed in a similar way: it is the product of the matching probability of the sampling check and the probability of the complementary event of a false negative. The matching probability now becomes Pr_m = 1 − ∏_{i=0}^{r−1}(1 − min{ẑ/(l − i), 1}), where ẑ ≤ z denotes the number of modified rows residing on the server in question.

Next, we consider the false negative probability that R_i^(j) = v_i^(j) even though at least one of the checked blocks on server j has been modified. According to Proposition 1 of [30], tokens calculated in GF(2^p) for two different data vectors collide with probability Pr_f = 2^−p. Thus, the identification probability for misbehaving server(s) is Pd = Pr_m · (1 − Pr_f). Following the analysis of the detection probability, if z = 1% of l and each token covers 460 indices, the identification probability for misbehaving servers is at least 99%. Note that as long as the number of identified misbehaving servers does not exceed the number of parity vectors k, we can use the erasure-correcting code to recover the corrupted data, achieving storage dependability as shown in Section 3.4 and Algorithm 3.
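Under the same illustrative assumptions as the previous sketch, the per-server identification probability only swaps in the collision bound 2^−p:

from math import prod

def identification_prob(l, r, z_hat, p=16):
    # Pd = Pr_m * (1 - Pr_f), with Pr_f = 2**-p (Proposition 1 of [30])
    pr_m = 1 - prod(1 - min(z_hat / (l - i), 1.0) for i in range(r))
    return pr_m * (1 - 2 ** -p)

print(identification_prob(l=10_000, r=460, z_hat=100))   # about 0.99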

5.2.3 Security Strength Against the Worst-Case Scenario

We now explain why it is necessary to blind the parity blocks and how our proposed scheme achieves collusion resistance against the worst-case scenario in the adversary model.

Recall that in the file distribution preparation, the redundancy parity vectors are calculated by multiplying the file matrix F by P, where P is the secret parity generation matrix that we later rely on for storage correctness assurance. If we dispersed all the generated vectors directly after token pre-computation, i.e., without blinding, malicious servers that collaborate could reconstruct the secret matrix P easily: they can pick blocks from the same rows among the data and parity vectors to establish a set of m · k linear equations and solve for the m · k entries of the parity generation matrix P. Once they have the knowledge of P, those malicious servers can modify any part of the data blocks and calculate the corresponding parity blocks (and vice versa), keeping their codeword relationship always consistent. Therefore, our storage correctness challenge scheme would be undermined: even if those modified blocks are covered by the specified rows, the storage correctness check equation would always hold.


[Fig. 4 bar charts of the parity generation and parity blinding costs: (a) m is fixed, and k is decreasing; (b) m + k is fixed.]

Fig. 4: Performance comparison between two different parameter settings for 1 GB file distribution preparation. The (m, k) denotes the chosen parameters for the underlying Reed-Solomon coding. For example, (10, 2) means we divide the file into 10 data vectors and then generate 2 redundant parity vectors.

To prevent colluding servers from recovering P and making up consistently related data and parity blocks, we utilize the technique of adding random perturbations to the encoded file matrix and hence hide the secret matrix P. We make use of a keyed pseudorandom function f_kj(·) with key kj and seed s_ij^ver, both of which have been introduced previously. In order to maintain the systematic layout of the data file, we only blind the parity blocks with random perturbations. (We can also blind only the data blocks and achieve privacy-preserving third-party auditing, as shown in Section 3.5.) Our purpose is to add “noise” to the set of linear equations and make it computationally infeasible to solve for the correct secret matrix P. By blinding each parity block with a random perturbation, the malicious servers no longer have all the necessary information to build up the correct linear equation groups and therefore cannot derive the secret matrix P.
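The blinding step itself is simple once a keyed PRF is fixed. The sketch below is our own simplification, not the actual implementation: HMAC-SHA256 stands in for the keyed PRF f_kj (the paper only mentions HMAC [32] as one practical choice), the seed encoding is an assumption, and each parity block is reduced to a single 16-bit GF(2^16) element. Since addition in GF(2^16) is bitwise XOR, “adding” the perturbation is an XOR, and the user can later remove it by XORing the same PRF output again:

import hmac, hashlib

def prf16(key: bytes, seed: bytes) -> int:
    # Keyed PRF truncated to one GF(2^16) element (16 bits); HMAC-SHA256 is an assumption.
    return int.from_bytes(hmac.new(key, seed, hashlib.sha256).digest()[:2], "big")

def blind_parity_block(parity: int, key_j: bytes, s_ver: bytes, i: int, j: int) -> int:
    # Blinded block = parity + f_kj(seed) in GF(2^16); addition in GF(2^16) is XOR.
    seed = s_ver + i.to_bytes(4, "big") + j.to_bytes(2, "big")   # seed layout is illustrative
    return parity ^ prf16(key_j, seed)

# The server only ever sees the blinded value; the data owner, knowing key_j and the seed,
# strips the perturbation by applying the same XOR once more.
blinded = blind_parity_block(0x3A7C, key_j=b"key-of-server-j", s_ver=b"ver-1", i=0, j=11)
assert blind_parity_block(blinded, key_j=b"key-of-server-j", s_ver=b"ver-1", i=0, j=11) == 0x3A7C

Because the perturbations are pseudorandom and unknown to the servers, the m · k linear equations a colluding group could otherwise form over the stored blocks no longer determine the entries of P.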

5.3 Performance Evaluation

We now assess the performance of the proposed storage auditing scheme, focusing on the cost of file distribution preparation as well as the token generation. Our experiment is conducted on a system with an Intel Core 2 processor running at 1.86 GHz, 2048 MB of RAM, and a 7200 RPM Western Digital 250 GB Serial ATA drive. The algorithms are implemented using the open-source erasure coding library Jerasure [31], written in C. All results represent the mean of 20 trials.

5.3.1 File Distribution Preparation

As discussed, file distribution preparation includes the generation of parity vectors (the encoding part) as well as the corresponding parity blinding part. We consider two sets of parameters for the (m, k) Reed-Solomon encoding, both of which work over GF(2^16). Figure 4 shows the total cost of preparing a 1 GB file before outsourcing. In the figure on the left, we keep the number of data vectors m constant at 10 while decreasing the number of parity vectors k from 10 to 2. In the one on the right, we keep the total number of data and parity vectors m + k fixed at 22 and change the number of data vectors m from 18 to 10. From the figure, we can see that k is the dominant factor for the cost of both parity generation and parity blinding. This can be explained as follows: on the one hand, k determines how many parity vectors are required before data outsourcing, and the parity generation cost increases almost linearly with k; on the other hand, a larger k means more parity blocks to be blinded, which directly leads to more calls to our non-optimized PRF generation in C. By using more practical PRF constructions, such as HMAC [32], the parity blinding cost is expected to be further reduced.

Compared to the existing work [20], Figure 4 shows that the file distribution preparation of our scheme is more efficient. This is because in [20] an additional layer of error-correcting code has to be applied to all the data and parity vectors right after the file distribution encoding. For the same reason, the two-layer coding structure makes the solution in [20] more suitable for static data only, as any change to the contents of file F must propagate through the two-layer error-correcting code, which entails both high communication and computation complexity. In our scheme, by contrast, a file update only affects the specific “rows” of the encoded file matrix, striking a good balance between error resilience and data dynamics.

5.3.2 Challenge Token Computation

Although in our scheme the number of verification tokens t is fixed a priori, before file distribution, we can mitigate this limitation by choosing a sufficiently large t in practice. For example, when t is set to 7300 or 14600, the data file can be verified every day for the next 20 or 40 years, respectively, which should be sufficient in practice. Note that instead of computing each token directly from its definition, our implementation uses the Horner-style evaluation suggested in [21] to calculate the token v_i^(j), which achieves slightly faster performance.


Verify daily for the next 20 years   (m, k) = (10, 4)   (m, k) = (10, 6)   (m, k) = (10, 8)   (m, k) = (14, 8)
  Storage overhead (KB)                    199.61             228.13             256.64             313.67
  Computation overhead (seconds)            41.40              47.31              53.22              65.05

Verify daily for the next 40 years   (m, k) = (10, 4)   (m, k) = (10, 6)   (m, k) = (10, 8)   (m, k) = (14, 8)
  Storage overhead (KB)                    399.22             456.25             513.28             627.34
  Computation overhead (seconds)            82.79              94.62             106.45             130.10

TABLE 1: The storage and computation cost of token pre-computation for a 1 GB data file under different system settings. The (m, k) denotes the parameters for the underlying Reed-Solomon coding, as illustrated in Fig. 4.

Specifically,

v_i^(j) = Σ_{q=1}^{r} α_i^(r+1−q) ∗ G^(j)[I_q] = (((G^(j)[I_1] ∗ α_i + G^(j)[I_2]) ∗ α_i + G^(j)[I_3]) ∗ α_i + ... + G^(j)[I_r]) ∗ α_i,

which requires only r multiplications and (r − 1) XOR (addition) operations. With the Jerasure library [31], the multiplication over GF(2^16) in our experiment is based on discrete logarithms.
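For concreteness, a minimal sketch of this Horner-style evaluation is given below (Python for readability rather than the C used in our experiments; the multiplication uses a generic carry-less multiply-and-reduce instead of Jerasure's discrete-logarithm tables, and the reducing polynomial x^16 + x^12 + x^3 + x + 1 is one common choice, assumed here rather than taken from the paper):

POLY = 0x1100B   # x^16 + x^12 + x^3 + x + 1 (assumed reducing polynomial for GF(2^16))

def gf16_mul(x: int, y: int) -> int:
    # Carry-less multiplication of two GF(2^16) elements, reduced modulo POLY.
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & 0x10000:          # reduce as soon as the degree reaches 16
            x ^= POLY
    return result

def token(alpha_i: int, blocks: list) -> int:
    # Evaluate v_i^(j) = sum_{q=1}^{r} alpha_i^(r+1-q) * G(j)[I_q] by Horner's rule:
    # v = (((G[I_1]*a + G[I_2])*a + G[I_3]) ... + G[I_r]) * a, with '+' being XOR.
    v = blocks[0]
    for b in blocks[1:]:         # (r - 1) multiply-and-XOR steps
        v = gf16_mul(v, alpha_i) ^ b
    return gf16_mul(v, alpha_i)  # trailing factor of alpha_i, for r multiplications in total

# Example with r = 3 sampled blocks G(j)[I_1..I_3]:
print(hex(token(0x1234, [0x0001, 0x00FF, 0xABCD])))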

Following the security analysis, we select the practical parameter r = 460 for our token pre-computation (see Section 5.2.1), i.e., each token covers 460 different indices. The other parameters are the same as in the file distribution preparation. Our implementation shows that the average token pre-computation cost is about 0.4 ms. This is significantly faster than the hash-function-based token pre-computation scheme proposed in [13]. To verify encoded data distributed over a typical number of 14 servers, the total cost of token pre-computation is no more than 1 minute and 1.5 minutes for the next 20 years and 40 years, respectively. Note that since each token is only an element of the field GF(2^16), the extra storage for the pre-computed tokens is less than 1 MB and can thus be neglected. Table 1 summarizes the storage and computation cost of token pre-computation for a 1 GB data file under different system settings.
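As a sanity check on Table 1 (this back-of-the-envelope derivation is ours, based only on the stated token format), the storage overhead is t tokens of 2 bytes each, kept for every one of the n = m + k servers, i.e., t · (m + k) · 2 bytes. For the (m, k) = (10, 4), 20-year setting this gives 7300 · 14 · 2 = 204,400 bytes ≈ 199.61 KB, which matches the table; doubling t to 14,600 for 40 years doubles the overhead accordingly.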

6 RELATED WORK

Juels et al. [9] described a formal “proof of retrievability” (POR) model for ensuring the remote data integrity. Their scheme combines spot-checking and error-correcting code to ensure both possession and retrievability of files on archive service systems. Shacham et al. [16] built on this model and constructed a random linear function based homomorphic authenticator which enables an unlimited number of challenges and requires less communication overhead due to its use of relatively small BLS signatures. Bowers et al. [17] proposed an improved framework for POR protocols that generalizes both Juels et al.'s and Shacham et al.'s work. Later, in their subsequent work, Bowers et al. [20] extended the POR model to distributed systems. However, all these schemes focus on static data. The effectiveness of their schemes rests primarily on the preprocessing steps that the user conducts before outsourcing the data file F. Any change to the contents of F, even a few bits, must propagate through the error-correcting code and the corresponding random shuffling process, thus introducing significant computation and communication complexity. Recently, Dodis et al. [19] gave theoretical studies on a generalized framework for different variants of existing POR work.

Ateniese et al. [10] defined the “provable data possession” (PDP) model for ensuring possession of files on untrusted storage. Their scheme utilizes public key based homomorphic tags for auditing the data file. However, the pre-computation of the tags imposes heavy computation overhead that can be expensive for an entire file. In their subsequent work, Ateniese et al. [13] described a PDP scheme that uses only symmetric key based cryptography. This method has lower overhead than their previous scheme and allows for block updates, deletions, and appends to the stored file, which are also supported in our work. However, their scheme focuses on the single-server scenario and does not provide any data availability guarantee against server failures, leaving both the distributed scenario and the data error recovery issue unexplored. The explicit support of data dynamics has further been studied in two recent works [14], [15]. Wang et al. [14] proposed to combine a BLS-based homomorphic authenticator with a Merkle Hash Tree to support full data dynamics, while Erway et al. [15] developed a skip-list-based scheme to enable provable data possession with full dynamics support. The incremental cryptography work by Bellare et al. [33] also provides a set of cryptographic building blocks, such as hash, MAC, and signature functions, that may be employed for storage integrity verification while supporting dynamic operations on data. However, this branch of work falls into the traditional data integrity protection setting, where a local copy of the data has to be maintained for the verification. It is not yet clear how such work can be adapted to the cloud storage scenario, where users no longer have the data at local sites but still need to ensure storage correctness efficiently in the cloud.

In other related work, Curtmola et al. [18] aimed to ensure data possession of multiple replicas across the distributed storage system. They extended the PDP scheme to cover multiple replicas without encoding each replica separately, providing a guarantee that multiple copies of the data are actually maintained. Lillibridge et al. [22] presented a P2P backup scheme in which blocks of a data file are dispersed across m + k peers using an (m, k)-erasure code. Peers can request random blocks from their backup peers and verify the integrity using separate


keyed cryptographic hashes attached to each block. Their scheme can detect data loss from free-riding peers, but does not ensure that all data is unchanged. Filho et al. [34] proposed to verify data integrity using an RSA-based hash to demonstrate uncheatable data possession in peer-to-peer file sharing networks. However, their proposal requires exponentiation over the entire data file, which is clearly impractical for the server whenever the file is large. Shah et al. [11], [12] proposed allowing a TPA to keep online storage honest by first encrypting the data and then sending a number of pre-computed symmetric-keyed hashes over the encrypted data to the auditor. However, their scheme only works for encrypted files, and the auditors must maintain long-term state. Schwarz et al. [21] proposed to ensure static file integrity across multiple distributed servers, using erasure coding and block-level file integrity checks. We adopt some ideas of their distributed storage verification protocol. However, our scheme further supports data dynamics and explicitly studies the problem of misbehaving server identification, while theirs does not. Very recently, Wang et al. [28] gave a study of many existing solutions for remote data integrity checking, and discussed their pros and cons under different design scenarios of secure cloud storage services.

Portions of the work presented in this paper have previously appeared as an extended abstract in [1]. We have substantially revised the article and added more technical details compared to [1]. The primary improvements are as follows. First, we provide the protocol extension for privacy-preserving third-party auditing, and discuss the application scenarios for cloud storage service. Second, we add a correctness analysis of the proposed storage verification design. Third, we have completely redone all the experiments in our performance evaluation, which achieve significantly improved results compared to [1]. We also add a detailed discussion on the strength of our bounded usage for protocol verifications and its comparison with the state of the art.

7 CONCLUSION

In this paper, we investigate the problem of data security in cloud data storage, which is essentially a distributed storage system. To achieve the assurances of cloud data integrity and availability and enforce the quality of dependable cloud storage service for users, we propose an effective and flexible distributed scheme with explicit dynamic data support, including block update, delete, and append. We rely on erasure-correcting code in the file distribution preparation to provide redundancy parity vectors and guarantee data dependability. By utilizing the homomorphic token with distributed verification of erasure-coded data, our scheme achieves the integration of storage correctness assurance and data error localization, i.e., whenever data corruption has been detected during the storage correctness verification across the distributed servers, we can almost guarantee the simultaneous identification of the misbehaving server(s). Considering the time, computation resources, and even the related online burden of users, we also provide an extension of the proposed main scheme to support third-party auditing, where users can safely delegate the integrity checking tasks to third-party auditors and use the cloud storage services worry-free. Through detailed security analysis and extensive experimental results, we show that our scheme is highly efficient and resilient to Byzantine failure, malicious data modification attack, and even server colluding attacks.

ACKNOWLEDGMENTS

This work was supported in part by the US National Science Foundation under grants CNS-0831963, CNS-0626601, CNS-0716306, and CNS-0831628.

REFERENCES

[1] C. Wang, Q. Wang, K. Ren, and W. Lou, “Ensuring data storage security in cloud computing,” in Proc. of IWQoS’09, July 2009, pp. 1–9.
[2] Amazon.com, “Amazon web services (aws),” Online at http://aws.amazon.com/, 2009.
[3] Sun Microsystems, Inc., “Building customer trust in cloud computing with transparent security,” Online at https://www.sun.com/offers/details/sun transparency.xml, November 2009.
[4] M. Arrington, “Gmail disaster: Reports of mass email deletions,” Online at http://www.techcrunch.com/2006/12/28/gmail-disasterreports-of-mass-email-deletions/, December 2006.
[5] J. Kincaid, “MediaMax/TheLinkup Closes Its Doors,” Online at http://www.techcrunch.com/2008/07/10/mediamaxthelinkup-closes-its-doors/, July 2008.
[6] Amazon.com, “Amazon S3 availability event: July 20, 2008,” Online at http://status.aws.amazon.com/s3-20080720.html, July 2008.
[7] S. Wilson, “Appengine outage,” Online at http://www.cio-weblog.com/50226711/appengine outage.php, June 2008.
[8] B. Krebs, “Payment Processor Breach May Be Largest Ever,” Online at http://voices.washingtonpost.com/securityfix/2009/01/payment processor breach may b.html, Jan. 2009.
[9] A. Juels and J. Burton S. Kaliski, “PORs: Proofs of retrievability for large files,” in Proc. of CCS’07, Alexandria, VA, October 2007, pp. 584–597.
[10] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, “Provable data possession at untrusted stores,” in Proc. of CCS’07, Alexandria, VA, October 2007, pp. 598–609.
[11] M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan, “Auditing to keep online storage services honest,” in Proc. of HotOS’07. Berkeley, CA, USA: USENIX Association, 2007, pp. 1–6.
[12] M. A. Shah, R. Swaminathan, and M. Baker, “Privacy-preserving audit and extraction of digital contents,” Cryptology ePrint Archive, Report 2008/186, 2008, http://eprint.iacr.org/.
[13] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, “Scalable and efficient provable data possession,” in Proc. of SecureComm’08, 2008, pp. 1–10.
[14] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, “Enabling public verifiability and data dynamics for storage security in cloud computing,” in Proc. of ESORICS’09, volume 5789 of LNCS. Springer-Verlag, Sep. 2009, pp. 355–370.
[15] C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia, “Dynamic provable data possession,” in Proc. of CCS’09, 2009, pp. 213–222.
[16] H. Shacham and B. Waters, “Compact proofs of retrievability,” in Proc. of Asiacrypt’08, volume 5350 of LNCS, 2008, pp. 90–107.
[17] K. D. Bowers, A. Juels, and A. Oprea, “Proofs of retrievability: Theory and implementation,” in Proc. of ACM Workshop on Cloud Computing Security (CCSW’09), 2009, pp. 43–54.
[18] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, “MR-PDP: Multiple-replica provable data possession,” in Proc. of ICDCS’08. IEEE Computer Society, 2008, pp. 411–420.


[19] Y. Dodis, S. Vadhan, and D. Wichs, “Proofs of retrievability via hardness amplification,” in Proc. of the 6th Theory of Cryptography Conference (TCC’09), San Francisco, CA, USA, March 2009.
[20] K. D. Bowers, A. Juels, and A. Oprea, “HAIL: A high-availability and integrity layer for cloud storage,” in Proc. of CCS’09, 2009, pp. 187–198.
[21] T. Schwarz and E. L. Miller, “Store, forget, and check: Using algebraic signatures to check remotely administered storage,” in Proc. of ICDCS’06, 2006, pp. 12–12.
[22] M. Lillibridge, S. Elnikety, A. Birrell, M. Burrows, and M. Isard, “A cooperative internet backup scheme,” in Proc. of the 2003 USENIX Annual Technical Conference (General Track), 2003, pp. 29–41.
[23] M. Castro and B. Liskov, “Practical Byzantine fault tolerance and proactive recovery,” ACM Transactions on Computer Systems, vol. 20, no. 4, pp. 398–461, 2002.
[24] L. Carter and M. Wegman, “Universal hash functions,” Journal of Computer and System Sciences, vol. 18, no. 2, pp. 143–154, 1979.
[25] J. Hendricks, G. Ganger, and M. Reiter, “Verifying distributed erasure-coded data,” in Proc. of the 26th ACM Symposium on Principles of Distributed Computing, 2007, pp. 139–146.
[26] J. S. Plank and Y. Ding, “Note: Correction to the 1997 tutorial on Reed-Solomon coding,” University of Tennessee, Tech. Rep. CS-03-504, April 2003.
[27] C. Wang, Q. Wang, K. Ren, and W. Lou, “Privacy-preserving public auditing for storage security in cloud computing,” in Proc. of IEEE INFOCOM’10, San Diego, CA, USA, March 2010.
[28] C. Wang, K. Ren, W. Lou, and J. Li, “Towards publicly auditable secure cloud data storage services,” IEEE Network Magazine, vol. 24, no. 4, pp. 19–24, 2010.
[29] R. C. Merkle, “Protocols for public key cryptosystems,” in Proc. of IEEE Symposium on Security and Privacy, Los Alamitos, CA, USA, 1980.
[30] Q. Wang, K. Ren, W. Lou, and Y. Zhang, “Dependable and secure sensor data storage with dynamic integrity assurance,” in Proc. of IEEE INFOCOM’09, Rio de Janeiro, Brazil, April 2009.
[31] J. S. Plank, S. Simmerman, and C. D. Schuman, “Jerasure: A library in C/C++ facilitating erasure coding for storage applications - Version 1.2,” University of Tennessee, Tech. Rep. CS-08-627, August 2008.
[32] M. Bellare, R. Canetti, and H. Krawczyk, “Keying hash functions for message authentication,” in Proc. of Crypto’96, volume 1109 of LNCS. Springer-Verlag, 1996, pp. 1–15.
[33] M. Bellare, O. Goldreich, and S. Goldwasser, “Incremental cryptography: The case of hashing and signing,” in Proc. of CRYPTO’94, volume 839 of LNCS. Springer-Verlag, 1994, pp. 216–233.
[34] D. L. G. Filho and P. S. L. M. Barreto, “Demonstrating data possession and uncheatable data transfer,” Cryptology ePrint Archive, Report 2006/150, 2006, http://eprint.iacr.org/.

Cong Wang received his B.E and M.E degrees from Wuhan University, China, in 2004 and 2007, respectively. He is currently a Ph.D student in the Electrical and Computer Engineering Department at Illinois Institute of Technology. His research interests are in the areas of applied cryptography and network security, with current focus on secure data services in Cloud Computing, and secure computation outsourcing.

Qian Wang received the B.S. degree from Wuhan University, China, in 2003 and the M.S. degree from Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, China, in 2006, both in Electrical Engineering. He is currently working towards the Ph.D. degree in the Electrical and Computer Engineering Department at Illinois Institute of Technology. His research interests include wireless network security and privacy, and Cloud Computing security.

Kui Ren is an assistant professor in the Electrical and Computer Engineering Department at Illinois Institute of Technology. He obtained his PhD degree in Electrical and Computer Engineering from Worcester Polytechnic Institute in 2007. He received his B. Eng and M. Eng both from Zhejiang University in 1998 and 2001, respectively. In the past, he has worked as a research assistant at Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, at Institute for Infocomm Research, Singapore, and at Information and Communications University, South Korea. His research interests include network security & privacy and applied cryptography with current focus on security & privacy in cloud computing, lower-layer attack & defense mechanisms for wireless networks, and smart grid security and energy efficiency. His research is sponsored by US National Science Foundation and Department of Energy. He is a member of IEEE and ACM.

Ning Cao received his B.E. and M.E. degrees from Xi’an Jiaotong University, China, in 2002 and 2008, respectively. He is currently a PhD student in the Electrical and Computer Engineering Department at Worcester Polytechnic Institute. His research interests are in the areas of storage codes, security and privacy in Cloud Computing, and secure mobile cloud.

Wenjing Lou earned a BE and an ME in Computer Science and Engineering at Xi’an Jiaotong University in China, an MASc in Computer Communications at the Nanyang Technological University in Singapore, and a PhD in Electrical and Computer Engineering at the University of Florida. From December 1997 to July 1999, she worked as a Research Engineer at Network Technology Research Center, Nanyang Technological University. She joined the Electrical and Computer Engineering department at Worcester Polytechnic Institute as an assistant professor in 2003, where she is now an associate professor. Her current research interests are in the areas of ad hoc, sensor, and mesh networks, with emphases on network security and routing issues. She has been an editor for IEEE Transactions on Wireless Communications since 2007. She was named Joseph Samuel Satin Distinguished Fellow in 2006 by WPI. She is a recipient of the U.S. National Science Foundation Faculty Early Career Development (CAREER) award in 2008.
