

1041-4347 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2016.2580139, IEEE Transactions on Knowledge and Data Engineering.

Secure Data Deduplication with Dynamic Ownership Management in Cloud Storage

Junbeom Hur, Dongyoung Koo, Youngjoo Shin, and Kyungtae Kang

Abstract—In cloud storage services, deduplication technology is commonly used to reduce the space and bandwidth requirements of services by eliminating redundant data and storing only a single copy of them. Deduplication is most effective when multiple users outsource the same data to the cloud storage, but it raises issues relating to security and ownership. Proof-of-ownership schemes allow any owner of the same data to prove to the cloud storage server that he owns the data in a robust way. However, many users are likely to encrypt their data before outsourcing them to the cloud storage to preserve privacy, but this hampers deduplication because of the randomization property of encryption. Recently, several deduplication schemes have been proposed to solve this problem by allowing each owner to share the same encryption key for the same data. However, most of the schemes suffer from security flaws, since they do not consider the dynamic changes in the ownership of outsourced data that occur frequently in a practical cloud storage service. In this paper, we propose a novel server-side deduplication scheme for encrypted data. It allows the cloud server to control access to outsourced data even when the ownership changes dynamically by exploiting randomized convergent encryption and secure ownership group key distribution. This prevents data leakage not only to revoked users even though they previously owned that data, but also to an honest-but-curious cloud storage server. In addition, the proposed scheme guarantees data integrity against any tag inconsistency attack. Thus, security is enhanced in the proposed scheme. The efficiency analysis results demonstrate that the proposed scheme is almost as efficient as the previous schemes, while the additional computational overhead is negligible.

Index Terms—Deduplication, cloud storage, encryption, proof-of-ownership, revocation.


1 INTRODUCTION

Cloud computing provides scalable, low-cost, and location-independent online services ranging from simple backup services to cloud storage infrastructures. The fast growth of data volumes stored in the cloud storage has led to an increased demand for techniques for saving disk space and network bandwidth. To reduce resource consumption, many cloud storage services, such as Dropbox [1], Wuala [2], Mozy [3], and Google Drive [4], employ a deduplication technique, where the cloud server stores only a single copy of redundant data and provides links to the copy instead of storing other actual copies of that data, regardless of how many clients ask to store the data. The savings are significant [5], and reportedly, business applications can achieve disk and bandwidth savings of more than 90% [6].

• J. Hur is with the Department of Computer Science and Engineering, Korea University, Seoul, 136-701, Republic of Korea. E-mail: jbhur@korea.ac.kr

• D. Koo is with the Department of Computer Science and Engineering, Korea University, Seoul, 136-701, Republic of Korea. E-mail: dykoo@nslab.kaist.ac.kr

• Y. Shin is with the Department of Convergence Security, Kyonggi University, Suwon, 16227, Republic of Korea. E-mail: s.youngjoo@gmail.com

• K. Kang is with the Department of Computer Science and Engineering, Hanyang University, Ansan, 426-791, Republic of Korea. E-mail: ktkang@hanyang.ac.kr

However, from a security perspective, the shared usage of users' data raises a new challenge.

As customers are concerned about their private data, they may encrypt their data before outsourcing in order to protect data privacy from unauthorized outside adversaries, as well as from the cloud service provider [7],[8],[9]. This is justified by current security trends and numerous industry regulations such as PCI DSS [10]. However, conventional encryption makes deduplication impossible for the following reason. Deduplication techniques take advantage of data similarity to identify the same data and reduce the storage space. In contrast, encryption algorithms randomize the encrypted files in order to make ciphertext indistinguishable from theoretically random data. Encryptions of the same data by different users with different encryption keys result in different ciphertexts, which makes it difficult for the cloud server to determine whether the plain data are the same and deduplicate them. Say a user Alice encrypts a file M under her secret key sk_A and stores its corresponding ciphertext C_A. Bob would store C_B, which is the encryption of M under his secret key sk_B. Then, two issues arise: (1) how can the cloud server detect that the underlying file M is the same, and (2) even if it can detect this, how can it allow both parties to recover the stored data, based on their separate secret keys?


Straightforward client-side encryption that is secure against a chosen-plaintext attack with randomly chosen encryption keys prevents deduplication [11],[12]. One naive solution is to allow each client to encrypt the data with the public key of the cloud storage server. Then, the server is able to deduplicate the identified data by decrypting it with its private key. However, this solution allows the cloud storage server to obtain the outsourced plain data, which may violate the privacy of the data if the cloud server cannot be fully trusted [13],[14].

Convergent encryption [15] resolves this problem effectively. A convergent encryption algorithm encrypts an input file with the hash value of the input file as an encryption key. The ciphertext is given to the server and the user retains the encryption key. Since convergent encryption is deterministic^1, identical files are always encrypted into identical ciphertexts, regardless of who encrypts them. Thus, the cloud storage server can perform deduplication over the ciphertext, and all owners of the file can download the ciphertext (optionally after the proof-of-ownership (PoW) process) and decrypt it later, since they have the same encryption key for the file. Convergent encryption has long been studied in commercial systems and has different encryption variants for secure deduplication [8],[16],[17],[18]; it was later formalized as message-locked encryption in [20]. However, convergent encryption suffers from security flaws with regard to tag consistency and ownership revocation.

As an example of the tag consistency attack issue, suppose Alice and Bob have the same data M, and Alice generates ciphertext C_A from M, and then maliciously generates another ciphertext C'_A from M' (≠ M). Next, she uploads C'_A with an honestly generated tag T(C_A) = H(M) for a cryptographic hash function H, which plays the role of data index. When Bob generates ciphertext C_B from M and tries to upload C_B, the cloud server checks T(C_A) = T(C_B). Then, it deletes C_B and keeps only C'_A. Afterwards, when Bob downloads and decrypts it, the data would be M', not M, which means the integrity of his data has been compromised. Recently, message-locked encryption (MLE) [20] and leakage-resilient deduplication [19] schemes have been proposed to solve this problem by introducing an additional integrity check phase for decrypted data.

In the case of ownership revocation, suppose multiple users have ownership of a ciphertext outsourced in cloud storage. As time elapses, some of these users may request the cloud server to delete or modify their data, and then, the server deletes the ownership information of the users from the ownership list for the corresponding data. Then, the revoked users should be prevented from accessing the data stored in the cloud storage after the deletion or modification request (forward secrecy). On the other hand, when a user uploads data that already exist in the cloud storage, the user should be deterred from accessing the data that were stored before he obtained the ownership by uploading it (backward secrecy)^2. These dynamic ownership changes may occur very frequently in a practical cloud system, and thus, they should be properly managed in order to avoid the security degradation of the cloud service. However, the previous deduplication schemes could not achieve secure access control under a dynamic ownership changing environment, in spite of its importance to secure deduplication, because the encryption key is derived deterministically and rarely updated after the initial key derivation. Therefore, for as long as revoked users keep the encryption key, they can access the corresponding data in the cloud storage at any time, regardless of the validity of their ownership. This is the problem we attempt to solve in this study.

1. Convergent encryption exploits a block cipher as an encryption primitive.

1.1 Contribution

We propose a deduplication scheme over encrypted data. The proposed scheme ensures that only authorized access to the shared data is possible, which is considered to be the most important challenge for efficient and secure cloud storage services [22] in the environment where ownership changes dynamically. It is achieved by exploiting a group key management mechanism in each ownership group. As compared to the previous deduplication schemes over encrypted data, the proposed scheme has the following advantages in terms of security and efficiency.

First, dynamic ownership management guarantees the backward and forward secrecy of deduplicated data upon any ownership change. As opposed to the previous schemes, the data encryption key is updated and selectively distributed to valid owners upon any ownership change of the data through a stateless group key distribution mechanism using a binary tree. The ownership and key management for each user can be conducted by the semi-trusted cloud server deployed in the system. Thus, the proposed scheme delegates the most laborious tasks of ownership management to the cloud server without leaking any confidential information to it, rather than to the users. Second, the proposed scheme ensures security in the setting of PoW by introducing a re-encryption mechanism that uses an additional group key for the dynamic ownership group. Thus, although the encryption key (that is, the hash value of the file) is revealed in the setting of PoW, the privacy of the outsourced data is still preserved against outside adversaries, while deduplication over encrypted data is still enabled and data integrity against poison attacks is guaranteed.

2. A multimedia streaming service based on a pay-as-you-go policy in the cloud is a good example of services that have these backward and forward security requirements.


1.2 Organization

The rest of the paper is organized as follows. In Section 2, related work is reviewed. In Section 3, the system architecture and security requirements are described. In Section 4, the cryptographic background is provided and the general framework of deduplication over encrypted data is defined. In Section 5, we propose our scheme's construction. We analyze the efficiency and security of the proposed scheme in Sections 6 and 7, respectively. In Section 8, we conclude the paper.

2 RELATED WORK

Deduplication techniques can be categorized into two different approaches: deduplication over unencrypted data and deduplication over encrypted data. In the former approach, most of the existing schemes have been proposed in order to perform a PoW process in an efficient and robust manner, since the hash of the file, which is treated as a "proof" for the entire file, is vulnerable to being leaked to outside adversaries because of its relatively small size. In the latter approach, by contrast, data privacy is the primary security requirement, in order to protect against not only outside adversaries but also the inside cloud server. Thus, most of the schemes have been proposed to provide data encryption, while still benefiting from a deduplication technique, by enabling data owners to share the encryption keys in the presence of the inside and outside adversaries. Since encrypted data are given to a user, data access control can be additionally implemented by selective key distribution after the PoW process. However, not much work has yet been done to address dynamic ownership management and its related security problem.

2.1 Deduplication over Unencrypted Data

Harnik et al. [11] demonstrated how the data deduplication technique can be used as a side channel that reveals information to malicious users about the contents of files of other users. On the basis of Harnik et al.'s study, Halevi et al. [21] also introduced a similar attack scenario on cloud storage that uses deduplication across multiple users. Specifically, when an attacker temporarily compromises a server and obtains the hash values for data in the cloud storage, he is able to download all these data. This is because only a small piece of information about the data, namely, its hash value, serves as not only an index of the data to locate information of the data among a huge number of files, but also a "proof" that anyone who knows the hash value owns the corresponding data. Therefore, any users who can obtain the short hash value for specific data are able to access all the data stored in the cloud storage.

Harnik et al. [11] proposed a randomized threshold to avoid an attack on cloud storage services that use server-side data deduplication by stopping data deduplication. However, their method did not employ client-side data possession proofs to prevent hash manipulation attacks. Mulazzani et al. [22] demonstrated the hash manipulation attack and conducted a practical evaluation of such an attack in Dropbox [1], which is one of the biggest cloud storage providers. Specifically, the authors showed that spoofing the hash value of a file chunk added to the local Dropbox folder allows a malicious user to access files of other Dropbox users, given that the SHA-256 hash values of the file's chunks are known to the attacker.

To overcome these attacks, Halevi et al. [21] introduced and formalized the notion of proof-of-ownership (PoW), where a user proves to a server that he holds a file using Merkle trees, rather than only a short hash value for it. Specifically, Halevi et al.'s scheme encodes a file using an erasure code that is resilient to the erasure of up to an α fraction of the bits, and then builds a Merkle tree over the encoded file. Then, a challenge-response protocol between the server and the client verifies the ownership. PoW is closely related to proof of retrievability [23] and proof of data possession [24]. However, proof of retrievability and data possession often use a pre-processing step that cannot be used in the data deduplication procedure.
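For concreteness, the following is a minimal sketch of the Merkle-tree challenge-response idea, not Halevi et al.'s exact construction: the erasure-coding step is omitted, SHA-256 over fixed-size blocks is assumed, and the block size and number of challenges are arbitrary illustrative choices. The client builds a Merkle tree over the file blocks, the server keeps only the root, and ownership is checked by asking for randomly chosen leaves together with their authentication paths.

```python
import hashlib, os, random

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return the list of tree levels: level[0] = leaf hashes, level[-1] = [root]."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                       # duplicate the last node if the level is odd
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """Sibling hashes from leaf `index` up to (but not including) the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[index ^ 1])            # sibling at this level
        index //= 2
    return path

def verify(root, block, index, path):
    node = h(block)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# --- toy PoW run ------------------------------------------------------------
BLOCK = 1024
file_data = os.urandom(BLOCK * 8)                # the file both parties claim to hold
blocks = [file_data[i:i + BLOCK] for i in range(0, len(file_data), BLOCK)]

levels = build_tree(blocks)                      # client-side tree
root = levels[-1][0]                             # server stores only this root

for c in random.sample(range(len(blocks)), 3):   # server challenges random leaves
    assert verify(root, blocks[c], c, auth_path(levels, c))
print("PoW challenges verified")
```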

Despite their significant benefits in terms of saving resources, these deduplication schemes may cause another security vulnerability and reveal users' private data, in particular, when partial information of users' data has already been leaked. Additionally, all of the above deduplication schemes allow the cloud server to store the data in plaintext form and send plain data to users on receipt of request messages after the PoW procedure. Thus, the cloud server should be fully trusted by all the users in the systems, which constitutes a significant security threat in pragmatic cloud storage services, where the cloud server may learn every customer's private information and maliciously exploit it.

2.2 Deduplication over Encrypted Data

In order to preserve data privacy against the inside cloud server as well as outside adversaries, users may want their data encrypted. However, conventional encryption under different users' keys makes cross-user deduplication impossible, since the cloud server would always see different ciphertexts, even if the data are the same, regardless of whether the encryption algorithm is deterministic.

Convergent encryption, introduced by Douceur et al. [15], is a promising solution to this problem. In convergent encryption, a data owner derives an encryption key K ← H(M), where M is the data or file to be encrypted and H is a cryptographic hash function. Then, he computes the ciphertext C ← E(K, M) via a block cipher E, deletes M, and keeps only K after uploading C to the cloud storage. If another user encrypts the same message, the same ciphertext C is produced, since encryption is deterministic. Thus, on receipt of C from other users after the initial upload, the server does not store the file but instead updates meta-data to indicate it has an additional owner. If any legitimate owners request and download C later, they can decrypt it with K.
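A minimal sketch of this idea, assuming SHA-256 for H and a toy XOR keystream standing in for a real block cipher (illustration only; the tag/index is computed here as the hash of the ciphertext, one of the variants described in the text):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic keystream (illustration only, not a real cipher)."""
    out, counter = b"", 0
    while len(out) < length:
        out += H(key + counter.to_bytes(8, "big"))
        counter += 1
    return out[:length]

def E(key: bytes, msg: bytes) -> bytes:           # deterministic "encryption"
    return bytes(a ^ b for a, b in zip(msg, keystream(key, len(msg))))

def convergent_encrypt(msg: bytes):
    key = H(msg)                  # K <- H(M): the key is derived from the data itself
    ct = E(key, msg)              # C <- E(K, M): deterministic, so same M -> same C
    tag = H(ct)                   # index the server uses to detect duplicates
    return key, ct, tag

# Two independent users encrypting the same file obtain identical ciphertexts,
# so the server can deduplicate without ever seeing the plaintext.
m = b"the same file outsourced by two users"
k1, c1, t1 = convergent_encrypt(m)
k2, c2, t2 = convergent_encrypt(m)
assert c1 == c2 and t1 == t2
assert E(k1, c1) == m             # any owner holding K can decrypt (XOR is its own inverse)
print("deduplicable ciphertext, tag =", t1.hex()[:16])
```

Because the key, the ciphertext, and the tag are all deterministic functions of the file, two owners of the same file always produce matching tags; as discussed next, however, the server cannot tell whether a claimed tag really corresponds to the plaintext behind an uploaded ciphertext, which is what the tag consistency (poison) attack exploits.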

However, convergent encryption suffers from the following security flaw. Suppose a user u_a has data M_a and a user u_b has other data M_b ≠ M_a. u_a uploads the maliciously generated ciphertext C_a ← E(H(M_b), M_a), and its tag (or index) T(C_a) ← H(E(H(M_b), M_b)). Then, when u_b tries to upload C_b ← E(H(M_b), M_b) and its tag, the server sees a tag match T(C_a) = T(C_b). Thus, the server deletes C_b and keeps only C_a. Later, when u_b downloads it, the decryption yields M_a, not M_b, meaning the integrity of the data has been compromised. This is referred to as the tag consistency problem [20]. Xu et al. [19] also introduced a similar data integrity attack in the cloud storage service, called a poison attack.

In order to solve this problem, Bellare et al. [20] introduced the message-locked encryption (MLE) concept and its security notion, and proposed randomized convergent encryption (RCE) as one implementation of MLE. In randomized convergent encryption, an initial uploader encrypts a message and generates C1 ← E(L, M), where L is a randomly chosen key, and then encrypts the message encryption key L and generates C2 ← L ⊕ K, where K is a key-encrypting key (KEK) derived from the message (K ← H(P, M)). The message tag T is then generated from the KEK, not from the ciphertext (T ← H(P, K), where P is a set of public parameters). When any legitimate owner receives C1, C2, T from the server later, he computes L ← C2 ⊕ K, decrypts C1 with L, and obtains M. Then, he generates a tag T' ← H(P, H(P, M)) and checks whether T' = T. If T' = T, he accepts it; else, he rejects it, since the data have been compromised. In the scheme, C2 is used to distribute the message encryption key, whereas K is used as a group KEK shared among owners of the same data. Since a tag is generated from the KEK, not from the ciphertext, even different ciphertexts encrypted under the different keys of each owner can be deduplicated provided that the plaintext is the same. Xu et al. [19] also proposed a leakage-resilient deduplication scheme to resolve the data integrity problem. This scheme also enables the data owner to encrypt data with a randomly selected key. Then, the data encryption key is encrypted under a KEK derived from the data and distributed to the other data owners after the PoW process. If a legitimate owner receives a ciphertext, he can check the integrity of the data by decrypting the data encryption key with the same KEK.
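A minimal sketch of randomized convergent encryption as just described, under the same toy-cipher assumption (the construction in [20] uses a real symmetric cipher; P is an arbitrary placeholder for the public parameters):

```python
import hashlib, os

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def stream(key: bytes, n: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < n:
        out += H(key, ctr.to_bytes(8, "big"))
        ctr += 1
    return out[:n]

def E(key: bytes, data: bytes) -> bytes:          # toy symmetric cipher (XOR stream)
    return bytes(a ^ b for a, b in zip(data, stream(key, len(data))))

P = b"public-params"                              # public system parameters (illustrative)

def rce_encrypt(m: bytes):
    L = os.urandom(32)                            # randomly chosen message encryption key
    K = H(P, m)                                   # KEK derived from the message: K <- H(P, M)
    c1 = E(L, m)                                  # C1 <- E(L, M): randomized ciphertext
    c2 = bytes(a ^ b for a, b in zip(L, K))       # C2 <- L xor K
    T = H(P, K)                                   # tag derived from the KEK, not the ciphertext
    return c1, c2, T

def rce_decrypt(c1, c2, T, K):
    L = bytes(a ^ b for a, b in zip(c2, K))       # recover L from C2
    m = E(L, c1)
    if H(P, H(P, m)) != T:                        # T' <- H(P, H(P, M)); reject on mismatch
        raise ValueError("tag inconsistency: possible poison attack")
    return m

m = b"shared file"
c1, c2, T = rce_encrypt(m)                        # uploader A (random L)
c1b, c2b, Tb = rce_encrypt(m)                     # uploader B (different random L)
assert c1 != c1b and T == Tb                      # ciphertexts differ, tags match -> deduplicable
assert rce_decrypt(c1, c2, T, H(P, m)) == m
```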

Convergent encryption is insecure in the setting of PoW, where the hash value of the file (that is, a deterministic encryption key) may be leaked [19],[21]. Unfortunately, this is also the case in MLE [20] and Xu et al.'s scheme [19]. Since the hash value of the file is used as the KEK in both schemes, if the KEK is revealed, adversaries who obtain it are able to decrypt the key encryption message and obtain the encryption key, even if the encryption key is not deterministic. Another drawback in both schemes is the lack of dynamic ownership management among the data owners. For example, suppose a group of users share data in the cloud storage. Some users may request data deletion or modification in the storage. Then, they should be prevented from accessing the original data after this time instance (forward secrecy). Likewise, when a user subsequently uploads the data, access right to the previous data should not be given to him before that time instance (backward secrecy). However, in both schemes, this unauthorized data access cannot be controlled, since the data encryption key cannot be updated at all after its initial selection and distribution by an initial uploader.

Recently, Li et al. [25] proposed a convergent key management scheme in which users distribute the convergent key shares across multiple servers by exploiting the Ramp secret sharing scheme [26]. Li et al. [27] also proposed an authorized deduplication scheme in which differential privileges of users, as well as the data, are considered in the deduplication procedure in a hybrid cloud environment. Jin et al. [31] proposed an anonymous deduplication scheme over encrypted data that exploits a proxy re-encryption algorithm. Bellare et al. [28] proposed a server-aided MLE that is secure against brute-force attacks; it was recently extended to interactive MLE [29] to provide privacy for messages that are both correlated and dependent on the public system parameters. However, these schemes do not handle the dynamic ownership management issues involved in secure deduplication for shared outsourced data.

Shin et al. [30] proposed a deduplication scheme over encrypted data that uses predicate encryption. This approach allows deduplication only of files that belong to the same user, which severely reduces the effect of deduplication. Thus, in this paper, we focus on deduplication across different users such that identical files from different users are detected and deduplicated safely to provide more storage savings.

3 DATA DEDUPLICATION ARCHITECTURE

In this section, we describe the data deduplication architecture and define the security model. According to the granularity of deduplication, deduplication schemes are categorized into (coarse-grained) file-level or (fine-grained) block-level schemes. Since block-level deduplication can easily be deduced from file-level deduplication, we consider only file-level deduplication for simplicity's sake. Thus, a data copy refers to a whole file in this paper.

Fig. 1. Architecture of a data deduplication system

3.1 System Description and Assumptions

Fig. 1 shows the architecture of the data deduplication system, which consists of the following entities.

1) Data owner: This is a client who owns data, and wishes to upload it into the cloud storage to save costs. A data owner encrypts the data and outsources it to the cloud storage with its index information, that is, a tag. If a data owner uploads data that do not already exist in the cloud storage, he is called an initial uploader; if the data already exist, he is called a subsequent uploader, since this implies that other owners may have uploaded the same data previously. Hereafter, we refer to a set of data owners who share the same data in the cloud storage as an ownership group.

2) Cloud service provider: This is an entity that provides cloud storage services. It consists of a cloud server and cloud storage. The cloud server deduplicates the outsourced data from users if necessary and stores the deduplicated data in the cloud storage. The cloud server maintains ownership lists for stored data, which are composed of a tag for the stored data and the identities of its owners. The cloud server controls access to the stored data based on the ownership lists and manages (e.g., issues, revokes, and updates) group keys for each ownership group as a group key authority. The cloud server is assumed to be honest-but-curious. That is, it will honestly execute the assigned tasks in the system; however, it would like to learn as much information about the encrypted contents as possible. Thus, it should be deterred from accessing the plaintext of the encrypted data even if it is honest.

3.2 Threat Model and Security Requirements

1) Data privacy: Unauthorized users who cannot prove ownerships should not be able to decrypt the ciphertext stored in the cloud storage. Additionally, the cloud server is no longer fully trusted in the system. Thus, unauthorized access from the cloud server to the plaintext of the encrypted data in the cloud storage should be prevented.

2) Data integrity: The deduplication algorithm should guarantee tag consistency against any poison attacks. That is, the deduplication algorithm should allow the valid owners to verify that the data downloaded from the cloud storage have not been altered.

3) Backward and forward secrecy^3: In the context of deduplication, backward secrecy means that any user should be prevented from accessing the plaintext of the outsourced data before uploading the data. Conversely, forward secrecy means that any user who deletes or modifies the data in the cloud storage should be prevented from accessing the outsourced data after its deletion or modification.

4) Collusion resistance: Unauthorized users who do not have valid ownerships of data in the cloud storage should not be able to decrypt them even if they collude.

4 PRELIMINARIES AND DEFINITION

4.1 Notations

In this paper, x ←$ S denotes the operation of selecting an element at random and uniformly from a finite set S and assigning it to x. For an algorithm A, y ← A(x_1, . . .) denotes running A on inputs x_1, . . . and assigning the output to the variable y. 1^λ denotes a string of λ ones, if λ ∈ N, which is the security parameter^4. For two bit-strings a and b, we denote by a||b their concatenation.

Let U = {u_1, · · · , u_n} be the universe of users. Let ID_t be the identity of a user u_t. Let G_i ⊂ U be the set of users that own the data M_i, which is referred to as an ownership group. Let L_i = ⟨T_i, G_i⟩ be the ownership list for M_i, maintained by the cloud server, which consists of a tag T_i and G_i for M_i. Let GK_i be the ownership group key that is shared among the valid owners in G_i.

4.2 Definitions

In this section, we define a secure deduplication framework for encrypted data with ownership management capability. The scheme consists of the following algorithms:

1) KEK ←$ KEKGen(U): The KEK generation algorithm takes a set of users U as input, and outputs KEKs for each user in U for secure ownership group key distribution.

3. In secure group communication, backward secrecy implies that when a member newly joins a multicast group, he should be prevented from learning group communications exchanged before he joins the group. Forward secrecy implies that when a member leaves a multicast group, he should be prevented from learning group communications exchanged after he leaves the group [35].

4. To be consistent with the standard convention in algorithms, where the running time of an algorithm is measured as a function of the length of its input, we will provide the adversary and the honest parties with the security parameter in unary as 1^λ.


Fig. 2. Scheme overview and corresponding security (the figure maps the two building blocks, randomized convergent encryption and re-encryption under the ownership group key, to the security goals of data privacy, data integrity, backward/forward secrecy, and collusion resistance)

2) C ←$ Encrypt(M, 1^λ): The encryption algorithm is a randomized algorithm that takes as input data M and a security parameter λ, and outputs a ciphertext C of the data. C consists of the encrypted message and its tag information for indexing.

3) C' ←$ ReEncrypt(C, G): The re-encryption algorithm is a randomized algorithm that takes a ciphertext C and an ownership group G, and outputs a re-encrypted ciphertext C'. Specifically, it outputs a re-encrypted ciphertext such that only valid owners in G can decrypt the message.

4) M ← Decrypt(C', K, PK): The decryption algorithm is a deterministic algorithm that takes as input C', a message encryption key K, and a set of KEKs PK for encrypting an ownership group key GK, and outputs a message M, iff K is derived from M and GK is not revoked for the ownership group G (that is, the decryptor is in G) for M.
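For reference, this framework can be expressed as the following interface sketch (the type names and the Ciphertext container are illustrative and not part of the paper's definition):

```python
from dataclasses import dataclass
from typing import Protocol, Dict, List, Set

Key = bytes
UserId = str

@dataclass
class Ciphertext:
    c1: bytes          # encrypted message (possibly re-encrypted under a group key)
    c2: bytes          # message encryption key wrapped with the message-derived KEK
    c3: List[bytes]    # ownership group key wrapped under cover-set KEKs (after re-encryption)
    tag: bytes         # index information

class DedupScheme(Protocol):
    def kek_gen(self, users: Set[UserId]) -> Dict[UserId, List[Key]]:
        """KEKGen(U): per-user KEKs (path keys) for ownership group key distribution."""
        ...
    def encrypt(self, message: bytes, sec_param: int) -> Ciphertext:
        """Encrypt(M, 1^lambda): randomized encryption producing a ciphertext and tag."""
        ...
    def re_encrypt(self, ct: Ciphertext, group: Set[UserId]) -> Ciphertext:
        """ReEncrypt(C, G): re-encrypt so that only current owners in G can decrypt."""
        ...
    def decrypt(self, ct: Ciphertext, k: Key, path_keys: List[Key]) -> bytes:
        """Decrypt(C', K, PK): returns M iff K is derived from M and the caller is in G."""
        ...
```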

5 PROPOSED DEDUPLICATION SCHEME

In this section, we propose a secure deduplication scheme for encrypted data that has dynamic ownership management capability. The proposed scheme is constructed based partially on a randomized convergent encryption scheme [20] in order to randomize the encrypted data, which renders the proposed scheme secure against the chosen-plaintext attack while still allowing deduplication over the data. The proposed scheme is further integrated into the re-encryption protocol for owner revocation. The owner revocation is executed by re-encrypting the outsourced ciphertext and selectively distributing the re-encryption key to valid (that is, not revoked) owners by the cloud server. Fig. 2 shows the overview of the proposed scheme and its corresponding security goals.

To handle dynamic ownership management, the cloud server must obtain the ownership list for each data, since otherwise revocation cannot take effect. This setting where the cloud server knows the ownership list does not violate the security requirements, because it is allowed only to re-encrypt the ciphertexts and can by no means obtain any information about the data encryption key of users.

Fig. 3. KEK tree for ownership group key distribution (a binary tree with nodes v_1–v_15; users u_1–u_8 are assigned to the leaf nodes v_8–v_15)

5.1 Scheme Construction

Let E_K(M) be a symmetric encryption of a message M under a key K. The simplest implementation is to make E_K : {0,1}^k → {0,1}^k a block cipher, where k is the length of the key K. We additionally employ a cryptographic hash function H : {0,1}^* → {0,1}^* to generate an encryption key and a tag from a message.

5.1.1 Key Generation

The cloud server runs KEKGen(U) and generates KEKs for the users in U. First, the cloud server sets a binary KEK tree for the universe of users U, as in Fig. 3, which will be used to distribute the ownership group keys to users in subsets of U. In the tree, each node v_j holds a KEK, denoted by KEK_j. A user is represented by a leaf, and each user maintains the KEKs on the path nodes from its leaf to the root. These are called path keys. For instance, in Fig. 3, u_2 stores KEK_9, KEK_4, KEK_2, and KEK_1 as its path keys PK_2. For u_t ∈ U, PK_t denotes the set of the path keys of u_t. The KEK tree is constructed by the cloud server as follows:

1) Every member in U is assigned to a leaf node of the tree. Random keys are generated and assigned to each leaf node and internal node.

2) Each member u_t ∈ U receives securely the path keys PK_t from its leaf node to the root node of the tree.

Then, the path keys are used as KEKs to encrypt the ownership group keys by the cloud server in the data re-encryption phase. The key assignments in this method are conducted randomly and independently of each other.
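A minimal sketch of the KEK tree, assuming a complete binary tree over n = 2^d users and the node numbering of Fig. 3 (the root is v_1 and the children of v_j are v_2j and v_2j+1):

```python
import os

def build_kek_tree(n_users: int):
    """Complete binary tree with n_users leaves; node j holds KEK_j (root is node 1)."""
    n_nodes = 2 * n_users - 1
    return {j: os.urandom(32) for j in range(1, n_nodes + 1)}

def leaf_index(user: int, n_users: int) -> int:
    """Leaf node index of user u_t (1-based), e.g. u_1 -> v_8 when n_users = 8."""
    return n_users + user - 1

def path_keys(tree, user: int, n_users: int):
    """PK_t: the KEKs on the path from the user's leaf up to the root."""
    keys, node = {}, leaf_index(user, n_users)
    while node >= 1:
        keys[node] = tree[node]
        node //= 2
    return keys

tree = build_kek_tree(8)
pk2 = path_keys(tree, 2, 8)       # u_2 holds KEK_9, KEK_4, KEK_2, KEK_1 (cf. Fig. 3)
print(sorted(pk2))                # [1, 2, 4, 9]
```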

5.1.2 Data Encryption

Without loss of generality, we suppose a data owner u_t wants to upload his data M_i to the cloud storage. u_t encrypts the data by running the Encrypt(M_i, 1^λ) algorithm.

The algorithm chooses a random data encryption key L ←$ {0,1}^k(λ), where k(λ) is an algorithm that determines the size of the encryption key under the security parameter. It also computes a key K_i ← H(M_i) from M_i, which will be used as a KEK for encrypting the message encryption key L, and a tag T_i ← H(K_i), which is the index information for the data. Then, the algorithm encrypts the data and the encryption key as C_i^1 ← E_L(M_i) and C_i^2 ← L ⊕ K_i, and constructs the ciphertext C_i = C_i^1 || C_i^2.

After the construction of C_i, the data owner u_t sends upload||T_i||C_i||ID_t to the cloud storage^5. Then, the owner deletes M_i and retains only K_i for storage saving. On its receipt, the cloud server inserts ID_t into G_i, creates L_i = ⟨T_i, G_i⟩, and stores C_i in the cloud storage, if u_t is the first uploader for M_i. If L_i already exists (which means u_t is a subsequent uploader), then it inserts only ID_t into G_i without storing C_i.
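A sketch of the owner-side Encrypt step under the same illustrative assumptions as before (SHA-256 for H and a toy XOR keystream standing in for the block cipher E):

```python
import hashlib, os

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def stream(key: bytes, n: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < n:
        out += H(key + ctr.to_bytes(8, "big"))
        ctr += 1
    return out[:n]

def E(key: bytes, data: bytes) -> bytes:          # toy symmetric cipher (illustration only)
    return bytes(a ^ b for a, b in zip(data, stream(key, len(data))))

def encrypt(m_i: bytes, key_len: int = 32):
    L = os.urandom(key_len)                       # L <-$ {0,1}^k(lambda)
    K_i = H(m_i)                                  # K_i <- H(M_i): KEK for the message key
    T_i = H(K_i)                                  # T_i <- H(K_i): index (tag) for the data
    c1 = E(L, m_i)                                # C_i^1 <- E_L(M_i)
    c2 = bytes(a ^ b for a, b in zip(L, K_i))     # C_i^2 <- L xor K_i
    return T_i, (c1, c2), K_i                     # owner keeps only K_i after uploading

T_i, C_i, K_i = encrypt(b"outsourced file M_i")   # sent as upload || T_i || C_i || ID_t
```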

5.1.3 Data Re-encryption

Before distributing the ciphertext C_i, the cloud server re-encrypts it by running ReEncrypt(C_i, G_i) using the ownership group information for the ciphertext. The re-encryption algorithm enforces access control of dynamically changing owners to the outsourced data. The algorithm progresses as follows:

1) For G_i, choose a random ownership group key GK_i. Then, re-encrypt C_i^1 and generate C_i^1' = E_GK_i(C_i^1).

2) Select the root nodes of the minimum cover sets in the KEK tree that can cover all of the leaf nodes associated with users in G_i. We denote by KEK(G_i) the set of KEKs held by such root nodes of subtrees for G_i. For example, if G_i = {u_1, u_2, u_3, u_4, u_7, u_8} in Fig. 3, then KEK(G_i) = {KEK_2, KEK_7}, because v_2 and v_7 are the root nodes of the minimum cover sets that can cover all of the members in G_i. It follows that this collection covers all users in G_i and only them, and any user u ∉ G_i can by no means know any KEK in KEK(G_i).

3) Generate C_i^3 = {E_K(GK_i)}_{K ∈ KEK(G_i)}. This encryption is employed as the method for delivering the ownership group keys to valid owners.

On receiving any data request query T_i||ID_j from a user u_j, the cloud server looks up L_i and responds with T_i||C'_i to the user, where C'_i = C_i^1' || C_i^2 || C_i^3, if ID_j ∈ G_i; otherwise, it does nothing. The former indicates the case where the user has uploaded the data and has not been revoked, while the latter indicates the case where the user has not uploaded the data or has been revoked at this moment.

5. In server-side deduplication approaches, the data owner may send only T_i, and the cloud server requests the owner to upload the encrypted data only when there are no data indexed by T_i. This approach can save the network bandwidth; however, it can be used as a side channel that reveals information about the contents of files of other users. This may violate the privacy of other users [11]. Thus, in the proposed scheme, we assume that the data owner sends C_i as well as T_i in order to preserve privacy.

It is important to note that the ownership group key distribution protocol through C_i^3 is a stateless approach. Thus, even if users cannot update their key states constantly (e.g., in the case of being offline), they are able to decrypt the group key from it at any time they receive it when they become online, provided that they are not revoked from the ownership groups.
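A sketch of the server-side re-encryption step, assuming the complete KEK tree of the key-generation sketch above; the minimum cover is found by keeping every highest subtree whose leaves all belong to G_i:

```python
import os

def minimum_cover(group_users, n_users):
    """Root nodes of the largest subtrees whose leaves are all owned by G_i."""
    member = {n_users + u - 1 for u in group_users}          # leaf node indices of G_i
    def covered(node):
        # leaves under `node` in a complete tree whose leaves are n_users .. 2*n_users-1
        lo, hi = node, node
        while lo < n_users:
            lo, hi = 2 * lo, 2 * hi + 1
        return all(leaf in member for leaf in range(lo, hi + 1))
    cover, stack = [], [1]
    while stack:
        node = stack.pop()
        if covered(node):
            cover.append(node)                                # whole subtree belongs to G_i
        elif node < n_users:                                  # internal node: descend
            stack += [2 * node, 2 * node + 1]
    return sorted(cover)

# Example from Fig. 3: G_i = {u1, u2, u3, u4, u7, u8} -> cover {v2, v7}
print(minimum_cover({1, 2, 3, 4, 7, 8}, 8))                   # [2, 7]

# Re-encryption then computes C_i^1' = E_GK(C_i^1) under a fresh group key GK_i and
# C_i^3 = {E_KEK(GK_i)} for each KEK in KEK(G_i), e.g.:
GK_i = os.urandom(32)                                         # fresh ownership group key
# c1_prime = E(GK_i, c1); c3 = {node: E(tree[node], GK_i) for node in cover}
```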

5.1.4 Data Decryption

When a user u_t receives a ciphertext C'_i from the cloud server, he can decrypt the message by running Decrypt(C'_i, K_i, PK_t), if u_t ∈ G_i. The data decryption phase consists of ownership group key decryption followed by the message decryption.

Ownership Group Key Decrypt. When a user sends a data request query and receives T_i||C'_i from the cloud server, he first parses it as T_i, C_i^1', C_i^2, C_i^3, and obtains the ownership group key from C_i^3. If the user u_t has valid ownership (that is, u_t ∈ G_i at this time instance), he can decrypt the ownership group key GK_i using a KEK ∈ KEK(G_i) ∩ PK_t as

GK_i = D_KEK(C_i^3), for the KEK ∈ (KEK(G_i) ∩ PK_t).

The user u_t may belong to at most one subset rooted by only one such KEK in KEK(G_i). Thus, there can be only one such KEK.

The key-indistinguishability property follows from the fact that no u ∉ G_i is contained in any of the subsets whose root node holds a KEK in KEK(G_i). This means that, for every KEK in KEK(G_i), the KEK is indistinguishable from a random key, given all the information of all users not in G_i [32]. Thus, any user u ∉ G_i can by no means decrypt GK_i, even if he colludes with other users u' ∉ G_i, which makes the proposed scheme secure against such a collusion attack, as we analyze in Section 7.4.

Message Decrypt. After that, the user u_t decrypts the ciphertext and obtains the message as follows:

C_i^1 ← D_GK_i(C_i^1'), L ← C_i^2 ⊕ K_i, M_i ← D_L(C_i^1), T'_i ← H(H(M_i)).

If T'_i ≠ T_i, which represents tag inconsistency, the user drops the message, since it may have been modified under a poison attack; else, the user accepts M_i as the original data he outsourced. For example, if a malicious user uploads a false message M' with the tag of M, where M' ≠ M, a subsequent user who owns M could detect the pollution of the original data M by decrypting the ciphertext, obtaining M', and checking that the recomputed tag for M' does not match the tag of M. Then, the subsequent user would report the data inconsistency to the cloud server, which may help the cloud server to find and revoke the malicious user, and delete the polluted data from the cloud storage^6.

6. Even if the proposed scheme can detect any data modification or loss by a malicious user or CSP, it cannot recover the original data under a data loss attack (e.g., by a malicious CSP or Byzantine failure) because all of the redundant data would be deduplicated. One of the approaches for deduplicated data recovery is to adopt information dispersal techniques [36],[37],[38] that transform data into multiple shares with some redundancy and disperse the shares across multiple CSPs, which is out of scope in this paper.
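A sketch of the owner-side decryption under the same toy-cipher assumption, mirroring the two steps above (c3 is assumed to be a mapping from cover-node index to the wrapped group key, matching the re-encryption sketch; it pairs with the encrypt sketch in Section 5.1.2):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def stream(key: bytes, n: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < n:
        out += H(key + ctr.to_bytes(8, "big"))
        ctr += 1
    return out[:n]

def D(key: bytes, data: bytes) -> bytes:          # toy cipher: decryption = encryption (XOR)
    return bytes(a ^ b for a, b in zip(data, stream(key, len(data))))

def decrypt(T_i, c1_prime, c2, c3, K_i, path_keys):
    """Decrypt(C'_i, K_i, PK_t) for an owner in G_i."""
    # Ownership group key decryption: the single cover node lying on the user's path
    node = next(j for j in c3 if j in path_keys)
    GK_i = D(path_keys[node], c3[node])
    # Message decryption followed by the tag consistency check
    c1 = D(GK_i, c1_prime)                        # C_i^1 <- D_GK(C_i^1')
    L = bytes(a ^ b for a, b in zip(c2, K_i))     # L <- C_i^2 xor K_i
    M_i = D(L, c1)                                # M_i <- D_L(C_i^1)
    if H(H(M_i)) != T_i:                          # recompute T'_i = H(H(M_i)) and compare
        raise ValueError("tag inconsistency: possible poison attack")
    return M_i
```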

5.2 Key Update

When subsequent users upload data that are the same as those previously uploaded by the initial uploader, the corresponding ciphertext should be re-encrypted to prevent the subsequent users from accessing the previously encrypted data, in order to provide backward secrecy. In contrast, when users who have valid ownerships request the cloud server to delete or modify the data in the cloud storage, they should be revoked from the ownership list and deterred from accessing the data after the data deletion or modification, in order to provide forward secrecy.

Subsequent Upload. We suppose a user u_s wants to upload data M_i to the cloud storage, and its corresponding ownership list L_i = ⟨T_i, G_i⟩ and ciphertext C_i = C_i^1' || C_i^2 already exist in the cloud storage (C_i might have been encrypted and uploaded by the initial uploader, and re-encrypted by the cloud server such that C_i^1' = E_GK_i(C_i^1)). Then, the user u_s encrypts the data by running the Encrypt(M_i, 1^λ) algorithm and generates a ciphertext, say C''_i. With overwhelming probability, it holds that C''_i ≠ C_i, since the encryption key L is randomly selected from {0,1}^k(λ) by different users. After the construction of C''_i, the data owner u_s sends upload||T'_i||C''_i||ID_s to the cloud storage^7. Then, the key update and re-encryption processes progress as follows:

1) If T'_i = T_i, the cloud server puts ID_s into G_i.

2) The cloud server decrypts the ciphertext component C_i^1' = E_GK_i(C_i^1) in C_i with the current ownership group key GK_i. Then it selects a random ownership group key GK'_i (≠ GK_i) and runs the ReEncrypt(C_i, G_i) algorithm described in Section 5.1.3 with the updated ownership group information G_i and GK'_i to guarantee backward secrecy, which updates the ciphertext component as C_i^1': E_GK_i(C_i^1) → E_GK'_i(C_i^1).

If there is no T_i in the cloud storage such that T'_i = T_i, since this implies the first upload, the cloud server creates a new ownership list for the data, inserts ID_s into the newly generated ownership group, and stores the uploaded data in the cloud storage, following the same procedures described in Section 5.1.2.

7. The subsequent uploader sends C''_i to prevent the side-channel attack [11], as in the initial upload; however, if the communication channel is secure in the presence of eavesdroppers, C''_i does not need to be uploaded, which will reduce the communication cost, as in client-side deduplication.

Data Deletion. When a user u_s wants to delete data M_i from the cloud storage, the user sends a data deletion request message delete||T_i||ID_s to the cloud server. Then, the cloud server performs the following procedures:

1) If ID_s ∈ G_i, it deletes ID_s from G_i. Then, it selects a random ownership group key and runs the ReEncrypt(C_i, G_i) algorithm described in Section 5.1.3 with the updated ownership group information G_i to guarantee forward secrecy.

2) Else, it does nothing.

Data Modification. When a user u_s wants to modify the data M_i to M_j, the user encrypts the data and constructs the ciphertext C_j and its corresponding tag T_j by running the Encrypt(M_j, 1^λ) algorithm described in Section 5.1.2. Then, the user sends a data modification request message modify||T_i||T_j||C_j||ID_s to the cloud server. The cloud server then performs the data deletion procedure, followed by the data upload procedure, as follows.

- Data Deletion (1–2):

1) If ID_s ∈ G_i, it deletes ID_s from G_i. Then, it selects a random ownership group key and runs the ReEncrypt(C_i, G_i) algorithm described in Section 5.1.3 with the updated ownership group information G_i to guarantee forward secrecy.

2) Else, it does nothing and stops running the algorithm.

- Data Upload (3–4):

3) If there exists L_j = ⟨T_j, G_j⟩ for the tag T_j in the cloud storage, it performs the subsequent upload procedures described above.

4) Else, if there does not exist L_j in the cloud storage, it performs the initial upload procedures described in Section 5.1.2.

When multiple users upload or delete the same file at the same time, they are handled in a batch way. Specifically, the ownership list for the file is updated based on the ownership changes, and the corresponding ownership group key and ciphertext are updated once. Then, they are securely delivered following the proposed algorithms. We note that this can be handled straightforwardly without any security degradation.
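A bookkeeping sketch of the deletion-triggered rekeying on the server side (names are illustrative; the reencrypt callback stands in for the ReEncrypt algorithm of Section 5.1.3):

```python
import os

def rekey_on_delete(ownership, ciphertexts, group_keys, tag, user_id, reencrypt):
    """Handle delete||T_i||ID_s: remove the owner, pick a fresh group key, re-encrypt."""
    group = ownership.get(tag)
    if group is None or user_id not in group:
        return                                        # unknown tag or not an owner: do nothing
    group.discard(user_id)                            # remove ID_s from G_i (forward secrecy)
    new_gk = os.urandom(32)                           # fresh ownership group key GK'_i
    ciphertexts[tag] = reencrypt(ciphertexts[tag], group, group_keys[tag], new_gk)
    group_keys[tag] = new_gk

# toy state: one file with tag b"T", owned by users 1, 2, 3
ownership   = {b"T": {1, 2, 3}}
group_keys  = {b"T": os.urandom(32)}
ciphertexts = {b"T": b"re-encrypted ciphertext placeholder"}

# `reencrypt` stands in for: decrypt C1' with the old GK, re-encrypt under the new GK,
# and rebuild C3 for the minimum cover of the updated group (Section 5.1.3)
rekey_on_delete(ownership, ciphertexts, group_keys, b"T", 2,
                reencrypt=lambda ct, grp, old_gk, new_gk: ct)
print(ownership[b"T"])                                 # {1, 3}
```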

5.3 Comparison

Table 1 shows the comparison results of the secure data deduplication schemes, that is, convergent encryption (CE) [15], leakage-resilient (LR) deduplication [19], and randomized convergent encryption (RCE) [20], in terms of deduplication over encrypted data, tag consistency, and dynamic ownership management.

TABLE 1
Comparison of secure deduplication schemes

Scheme    | Encrypted deduplication | Tag consistency | Ownership management
CE [15]   | yes                     | no              | no
LR [19]   | yes                     | yes             | no
RCE [20]  | yes                     | yes             | no
Proposed  | yes                     | yes             | yes

Since all the schemes allow data owners to encrypt their data and enable deduplication over them, they can guarantee the data confidentiality or privacy against the cloud server and unauthorized outside adversaries. With regard to data integrity, convergent encryption cannot guarantee the integrity of deduplicated data in the face of a poison attack, whereas the other schemes preserve it by adopting an additional mechanism that enables data owners to check the tag consistency of the received data.

In the proposed scheme, upon every membership change in the ownership list (e.g., subsequently uploading the same data, or modifying/deleting the existing data), the corresponding data are re-encrypted under an updated ownership group key, which is selectively distributed so that access to the data is permitted to owners only for the time windows during which they maintain valid ownership of the data. This resolves the dynamic ownership management problem, as opposed to the other schemes. The rekeying in the proposed scheme can be done immediately upon any ownership change. This enhances the security of the outsourced data in terms of backward/forward secrecy by reducing the windows of vulnerability. A more rigorous security analysis is given in Section 7.

6 SCHEME ANALYSIS

In this section, we analyze the efficiency of the proposed scheme and compare it with the previous deduplication schemes over encrypted data in terms of both theoretical and practical aspects. The efficiency of the proposed scheme is demonstrated in the network simulation in terms of communication cost. We also discuss its computation cost when implemented with specific parameters.

6.1 Efficiency

The comparative results for the theoretical efficiency of the schemes are summarized in Table 2, which shows the analysis results of each scheme in terms of the communication and storage overhead. For communication overhead, "upload message size" represents the communication cost required for the data outsourcing process; "download message size" represents the communication cost required for the ciphertext downloading and tag checking processes; and "rekeying message size" represents the communication cost required for rekeying the data encryption key. For storage overhead, "key size" and "tag size" represent the size of the keys and the tag information that each owner needs to store, respectively.

The notations used in the table are as follows.

C_M    Size of a data or file
C_C    Size of an encrypted data (= output length of E(·))
C_K    Size of a key (= output length of k(λ) on input 1^λ)
C_T    Size of a tag
C_ID   Size of an identity of a user (≥ log n)
C_r    Size of a node value in a Merkle hash tree
C_PoW  Size of exchanged messages for PoW on inputs the file size and 1^λ (= u·log C_M, where u is the smallest integer such that (1 − α)^u < ε for some constant fraction α > 0) [21]
n      Number of users in the system
m      Number of owners in an ownership list for a file

For the upload and download message sizes, the proposed scheme is the same as the basic RCE [20] scheme. In LR [19], the communication overhead for verifying PoW is additionally included in the download message. In the scheme, the PoW verification and tag checking processes are done during the data upload phase by subsequent owners. However, they can be executed during the data download phase without loss of functionality and efficiency. Thus, we suppose they are executed during the download phase, as in RCE [20] and the proposed scheme, for the sake of fair comparison.

With regard to the rekeying message size, only the proposed scheme supports key updates upon ownership changes for data. In the proposed scheme, the rekeying message size (i.e., the size of C_i^3) would be (n − m) log(n/(n − m)) · C_K. This additional message plays an important role in enhancing the backward and forward secrecy, and enforces fine-grained user access control to the outsourced data, in contrast to the other schemes. In CE [15], the encryption key is determined by the message itself, whereas in LR [19] and RCE [20] it is selected by the initial uploader and never updated during the lifetime of the data in the system. Thus, even if the other schemes do not need the additional rekeying messages, they cannot guarantee the data privacy during the windows of vulnerability in the practical cloud environment where the ownership changes dynamically as time elapses.
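As a worked example of this expression (plugging numbers into the formula above, with log base 2 and an assumed 32-byte key):

```python
import math

def rekeying_size(n, m, key_bytes=32):
    """(n - m) * log(n / (n - m)) * C_K: rekeying message size of the proposed scheme."""
    revoked = n - m
    return revoked * math.log2(n / revoked) * key_bytes

# e.g. n = 1024 users, m = 1008 current owners: 16 * log2(64) = 96 keys of 32 bytes
print(rekeying_size(1024, 1008))   # 3072.0 bytes
```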

TABLE 2
Efficiency comparison

Scheme   | Upload message size      | Download message size  | Rekeying message size     | Key size      | Tag size
CE [15]  | CC + CT + CID            | CC                     | -                         | CK            | CT
LR [19]  | CC + 3CK + CT + Cr + CID | CC (+ CPoW + CK + CT)† | -                         | 2CK + CM · Cr | CT
RCE [20] | CC + CK + CT + CID       | CC + CK + CT           | -                         | CK            | CT
Proposed | CC + CK + CT + CID       | CC + CK + CT           | (n − m) log(n/(n − m)) CK | (log n + 1)CK | CT

†: communication overhead to check tag consistency. The message sizes are communication overhead; the key and tag sizes are per-owner storage overhead.

With regard to storage overhead, in the LR scheme, each owner stores the leaf node values of the Merkle tree for PoW in addition to the data encryption key and the KEK, and the size of this material increases in proportion to the data size. In the proposed scheme, each data owner stores log n additional KEKs compared with the original RCE scheme. These KEKs allow the secure and selective distribution of the dynamically updated data encryption key, which supports fine-grained access control on the basis of the valid ownership of each user with little storage overhead.
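As a rough, illustrative calculation (our numbers, not the paper's): with n = 2^20 users and 128-bit keys, each owner in the proposed scheme stores (log n + 1) · CK = 21 × 16 bytes = 336 bytes of key material, compared with a single 16-byte key in RCE and a per-owner set of Merkle leaf values proportional to the file size in LR.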

6.2 Simulation

In this simulation, we measure the communication cost of the deduplication schemes. We consider online cloud storage systems connected to the Internet. Almeroth et al. [33] demonstrated the group behavior in the Internet's multicast backbone network (MBone). They showed that the number of users joining a multicast group follows a Poisson distribution with rate λ, and that the membership duration time follows an exponential distribution with a mean duration 1/µ. Since each owner group of data or files can be seen as an independent network group, in which the owners share common outsourced data, we show the simulation results following this probabilistic behavior distribution [33].

We suppose that user join and leave events are independently and identically distributed in each ownership group, following a Poisson distribution. The ownership duration time for outsourced data (that is, from data upload time to deletion time) is assumed to follow an exponential distribution. We set the inter-upload time between users to 12 hours (λ = 2) and the average ownership duration time to 10 days (1/µ = 10).
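As a rough illustration of this membership model (this is our sketch, not the authors' simulator; the variable names and fixed seed are arbitrary), the following Python snippet draws join events from a Poisson process with rate λ = 2 per day and ownership durations from an exponential distribution with mean 1/µ = 10 days, then counts the ownership changes over 100 days:

    import random

    JOIN_RATE = 2.0        # joins per day (inter-upload time of 12 hours)
    MEAN_DURATION = 10.0   # average ownership duration in days
    DAYS = 100

    random.seed(1)
    t, events = 0.0, []
    while True:
        t += random.expovariate(JOIN_RATE)                    # next join of the Poisson process
        if t >= DAYS:
            break
        leave = t + random.expovariate(1.0 / MEAN_DURATION)   # exponentially distributed duration
        events.append(("join", t))
        if leave < DAYS:
            events.append(("leave", leave))                   # deletion/modification triggers rekeying

    events.sort(key=lambda e: e[1])
    print("ownership changes within 100 days:", len(events))

Each event in such a trace corresponds to one rekeying of the ownership group key in the proposed scheme.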

Figs. 4 and 5 show the total communication costs that the cloud server incurs to serve data requests from owners in the secure deduplication schemes that support tag consistency (that is, LR [19], RCE [20], and the proposed scheme) in a single ownership group during 100 days. In this simulation, we suppose that owners send data request messages immediately after ownership changes in the ownership group, in order to measure the communication cost incurred by dynamic ownership changes for a fair comparison from the security perspective. The communication cost, measured in bytes, includes those of the ciphertext and of the rekeying messages for non-revoked owners.

Fig. 4. Communication cost in LR, RCE, and the proposed scheme (communication cost in bytes, up to roughly 8 × 10^8, versus time in days, 0–100)

Fig. 5. Communication cost in RCE and the proposed scheme (communication cost in bytes, around 1.0486–1.0487 × 10^7, versus time in days, 0–100)

In this simulation, we set CM = CC = 10 MB for typical multimedia data (e.g., a music file), which is one of the most commonly used types of data in the cloud. To achieve a 128-bit security level, we set CK = 128 bits and CT = 128 bits. Fig. 4 shows the communication costs in LR, RCE, and the proposed scheme. In the LR scheme, each owner performs a PoW process on every data request procedure and receives an updated encryption key and ciphertext by unicast, which incurs much more communication overhead than the other schemes. Thus, RCE and the proposed scheme have almost the same, and relatively negligible, communication overhead, as shown in Fig. 4. To provide a detailed comparison of the two schemes, Fig. 5 shows the communication costs of RCE and the proposed scheme. In RCE, we assume that only the rekeying component C2 = L ⊕ K is selectively distributed to valid owners securely by unicast in order to realize dynamic ownership management similar to that of the proposed scheme, which is the most efficient rekeying scenario in RCE. In this scenario, the communication overhead of the proposed scheme is less than that of RCE by about 300 bytes with the same level of ownership management capability (the detailed communication costs can be found in the supplementary file for this paper).

6.3 Implementation

TABLE 3
Comparison of computation cost

Operation        | Time (ms) | CE [15] upload | CE [15] download | LR [19] upload | LR [19] download | RCE [20] upload | RCE [20] download | Proposed upload | Proposed download
Hash             | 0.025     | 2              | -                | 5              | 1                | 2               | 2                 | 2               | 2
Data encrypt     | Enc       | 1              | -                | 2              | -                | 1               | -                 | 1               | -
Data decrypt     | Dec       | -              | 1                | -              | 1                | -               | 1                 | -               | 1
Key decrypt      | 0.129     | -              | -                | -              | -                | -               | -                 | -               | 1
Computation (ms) |           | Enc + 0.050    | Dec              | 2Enc + 0.125   | Dec + 0.025      | Enc + 0.050     | Dec + 0.050       | Enc + 0.050     | Dec + 0.179

Next, we analyze and measure the computation cost incurred when a data owner encrypts and decrypts data during the upload and download phases, respectively. The computation cost is shown in Table 3 in terms of the computation of a cryptographic hash function for key generation and tag generation (the hash function is also used for key encryption/decryption in LR [19]), data encryption/decryption, and key decryption. The comparatively negligible bitwise exclusive-or operations are ignored in the computation analysis results.
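As a worked check of how the entries in Table 3 compose (our arithmetic, not stated in the paper): for a 1 MB file, where Dec ≈ 3.045 ms, the download cost of the proposed scheme is Dec + 2 × 0.025 ms (two hash computations) + 0.129 ms (one key decryption) ≈ 3.224 ms, which matches the corresponding entry in Table 4 below.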

For each operation, we include a benchmark timing. Each cryptographic operation was implemented using the Crypto++ library ver. 5.6.2 [34] on a PC with a 3.4 GHz processor. The key parameters were selected to provide a 128-bit security level. The implementation uses MD5 as the cryptographic hash function to generate a 128-bit key and tag, and AES in Electronic Code Book (ECB) mode as the encryption/decryption function. The data encryption and decryption times, denoted by Enc and Dec, respectively, in Table 3, are measured for different data sizes as shown in Fig. 6, and they increase in proportion to the size of the data.
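The benchmark above uses Crypto++ in C++; as a rough, non-equivalent illustration of the same measurement idea, a Python sketch using hashlib (MD5) and PyCryptodome's AES-ECB might look as follows (the library choice, the 1 MB test size, and all variable names are ours):

    import hashlib, os, time
    from Crypto.Cipher import AES  # PyCryptodome

    data = os.urandom(1024 * 1024)        # 1 MB of random "file" data (a multiple of 16 bytes)
    key = hashlib.md5(data).digest()      # 128-bit convergent-style key derived from the data
    tag = hashlib.md5(key).digest()       # 128-bit tag (illustrative derivation)

    start = time.perf_counter()
    ct = AES.new(key, AES.MODE_ECB).encrypt(data)
    enc_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    pt = AES.new(key, AES.MODE_ECB).decrypt(ct)
    dec_ms = (time.perf_counter() - start) * 1000

    print(f"Enc: {enc_ms:.3f} ms, Dec: {dec_ms:.3f} ms, tag: {tag.hex()}")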

On the basis of the encryption and decryption times, we measured the total computation cost for the upload and download procedures of each scheme, as shown in Fig. 7 and Fig. 8, respectively. For the upload procedure, the proposed scheme requires the same computations as the CE and RCE schemes. For the download procedure, the proposed scheme needs one more key decryption operation than the basic RCE scheme. However, since the symmetric key size is much smaller than the typical data size in the cloud (e.g., a document file or multimedia data), the additional 128-bit key decryption time (i.e., 0.129 ms) in the proposed scheme is relatively negligible compared to the data decryption time in a practical cloud computing system, as depicted in Fig. 8.

Fig. 6. Encryption and decryption time (Enc and Dec times in ms versus data size, 1 MB to 10 MB)

Fig. 7. Computation time for upload (time in ms versus data size, 1 MB to 10 MB, for CE, LR, RCE, and the proposed scheme)

Fig. 8. Computation time for download (time in ms versus data size, 1 MB to 10 MB, for CE, LR, RCE, and the proposed scheme)

The measured computation times for upload and download are given in Table 4. More experimental results with diverse file sizes (100 KB ∼ 1000 MB) can be found in the supplementary file for this paper.

TABLE 4
Upload and download time (ms)

Data size | CE [15] upload | CE [15] download | LR [19] upload | LR [19] download | RCE [20] upload | RCE [20] download | Proposed upload | Proposed download
1 MB      | 3.095          | 3.045            | 6.215          | 3.07             | 3.095           | 3.095             | 3.095           | 3.224
2 MB      | 6.103          | 6.053            | 12.231         | 6.078            | 6.103           | 6.103             | 6.103           | 6.232
4 MB      | 11.972         | 11.922           | 23.969         | 11.947           | 11.972          | 11.972            | 11.972          | 12.101
6 MB      | 19.853         | 19.803           | 39.731         | 19.828           | 19.853          | 19.853            | 19.853          | 19.982
8 MB      | 23.691         | 23.641           | 47.407         | 23.666           | 23.691          | 23.691            | 23.691          | 23.82
10 MB     | 34.425         | 34.375           | 68.875         | 34.4             | 34.425          | 34.425            | 34.425          | 34.554

7 SECURITY

In this section, we prove the security of the proposed scheme in terms of the security requirements discussed in Section 3.2, that is, data privacy, data integrity, backward and forward secrecy, and collusion resistance.


7.1 Data Privacy

In our trust model, the cloud server is no longer fully trusted even if it is honest. Therefore, plain data should be kept secret from the cloud server as well as from unauthorized users who cannot prove ownership.

Data privacy for the outsourced data against unauthorized users who have never owned the data can be trivially guaranteed. Without loss of generality, we suppose that an unauthorized user requests the data Mi with a tag Ti and the identity IDt ∈ Gi of one of the valid users (if IDt ∉ Gi, the cloud server simply rejects the request), and receives the corresponding ciphertext Ti || C1i' || C2i || C3i, where C1i' = E_GKi(E_L(Mi)) and C2i = L ⊕ Ki. If the user has never owned the data Mi, it is computationally infeasible to guess Ki and obtain the data encryption key L, because of the cryptographic hash function. On the other hand, if the user once owned the data but does not have valid ownership at the time of the request, he may have knowledge of Ki but can by no means obtain the corresponding GKi from C3i, since it is encrypted using KEKs that the user cannot know. Thus, the unauthorized user cannot obtain Mi from C1i' without GKi, even if he may be able to obtain L using Ki.

Another attack scenario can be launched by the cloud server. From the cloud server's point of view, the information it can learn is the set of ciphertexts Ci = C1i || C2i with the corresponding tags Ti. Although the cloud server determines the ownership group key GKi, it is computationally infeasible for the cloud server to guess Ki, since it does not have knowledge of Mi, just like the previous unauthorized attacker who has never owned the data. Without Ki, it cannot decrypt C1i' and obtain Mi. Therefore, data privacy against the honest-but-curious cloud server and unauthorized users is guaranteed.
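To make the layered ciphertext structure above concrete, here is a minimal Python sketch of the RCE-style construction described in this section: a random data key L masked by the convergent key K = H(M), with the server-side group-key layer GK on top. The hash and cipher choices (SHA-256, AES-CTR with fixed nonces) and all helper names are ours and purely illustrative, and the KEK-based distribution of GK (component C3) is omitted:

    import os, hashlib
    from Crypto.Cipher import AES  # PyCryptodome

    def H(x):                      # hash used for the convergent key and tag (our choice: SHA-256)
        return hashlib.sha256(x).digest()

    def aes_ctr(key, msg, nonce):  # deterministic for a fixed nonce; encrypt == decrypt in CTR mode
        return AES.new(key, AES.MODE_CTR, nonce=nonce).encrypt(msg)

    M = b"example outsourced file contents"
    K = H(M)                                   # convergent key Ki = H(Mi)
    L = os.urandom(32)                         # randomly chosen data encryption key
    T = H(K)                                   # tag (illustrative derivation from the data)
    C1 = aes_ctr(L, M, b"data0000")            # inner layer: E_L(M)
    C2 = bytes(a ^ b for a, b in zip(L, K))    # C2 = L XOR K

    GK = os.urandom(32)                        # ownership group key chosen by the cloud server
    C1p = aes_ctr(GK, C1, b"grp00000")         # outer layer: C1' = E_GK(C1)

    # A valid owner knows K (derived from M) and obtains GK via the KEK tree (not shown here).
    L_rec = bytes(a ^ b for a, b in zip(C2, K))
    M_rec = aes_ctr(L_rec, aes_ctr(GK, C1p, b"grp00000"), b"data0000")
    assert M_rec == M                          # without K (hence L) or GK, neither layer opens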

7.2 Data Integrity

In a deduplication scheme, data integrity may be threatened by a poison attack on tag consistency. Without loss of generality, we suppose that an attacker and another user u have the same data M. The attacker maliciously generates a ciphertext C' from M' (≠ M) and uploads it with a tag T generated from M initially.

In the proposed scheme, the poison attack on tag consistency is easily detected, as in the basic RCE scheme. When the user u subsequently requests the data and receives the corresponding ciphertext C1 || C2 || C3 with T, he can obtain the ownership group key GK and the data encryption key L from C3 and C2, respectively, if he has valid ownership of the data. Then, the user decrypts C1 using the two keys GK and L to obtain the data M', generates the tag T' from M', and checks whether T' = T. If the tags are not consistent, the user drops the message, since this implies that the data may have been modified during the previous outsourcing procedure. Therefore, the proposed scheme guarantees data integrity against a poison attack on tag consistency.
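A minimal sketch of this tag-consistency check, reusing the illustrative helpers (H, aes_ctr) and nonces from the sketch in Section 7.1 (all names are ours):

    def verify_download(T, C1p, C2, GK, K):
        """Return the plaintext if the recomputed tag matches T; otherwise None (poison detected)."""
        L = bytes(a ^ b for a, b in zip(C2, K))     # recover the data key: L = C2 XOR K
        C1 = aes_ctr(GK, C1p, b"grp00000")          # strip the group-key layer
        M_prime = aes_ctr(L, C1, b"data0000")       # recover the (possibly poisoned) data M'
        T_prime = H(H(M_prime))                     # regenerate the tag from M'
        return M_prime if T_prime == T else None    # inconsistent tag: drop the message

    # With the values from the previous sketch, a consistent download verifies:
    # assert verify_download(T, C1p, C2, GK, K) == M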

7.3 Backward and Forward Secrecy

When a user who holds data that was previously uploaded at some time instance tries to upload it to the cloud storage, the corresponding ownership group key is updated independently and randomly, say from GK to GK', and delivered securely and immediately to the valid owners of the data (including the new user). In addition, the ciphertext component C1' = E_GK(C1), which was encrypted with GK, is re-encrypted by the cloud server with the updated ownership group key GK' at the same time, as C1'' = E_GK'(C1). Even if the user stored the previous ciphertext before he held the data, he cannot decrypt it. This is because, even if he is able to obtain the encryption keys L and GK' from the current ciphertext, they are of no use for recovering the desired component C1 of the previous ciphertext, since it was encrypted with the previous GK. Therefore, the backward secrecy of the outsourced data is guaranteed in the proposed scheme.

On the other hand, when a user deletes or modifies the data at some time instance, the corresponding ownership group key is also updated independently and randomly, say from GK to GK', and delivered securely and immediately to the valid ownership group members (excluding the user). Then, the ciphertext component C1' = E_GK(C1) that was encrypted with GK is re-encrypted by the cloud server with the new ownership group key GK' at the same time, as C1'' = E_GK'(C1). The revoked user cannot decrypt the current ciphertext, since he can by no means obtain GK'. Even if the user recovered the data encryption key L before he was revoked from the ownership group and stored it, it is of no use for obtaining the desired data from C1 in the subsequent ciphertext, since it is re-encrypted with the new random GK'. Therefore, the forward secrecy of the outsourced data is also guaranteed in the proposed scheme.

7.4 Collusion Resistance

To provide collusion resistance, unauthorized users who do not have valid ownership of cloud data should not be able to decrypt it, even if they collude. In the proposed scheme, in order to decrypt the ciphertext and obtain the plain data, a user must have knowledge of both the data encryption key L and the ownership group key GK. Even if some unauthorized users are able to obtain the data encryption key (for example, because they possessed the data at some time instance and stored the derived key encryption key K until the moment of the request, or received it from other colluders), it is impossible for them to obtain the ownership group key GK. This is because the KEK assignment for ownership group key distribution in the binary KEK tree is information-theoretic; that is, KEKs are assigned randomly and independently of each other. Whenever any ownership change occurs in an ownership group, the ownership group key is rekeyed immediately and the data are re-encrypted using the updated group key. Even if the unauthorized users collude with each other, they cannot obtain the current ownership group key, since none of the KEKs on their path keys in the KEK tree is used to encrypt and distribute the ownership group key. Therefore, the proposed scheme is secure against a collusion attack by unauthorized users.
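The KEK-tree argument can be illustrated with a short complete-subtree-style sketch in the spirit of [32]; the layout, the one-time-pad key wrapping, and the parameter choices below are ours and only approximate the paper's construction. Each user stores the KEKs on its leaf-to-root path, and an updated GK is wrapped only under the minimal set of subtree KEKs whose leaves are all current owners, so revoked users (alone or colluding) hold none of the KEKs used:

    import os

    DEPTH = 3                  # full binary KEK tree over n = 2**DEPTH users (toy size)
    N = 2 ** DEPTH
    kek = {v: os.urandom(16) for v in range(1, 2 * N)}   # one random, independent KEK per tree node

    def path_nodes(user):      # tree nodes whose KEKs a user stores (leaf-to-root path)
        v, nodes = N + user, []
        while v >= 1:
            nodes.append(v)
            v //= 2
        return nodes

    def cover(owners):
        """Maximal subtrees containing only current owners (complete-subtree cover)."""
        full = {N + u for u in owners}
        for v in range(N - 1, 0, -1):                    # mark internal nodes whose leaves are all owners
            if 2 * v in full and 2 * v + 1 in full:
                full.add(v)
        return [v for v in full if v == 1 or v // 2 not in full]

    owners = {0, 1, 2, 5, 6, 7}                          # users 3 and 4 have been revoked
    GK = os.urandom(16)                                  # freshly rekeyed ownership group key
    rekey_msg = {v: bytes(a ^ b for a, b in zip(GK, kek[v])) for v in cover(owners)}
    cover_nodes = set(rekey_msg)

    assert all(set(path_nodes(u)) & cover_nodes for u in owners)       # every valid owner can unwrap GK
    assert not any(set(path_nodes(u)) & cover_nodes for u in {3, 4})   # revoked users cannot, even together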

8 CONCLUSION

Dynamic ownership management is an important and challenging issue in secure deduplication over encrypted data in cloud storage. In this study, we proposed a novel secure data deduplication scheme that enhances fine-grained ownership management by exploiting the characteristics of the cloud data management system. The proposed scheme features a re-encryption technique that enables dynamic updates upon any ownership change in the cloud storage. Whenever an ownership change occurs in the ownership group of outsourced data, the data are re-encrypted with an immediately updated ownership group key, which is securely delivered only to the valid owners. Thus, the proposed scheme enhances data privacy and confidentiality in cloud storage against any users who do not have valid ownership of the data, as well as against an honest-but-curious cloud server. Tag consistency is also guaranteed, while the scheme still takes full advantage of efficient data deduplication over encrypted data. In terms of communication cost, the proposed scheme is more efficient than the previous schemes, while in terms of computation cost it takes an additional 0.1–0.2 ms compared with the RCE scheme, which is negligible in practice. Therefore, the proposed scheme achieves more secure and fine-grained ownership management in cloud storage for secure and efficient data deduplication.

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2016R1A2A2A05005402). D. Koo and K. Kang are the corresponding authors of this paper.

REFERENCES

[1] Dropbox, http://www.dropbox.com/.
[2] Wuala, http://www.wuala.com/.
[3] Mozy, http://www.mozy.com/.
[4] Google Drive, http://drive.google.com.
[5] D. T. Meyer and W. J. Bolosky, "A study of practical deduplication," Proc. USENIX Conference on File and Storage Technologies, 2011.
[6] M. Dutch, "Understanding data deduplication ratios," SNIA Data Management Forum, 2008.
[7] W. K. Ng, W. Wen, and H. Zhu, "Private data deduplication protocols in cloud storage," Proc. ACM SAC'12, 2012.
[8] M. W. Storer, K. Greenan, D. D. E. Long, and E. L. Miller, "Secure data deduplication," Proc. StorageSS'08, 2008.
[9] N. Baracaldo, E. Androulaki, J. Glider, and A. Sorniotti, "Reconciling end-to-end confidentiality and data reduction in cloud storage," Proc. ACM Workshop on Cloud Computing Security, pp. 21–32, 2014.
[10] P. S. S. Council, "PCI SSC data security standards overview," 2013.
[11] D. Harnik, B. Pinkas, and A. Shulman-Peleg, "Side channels in cloud services, the case of deduplication in cloud storage," IEEE Security & Privacy, vol. 8, no. 6, pp. 40–47, 2010.
[12] C. Wang, Z. Qin, J. Peng, and J. Wang, "A novel encryption scheme for data deduplication system," Proc. International Conference on Communications, Circuits and Systems (ICCCAS), pp. 265–269, 2010.
[13] Malicious insider attacks to rise, http://news.bbc.co.uk/2/hi/7875904.stm
[14] Data theft linked to ex-employees, http://www.theaustralian.com.au/australian-it/data-theftlinked-to-ex-employees/story-e6frgakx-1226572351953
[15] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer, "Reclaiming space from duplicate files in a serverless distributed file system," Proc. International Conference on Distributed Computing Systems (ICDCS), pp. 617–624, 2002.
[16] P. Anderson and L. Zhang, "Fast and secure laptop backups with encrypted de-duplication," Proc. USENIX LISA, 2010.
[17] Z. Wilcox-O'Hearn and B. Warner, "Tahoe: the least-authority filesystem," Proc. ACM StorageSS, 2008.
[18] A. Rahumed, H. C. H. Chen, Y. Tang, P. P. C. Lee, and J. C. S. Lui, "A secure cloud backup system with assured deletion and version control," Proc. International Workshop on Security in Cloud Computing, 2011.
[19] J. Xu, E. Chang, and J. Zhou, "Leakage-resilient client-side deduplication of encrypted data in cloud storage," Cryptology ePrint Archive, IACR, http://eprint.iacr.org/2011/538.
[20] M. Bellare, S. Keelveedhi, and T. Ristenpart, "Message-locked encryption and secure deduplication," Proc. Eurocrypt 2013, LNCS 7881, pp. 296–312, 2013. Cryptology ePrint Archive, Report 2012/631, 2012.
[21] S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg, "Proofs of ownership in remote storage systems," Proc. ACM Conference on Computer and Communications Security, pp. 491–500, 2011.
[22] M. Mulazzani, S. Schrittwieser, M. Leithner, and M. Huber, "Dark clouds on the horizon: using cloud storage as attack vector and online slack space," Proc. USENIX Conference on Security, 2011.
[23] A. Juels and B. S. Kaliski, "PORs: Proofs of retrievability for large files," Proc. ACM Conference on Computer and Communications Security, pp. 584–597, 2007.
[24] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable data possession at untrusted stores," Proc. ACM Conference on Computer and Communications Security, pp. 598–609, 2007.
[25] J. Li, X. Chen, M. Li, J. Li, P. Lee, and W. Lou, "Secure deduplication with efficient and reliable convergent key management," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 6, 2014.
[26] G. R. Blakley and C. Meadows, "Security of Ramp schemes," Proc. CRYPTO 1985, pp. 242–268, 1985.
[27] J. Li, Y. K. Li, X. Chen, P. Lee, and W. Lou, "A hybrid cloud approach for secure authorized deduplication," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 5, pp. 1206–1216, 2015.
[28] M. Bellare, S. Keelveedhi, and T. Ristenpart, "DupLESS: Server-aided encryption for deduplicated storage," Proc. USENIX Security Symposium, 2013.
[29] M. Bellare and S. Keelveedhi, "Interactive message-locked encryption and secure deduplication," Proc. PKC 2015, pp. 516–538, 2015.
[30] Y. Shin and K. Kim, "Equality predicate encryption for secure data deduplication," Proc. Conference on Information Security and Cryptology (CISC-W), pp. 64–70, 2012.
[31] X. Jin, L. Wei, M. Yu, N. Yu, and J. Sun, "Anonymous deduplication of encrypted data with proof of ownership in cloud storage," Proc. IEEE Conference on Communications in China (ICCC), pp. 224–229, 2013.
[32] D. Naor, M. Naor, and J. Lotspiech, "Revocation and tracing schemes for stateless receivers," Proc. CRYPTO 2001, Lecture Notes in Computer Science, vol. 2139, pp. 41–62, 2001.
[33] K. C. Almeroth and M. H. Ammar, "Multicast group behavior in the Internet's multicast backbone (MBone)," IEEE Communications Magazine, vol. 35, pp. 124–129, 1997.
[34] Crypto++ Library 5.6.2, http://www.cryptopp.com/.
[35] S. Rafaeli and D. Hutchison, "A survey of key management for secure group communication," ACM Computing Surveys, vol. 35, no. 3, pp. 309–329, 2003.
[36] M. Li, C. Qin, P. P. C. Lee, and J. Li, "Convergent Dispersal: Toward storage-efficient security in a cloud-of-clouds," Proc. USENIX Conference on Hot Topics in Storage and File Systems, 2014.
[37] M. Li, C. Qin, and P. P. C. Lee, "CDStore: Toward reliable, secure, and cost-efficient cloud storage via convergent dispersal," Proc. USENIX Annual Technical Conference, 2015.
[38] J. Li, X. Chen, X. Huang, S. Tang, Y. Xiang, M. Hassan, and A. Alelaiwi, "Secure distributed deduplication systems with improved reliability," IEEE Transactions on Computers, vol. 64, no. 2, pp. 3569–3579, 2015.

Junbeom Hur received the B.S. degree from Korea University, Seoul, South Korea, in 2001, and the M.S. and Ph.D. degrees from KAIST in 2005 and 2009, respectively, all in Computer Science. He was with the University of Illinois at Urbana-Champaign as a postdoctoral researcher from 2009 to 2011, and with the School of Computer Science and Engineering at Chung-Ang University, South Korea, as an Assistant Professor from 2011 to 2015. He is currently an Assistant Professor with the Department of Computer Science and Engineering at Korea University, South Korea. His research interests include information security, cloud computing security, mobile security, and applied cryptography.

Dongyoung Koo received the B.S. degree from Yonsei University, Seoul, South Korea, in 2009, and the M.S. and Ph.D. degrees from KAIST in 2012 and 2016, respectively, all in Computer Science. He is currently a Research Professor with the Department of Computer Science and Engineering at Korea University, South Korea. His research interests include information security, secure cloud computing, and cryptography.

Youngjoo Shin received the B.S. degree in Computer Science and Engineering from Korea University, Seoul, Korea, in 2006, and the M.S. and Ph.D. degrees in Computer Science from KAIST, Daejeon, Korea, in 2008 and 2014, respectively. He is currently a senior member of engineering staff at the Attached Institute of ETRI, Daejeon, Korea. His research interests include cryptography, network security, and cloud computing security.

Kyungtae Kang received the B.S. degree in computer science and engineering and the M.S. and Ph.D. degrees in electrical engineering and computer science from Seoul National University, Korea, in 1999, 2001, and 2007, respectively. From 2008 to 2010, he was a postdoctoral research associate at the University of Illinois at Urbana-Champaign. In 2011, he joined the Department of Computer Science and Engineering, Hanyang University, Korea, where he is currently an assistant professor. His research interests are primarily in systems, including operating systems, wireless systems, distributed systems, real-time embedded systems, and the interdisciplinary area of cyber-physical systems. He is a member of the ACM.