

International Journal of Computer Engineering and Applications, Volume VIII, Issue I, Part I, October 2014

OPTIMAL AUTHORIZED DATA DEDUPLICATION IN CLOUD

Divyesh Minjrola 1, Rakesh Rajani 2

1 PG Student, Dept of Computer Engineering, Alard College of Engineering and Management,
2 Professor, Dept of Computer Engineering, Alard College of Engineering and Management,
Savitribai Phule Pune University, Pune

ABSTRACT: Cloud technology is widely used because it allows centralized storage and sharing of data, shared data processing, and online access to computer services and resources from various types of devices. One critical challenge of cloud storage services is the management of the ever-increasing volume of data. Data deduplication is a novel technique for addressing this challenge: it removes, or prevents the storage of, duplicate copies of the same data. Although deduplication has clear benefits, it raises privacy and security concerns for users, since it can enable insider and outsider attacks. Achieving deduplication together with data security in a cloud environment is therefore a critical problem. The objective of this paper, "Optimal Authorized Data Deduplication in Cloud", is to analyze deduplication techniques and optimal authorization measures that provide security alongside deduplication in the cloud environment.

Keywords: Cloud computing, Deduplication, Convergent Encryption, Proof of Ownership Protocol, Differential Authorization

[1] INTRODUCTION

Cloud technology is an internet-based technology: it provides services to users by giving them access to shared, centralized data storage and processing capabilities in a platform-independent manner at low cost. As cloud technology became prevalent, so did data storage and sharing. One critical challenge of cloud storage services is the management of the ever-increasing volume of data. Deduplication is a novel technique for addressing this challenge, but it adds privacy concerns.

This paper analyzes the solutions presented in various papers to address the problem of managing an increasing volume of data with authentication.

[2] RELATED WORK

Data deduplication is a method for maximizing the usage of a given amount of data storage. Deduplication identifies similarities among different files to save disk space. Its advantage is that it minimizes storage cost by fitting more data on the same disks. Deduplication also reduces input/output load and helps lower storage and energy costs.


Deduplication also helps to recover a file, or an entire system, as it existed at a certain point in time. In file-level deduplication, the entire file is used to check whether another file with the same data is already present; if a matching copy is found, a second copy of the file is not stored. The advantages of file-level deduplication are that it needs less metadata and is comparatively easy to implement and maintain. In chunk-level deduplication, a file is divided into chunks of the same size or of varying sizes, and each chunk is used for the duplicate check. If a matching chunk (of the same or another file) is found, deduplication stores only a reference to this chunk instead of its actual contents [Figure 1].

Figure 1: Deduplication for chunks of file
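The chunk-level scheme above can be sketched in a few lines. The following toy Python store (all names hypothetical, not from the paper) indexes fixed-size chunks by their SHA-256 digest and keeps only one copy of each; files are stored as lists of references:

```python
import hashlib

class ChunkStore:
    """Toy chunk-level deduplication store (illustrative sketch only):
    chunks are indexed by digest, and a duplicate chunk is stored once."""

    CHUNK_SIZE = 4  # tiny fixed-size chunks so the example is easy to follow

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes (stored once)
        self.files = {}    # filename -> list of digests (references)

    def put(self, name, data):
        refs = []
        for i in range(0, len(data), self.CHUNK_SIZE):
            chunk = data[i:i + self.CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Duplicate check: store the chunk only if it is new.
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        self.files[name] = refs

    def get(self, name):
        # Rebuild the file by following its chunk references.
        return b"".join(self.chunks[d] for d in self.files[name])

store = ChunkStore()
store.put("a.txt", b"AAAABBBBAAAA")  # first and third chunks are identical
store.put("b.txt", b"AAAACCCC")      # first chunk duplicates a.txt's
print(len(store.chunks))             # 3 unique chunks stored, not 5
```

Of the five chunks written, only three are kept; duplicates cost one reference each.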

Conventional encryption does not work with deduplication: each user encrypts their data with their own individual key, so identical copies of the same data produce different ciphertexts and deduplication becomes impossible.

Convergent encryption [2] is widely used to achieve deduplication together with reasonable data confidentiality. In convergent encryption the same key is used to encrypt and decrypt the data, and the key is generated from a cryptographic hash of the data itself. Since the convergent key is derived from the data, identical data yields identical ciphertext, which makes deduplication on the cloud possible.

The drawback of convergent encryption is that it is subject to brute-force attacks for data or files falling into a known set.
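As a sketch of the idea (not the paper's implementation): because the key is derived from the data, encryption is deterministic, so two independent users holding the same file produce the same ciphertext. The toy XOR stream cipher below stands in for a real block cipher such as AES and is for illustration only:

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    # KeyGen_CE: the key is derived from the data itself.
    return hashlib.sha256(data).digest()

def _keystream(key: bytes, n: int) -> bytes:
    # Toy deterministic keystream; a real system would use a block cipher.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def enc(key: bytes, data: bytes) -> bytes:
    # XOR stream cipher: encryption and decryption are the same operation.
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

plaintext = b"quarterly report"
k1 = convergent_key(plaintext)
k2 = convergent_key(b"quarterly report")   # another user, same data
c1, c2 = enc(k1, plaintext), enc(k2, plaintext)
print(c1 == c2)                  # True: identical ciphertexts enable dedup
print(enc(k1, c1) == plaintext)  # True: the same key decrypts
```

The determinism that enables deduplication is also what exposes convergent encryption to the brute-force attack noted above: an attacker can encrypt every candidate file and compare ciphertexts.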

In a typical storage system with deduplication, the client first sends only the hash value of the file, and the server checks whether that hash value already exists in its database. If the file is already present on the server, the server asks the client not to send the file and marks the client as an additional owner of the file. Client-side deduplication thus leads to a security problem: it can reveal that some other client has the same, possibly sensitive, file. This problem can be addressed by a proof-of-ownership (PoW) protocol [3]. PoW runs in two parts between two players over a common input file. In the first step, the verifier summarizes the file to itself and generates short information "v". In the second step, the prover and the verifier engage in an interactive protocol in which the verifier holds the short information "v" and the prover holds the file "F"; at the end, the verifier either accepts or rejects.
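The two-phase protocol can be sketched as follows. In this toy version (hypothetical names; real PoW constructions such as [3] use Merkle trees and erasure coding), the verifier keeps only the hashes of a few randomly chosen blocks as the short information "v" and challenges the prover to produce those blocks:

```python
import hashlib
import random

BLOCK = 4  # tiny blocks so the example stays small

def blocks(f: bytes):
    return [f[i:i + BLOCK] for i in range(0, len(f), BLOCK)]

def summarize(f: bytes, seed: int = 7):
    # Phase 1: the verifier summarizes F into short information "v":
    # hashes of a few randomly chosen blocks.
    rng = random.Random(seed)
    bs = blocks(f)
    idx = rng.sample(range(len(bs)), k=min(3, len(bs)))
    return {i: hashlib.sha256(bs[i]).hexdigest() for i in idx}

def prove(f: bytes, challenge):
    # Phase 2 (prover): answer the challenge with the blocks themselves.
    bs = blocks(f)
    return {i: bs[i] for i in challenge}

def verify(v, answers):
    # Phase 2 (verifier): accept only if every answered block hashes to v.
    return all(hashlib.sha256(answers[i]).hexdigest() == h
               for i, h in v.items())

F = b"confidential design document!"
v = summarize(F)                         # verifier stores only v, not F
print(verify(v, prove(F, v.keys())))     # True: the real owner passes
fake = b"x" * len(F)                     # attacker knowing only the length
print(verify(v, prove(fake, v.keys())))  # False: cannot produce the blocks
```

Note that an attacker who obtained only the file's hash (the weakness of plain client-side deduplication) cannot answer the block challenges.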

The limitation of the PoW protocol is that it cannot support a differential-authorization duplicate check, which is useful in many applications.

In a system with authorization and deduplication, each user is assigned a set of privileges when the user is added. Each file uploaded to the cloud is also assigned a set of privileges specifying which kinds of users are allowed to execute the duplicate check and to access the file.


During the duplicate check, the user takes his file and his privileges as inputs. The duplicate check succeeds only if the cloud stores a copy of this file with privileges matching the user's. A conventional deduplication system based on convergent encryption or the PoW protocol supports confidentiality but does not support the duplicate check with differential privileges.

For example, suppose a company decides to use cloud storage for cost saving and effective data management. In a system with differential authorization and deduplication, each employee accessing the cloud is assigned privileges, and deduplication is used to store a single copy of each file. For security reasons, some files are encrypted, and the duplicate check on those files is allowed only to employees with specific privileges.

The problem of deduplication with differential privileges in a cloud computing environment can be addressed by the hybrid cloud approach for secure and authorized deduplication [1]. This system considers a multi-cloud architecture consisting of a public and a private cloud. The private cloud acts as a proxy that allows data owners and users to securely perform the duplicate check with differential privileges. Users store data on the public cloud, while data operations are managed by the private cloud. A user is allowed to perform the duplicate check only for files matching the user's privileges.

Figure 2: Architecture of authorized deduplication

Three entities are defined in the hybrid cloud architecture for authorized deduplication.

User: A user is an entity who wants to store data on the cloud. Each user is assigned a set of privileges. For example, we may define role-based privileges [9] according to job positions (e.g. director, technical lead, engineer), or time-based privileges that indicate a validity period. A user, say Ramesh, may be assigned the two privileges "technical lead" and "access right valid till 2015-10-10", so that Ramesh can access any file whose access role is "technical lead" and which is accessible until 2015-10-10. A system with deduplication will not upload any duplicate data, saving bandwidth.


Private cloud: This is a new entity that facilitates users' secure use of cloud services. The private keys for the privileges are managed by the private cloud, which provides file tokens to users. The interface offered by the private cloud allows users to submit files and queries to be securely stored and computed, respectively.

Public cloud: This is the entity that provides the data storage service. To reduce storage cost, it eliminates redundant data via deduplication.

The operations in the hybrid cloud architecture for the authorized deduplication system are as follows.

Let P be the privilege universe and kpi the symmetric key for each pi ∈ P. The set of keys {kpi} for pi ∈ P is stored on the private cloud. Each user U has a secret key skU.

The private cloud server maintains a table that stores each user's public information pkU and the corresponding privilege set PU.

Suppose a data owner wants to upload and share a file F with users whose privileges belong to the set PF = {pj}. The data owner needs to interact with the private cloud before performing the duplicate check with the public cloud. That is, the data owner performs identification to prove its identity with the private key skU. If this passes, the private cloud server finds the corresponding privilege set PU of the user in its stored table. The user computes and sends the file tag ϕF = TagGen(F) to the private cloud server, which returns the file tokens {ϕ′F,pT = TagGen(ϕF, kpT)} for all pT back to the user. The user then sends the file tokens {ϕ′F,pT} to the public cloud. If a file duplicate is found, the user runs the PoW protocol with the public cloud to prove file ownership. If the proof passes, the user is given a pointer to the file.

If no duplicate is found, the public cloud returns a proof, which is a signature on {ϕ′F,pT}, pkU and a timestamp. The user sends the privilege set PF = {pj} for the file F together with the proof to the private cloud server. Upon receiving the request, the private cloud server first verifies the proof from the public cloud. If it passes, the private cloud server computes {ϕ′F,pT = TagGen(ϕF, kpT)} for all pT. Finally, the user computes the encrypted file CF = EncCE(kF, F) with the convergent key kF = KeyGenCE(F) and uploads {CF, {ϕ′F,pT}} with the privilege set PF.
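A minimal sketch of this token flow, with hypothetical names and HMAC standing in for the keyed TagGen (the paper does not fix concrete primitives), might look like:

```python
import hashlib
import hmac

# Hypothetical state; this sketches only the token/duplicate-check flow,
# not Li et al.'s actual implementation.
PRIV_KEYS = {"engineer": b"k-eng", "lead": b"k-lead"}  # held by private cloud
USER_TABLE = {"alice": {"engineer"}}                   # pkU -> privilege set PU

def tag_gen(data: bytes, key: bytes = b"") -> str:
    # TagGen: keyed tag when a privilege key is supplied, plain hash otherwise.
    if key:
        return hmac.new(key, data, hashlib.sha256).hexdigest()
    return hashlib.sha256(data).hexdigest()

def private_cloud_tokens(user: str, phi_f: str):
    # Private cloud: look up the user's privileges PU and derive the
    # per-privilege file tokens phi'_{F,p} = TagGen(phi_F, k_p).
    return {p: tag_gen(phi_f.encode(), PRIV_KEYS[p]) for p in USER_TABLE[user]}

public_index = {}  # token -> list of owners (the public cloud's index)

def public_duplicate_check(tokens) -> bool:
    return any(t in public_index for t in tokens.values())

F = b"design.doc contents"
phi = tag_gen(F)                         # user computes the file tag phi_F
tokens = private_cloud_tokens("alice", phi)
if not public_duplicate_check(tokens):   # first upload: no duplicate found
    for t in tokens.values():
        public_index[t] = ["alice"]      # store the file under its tokens
print(public_duplicate_check(private_cloud_tokens("alice", phi)))  # True
```

Because the tokens are keyed per privilege, a user lacking a privilege key cannot even test whether a file protected by that privilege already exists.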

[3] PROPOSED SYSTEM

The above solution supports the differential-privilege duplicate check, but it is subject to brute-force attacks that can recover files falling into a known set. Conventional convergent encryption is insecure for predictable files.

The idea of the proposed technique is to make the system more secure and to optimize deduplication so as to save more disk space.

The proposed system explores a further enhancement of the hybrid cloud approach for secure and authorized deduplication [1]. In conventional convergent encryption, to support the duplicate check, the key is derived from the file F using a cryptographic hash function: kF = H(F). To avoid this deterministic key generation, the encryption key kF for a file F in the proposed system will be generated with a novel encryption algorithm, for example using multiple hashing, so that neither the private cloud server nor the public cloud can decrypt the ciphertext.
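The paper does not specify the multiple-hashing construction. One plausible sketch, assuming a secret held by a key issuer outside both clouds (ISSUER_SECRET is hypothetical, not from the paper), is:

```python
import hashlib
import hmac

# Hypothetical: a secret held by a key issuer outside both clouds, so that
# neither cloud can recompute kF from F alone.
ISSUER_SECRET = b"held-by-key-issuer-only"

def key_gen(data: bytes, rounds: int = 3) -> bytes:
    digest = hashlib.sha256(data).digest()
    for _ in range(rounds):               # the "multiple hashing" step
        digest = hashlib.sha256(digest).digest()
    # Keyed final step: without ISSUER_SECRET, computing H(F) from a
    # guessed F does not yield kF, blunting the brute-force attack.
    return hmac.new(ISSUER_SECRET, digest, hashlib.sha256).digest()

# Deduplication is preserved: the same file still yields the same key...
print(key_gen(b"report") == key_gen(b"report"))                  # True
# ...but a guesser who can only compute H(F) does not obtain kF.
print(key_gen(b"report") == hashlib.sha256(b"report").digest())  # False
```

The design trade-off is that key generation now requires contacting the issuer, whereas plain convergent encryption is fully client-local.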

The proposed system also aims for block-level deduplication within the hybrid cloud approach to differential deduplication. Block-level deduplication further optimizes data storage space.

To upload a file, a user first performs the file-level duplicate check. If the file is a duplicate, then all its blocks are duplicates as well; otherwise the user performs the block-level duplicate check and identifies the unique blocks to be uploaded.
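The two-level check above can be sketched as follows (illustrative only, with fixed-size blocks and in-memory indexes):

```python
import hashlib

BLOCK = 4          # tiny fixed-size blocks for illustration
file_index = {}    # file hash  -> True once the file is stored
block_index = {}   # block hash -> block contents (stored once)

def upload(data: bytes) -> int:
    """Return the number of blocks actually uploaded."""
    fh = hashlib.sha256(data).hexdigest()
    if fh in file_index:                  # file-level duplicate check first
        return 0                          # whole-file duplicate: send nothing
    uploaded = 0
    for i in range(0, len(data), BLOCK):  # otherwise check block by block
        b = data[i:i + BLOCK]
        bh = hashlib.sha256(b).hexdigest()
        if bh not in block_index:         # upload only the unique blocks
            block_index[bh] = b
            uploaded += 1
    file_index[fh] = True
    return uploaded

print(upload(b"AAAABBBB"))  # 2: both blocks are new
print(upload(b"AAAACCCC"))  # 1: AAAA is already stored, only CCCC uploaded
print(upload(b"AAAABBBB"))  # 0: file-level duplicate, no block checks needed
```

The file-level check acts as a fast path: when it hits, the per-block checks (and their metadata lookups) are skipped entirely.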

With the optimizations described above, the proposed system makes security and authorization more robust and adds further storage optimization.

[4] CONCLUSION

We analyzed related work addressing authorized deduplication in the cloud and, based on this analysis, proposed a system that achieves data compression along with optimal security by means of authorized data deduplication in a cloud environment using multiple clouds. The proposed system aims for minimal overhead in its solution to the problem.


REFERENCES

[1] J. Li, Y. K. Li, X. Chen, P. P. C. Lee, and W. Lou. A hybrid cloud approach for secure authorized deduplication. IEEE Transactions on Parallel and Distributed Systems, 2014.

[2] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In ICDCS, pages 617–624, 2002.

[3] S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg. Proofs of ownership in remote storage systems. In Y. Chen, G. Danezis, and V. Shmatikov, editors, ACM Conference on Computer and Communications Security, pages 491–500. ACM, 2011.

[4] S. Quinlan and S. Dorward. Venti: a new approach to archival storage. In Proc. USENIX FAST, Jan 2002.

[5] P. Anderson and L. Zhang. Fast and secure laptop backups with encrypted de-duplication. In Proc. of USENIX LISA, 2010.

[6] M. Bellare, C. Namprempre, and G. Neven. Security proofs for identity-based identification and signature schemes. J. Cryptology, 22(1):1–61, 2009.

[7] M. Bellare and A. Palacio. GQ and Schnorr identification schemes: Proofs of security against impersonation under active and concurrent attacks. In CRYPTO, pages 162–177, 2002.

[8] R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. Youman. Role-based access control models. IEEE Computer, 29:38–47, Feb 1996.

[9] D. Ferraiolo and R. Kuhn. Role-based access controls. In 15th NIST-NCSC National Computer Security Conf., 1992.

[10] J. Li, X. Chen, M. Li, J. Li, P. Lee, and W. Lou. Secure deduplication with efficient and reliable convergent key management. IEEE Transactions on Parallel and Distributed Systems, 2013.

[11] J. Stanek, A. Sorniotti, E. Androulaki, and L. Kencl. A secure data deduplication scheme for cloud storage. Technical Report, 2013.

[12] M. W. Storer, K. Greenan, D. D. E. Long, and E. L. Miller. Secure data deduplication. In Proc. of StorageSS, 2008.

[13] D. Ferraiolo and R. Kuhn. Role-based access controls. In 15th NIST-NCSC National Computer Security Conf., 1992.

[14] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In Proc. ICDCS '02, pages 617–624, 2002.

[15] D. Harnik, B. Pinkas, and A. Shulman-Peleg. Side channels in cloud services: deduplication in cloud storage. IEEE Security & Privacy, 8(6):40–47, Nov.-Dec. 2010.