

Available online through - http://ijifr.com/searchjournal.aspx Published On: December 29, 2015

Copyright©IJIFR 2015

International Journal of Informative & Futuristic Research ISSN: 2347-1697

Volume 3 Issue 4 December 2015 Research Paper

Abstract

As per current trends, users prefer the cloud to store personal data as well as data they want to share with others. In such storage systems, the same data is sometimes stored by different users. This duplication wastes cloud storage space and network bandwidth. To make the cloud more economical in both respects, several techniques have been proposed. Data deduplication is one of the recent techniques in cloud storage that avoids such duplication, whether caused by privileged or non-privileged users. It enables companies and organizations to save substantial money on data storage and on the bandwidth needed when replicating data offsite for disaster recovery. The main theme of this paper is to provide sheltered (secure) authorized deduplication. Deduplication is a data compression technique for eliminating duplicate copies of repeating data. We present several new deduplication constructions supporting an authorized duplicate check scheme in a hybrid cloud architecture, implement a prototype of the proposed scheme, and conduct testbed experiments with it. We show that our authorized duplicate check scheme incurs minimal overhead compared to convergent encryption and network transfer. The scheme's security holds only in a specific generic group model; in future work, complete security could be provided using a Ciphertext-Policy Attribute-Based Encryption algorithm, which is not limited to such a group.

Sheltered Authorized Deduplication In Hybrid Cloud Design

Paper ID: IJIFR/V3/E4/065 | Page No.: 1403-1410 | Research Area: Cloud Computing

Keywords: Deduplication, Cloud Storage, Authorized Duplicate Check, Confidentiality, Hybrid Cloud

1st Adabala P V D L Kumar
Assistant Professor, Department of Computer Engineering, BVC Institute of Technology and Science, Andhra Pradesh

2nd A S G R Srinivas
Assistant Professor, Department of Computer Engineering, BVC Institute of Technology and Science, Andhra Pradesh

3rd M. Janaki Ramudu
Assistant Professor, Department of Computer Engineering, BVC Institute of Technology and Science, Andhra Pradesh


ISSN: 2347-1697, International Journal of Informative & Futuristic Research (IJIFR), Volume 3, Issue 4, December 2015, Continuous 28th Edition, Page No.: 1403-1410

Adabala P V D L Kumar, A S G R Srinivas, M. Janaki Ramudu :: Sheltered Authorized Deduplication In Hybrid Cloud Design

1. Introduction

In computing, data deduplication is a specialized data compression technique for eliminating

duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data)

compression and single-instance (data) storage. This technique is used to improve storage

utilization and can also be applied to network data transfers to reduce the number of bytes that must

be sent. In the deduplication process, unique chunks of data, or byte patterns, are identified and

stored during a process of analysis. As the analysis continues, other chunks are compared to the

stored copy and whenever a match occurs, the redundant chunk is replaced with a small reference

that points to the stored chunk [2].

Data deduplication takes place at either the block level or the file level. In the file-level approach, duplicate files are eliminated; in the block-level approach, duplicate blocks of data that occur in non-identical files are eliminated. Deduplication can reduce storage needs by up to 90-95% for backup applications and about 68% in standard file systems. For data confidentiality, each user encrypts their files with a secret key, performing encryption and decryption with that key. To upload a file to the cloud, the user first generates a convergent key, encrypts the file, and then uploads it. To prevent unauthorized access, a proof-of-ownership protocol is used so that a user can prove that they indeed own the file when a duplicate is found. After the proof, the server provides subsequent users a pointer for accessing the same file without needing to upload it again. When a user wants to download the file, they simply download the encrypted file from the cloud and decrypt it using the convergent key.
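The block-level approach described above can be sketched in a few lines. This is a minimal illustration with hypothetical helper names and a tiny 4-byte chunk size chosen only for readability; real systems use much larger, often variable-size, chunks.

```python
import hashlib

def dedup_store(data: bytes, store: dict, chunk_size: int = 4) -> list:
    """Block-level dedup: keep each unique chunk once, keyed by its hash,
    and represent the file as a list of chunk references."""
    refs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in store:        # unique chunk: store the actual bytes
            store[h] = chunk
        refs.append(h)            # duplicate chunk: keep only a reference
    return refs

store = {}
refs_a = dedup_store(b"AAAABBBBAAAA", store)  # third chunk repeats the first
refs_b = dedup_store(b"AAAACCCC", store)      # first chunk is already stored
print(len(store))  # 3 unique chunks kept: AAAA, BBBB, CCCC
```

Even though five chunks were submitted, only three are physically stored; the repeated chunks survive only as references.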

Although data deduplication brings a lot of benefits, security and privacy concerns arise as users’

sensitive data are susceptible to both insider and outsider attacks. Traditional encryption, while

providing data confidentiality, is incompatible with data deduplication. Specifically, traditional

encryption requires different users to encrypt their data with their own keys. Thus, identical data

copies of different users will lead to different cipher texts, making deduplication impossible.

Convergent encryption has been proposed to enforce data confidentiality while making

deduplication feasible. It encrypts/ decrypts a data copy with a convergent key, which is obtained by

computing the cryptographic hash value of the content of the data copy. After key generation and

data encryption, users retain the keys and send the cipher text to the cloud. Since the encryption

operation is deterministic and is derived from the data content, identical data copies will generate

the same convergent key and hence the same cipher text. To prevent unauthorized access, a secure

proof of ownership protocol is also needed to provide the proof that the user indeed owns the same

file when a duplicate is found. After the proof, subsequent users with the same file will be provided

a pointer from the server without needing to upload the same file. A user can download the

encrypted file with the pointer from the server, which can only be decrypted by the corresponding

data owners with their convergent keys. Thus, convergent encryption allows the cloud to perform

deduplication on the cipher texts, and the proof of ownership prevents unauthorized users from accessing the file.
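The determinism that makes convergent encryption dedup-friendly can be shown in a short sketch. The key derivation (SHA-256 of the content) follows the description above; the cipher here is a hash-based keystream stand-in, since a real deployment would use AES as noted later in the paper.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    # Convergent key = cryptographic hash of the data content itself.
    return hashlib.sha256(data).digest()

def encrypt(key: bytes, data: bytes) -> bytes:
    # Deterministic keystream stand-in for AES (illustration only).
    out = bytearray()
    for i, b in enumerate(data):
        ks = hashlib.sha256(key + i.to_bytes(8, "big")).digest()[0]
        out.append(b ^ ks)
    return bytes(out)

doc = b"quarterly report"
k1 = convergent_key(doc)   # user A derives the key from the content
k2 = convergent_key(doc)   # user B, same content, derives the same key
assert k1 == k2
assert encrypt(k1, doc) == encrypt(k2, doc)  # identical cipher texts
```

Because both key and ciphertext depend only on the content, the cloud can detect the duplicate without ever seeing the plaintext.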

A "differential authorized duplicate check" cannot be supported by previous deduplication systems. In an authorized deduplication system, each user is issued a set of privileges during system initialization. Each file uploaded to the cloud is also bound to a set of privileges, which specifies which kind of user is allowed to perform the duplicate check and access the file. Before submitting a duplicate check request for a file, the user must take the file and their own privileges as inputs. The user finds a duplicate only if there is a copy of the file and a matching privilege stored in the cloud.
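The privilege binding can be sketched as follows: the duplicate-check token is an HMAC of the file tag under a privilege-specific key, so the same file yields different tokens for different privileges. The privilege names and keys here are hypothetical illustrations.

```python
import hashlib, hmac

# Hypothetical privilege keys, managed by the private cloud.
priv_keys = {"engineer": b"key-engineer", "manager": b"key-manager"}

def file_token(file_data: bytes, privilege: str) -> str:
    # Token binds the file tag (SHA-1 of content) to a privilege via HMAC.
    tag = hashlib.sha1(file_data).digest()
    return hmac.new(priv_keys[privilege], tag, hashlib.sha1).hexdigest()

# A file stored under the 'engineer' privilege.
stored_tokens = {file_token(b"design.doc", "engineer")}

# A duplicate is reported only when both file AND privilege match.
assert file_token(b"design.doc", "engineer") in stored_tokens
assert file_token(b"design.doc", "manager") not in stored_tokens
```

A user holding a non-matching privilege therefore learns nothing from the duplicate check, even for a file that is already stored.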


2. Literature Review

Some existing work in this area is as follows:

A hybrid cloud is an architecture that lets an organization work efficiently on the private and public cloud architectures in combination, providing the scalability to adapt; the basic concepts and how best to adopt this environment are explained by Neal Leavitt [3]. Hui Zhang, Guofei Jiang, Kenji Yoshihira, Haifeng Chen and Akhilesh Saxena propose an intelligent workload factoring service for enterprise customers that makes the best use of present public cloud services alongside privately owned data centers, allowing the organization to work across off-premises and on-premises infrastructure; its core technology is a fast redundant-data-element detection algorithm that factors incoming requests based on data content rather than on data volume alone [4]. Among the many definitions of the term "cloud" is Infrastructure-as-a-Service (IaaS), in which IT infrastructure is deployed in a cloud service provider's data center as virtual machines; the growing popularity of IaaS helps transform an organization's present infrastructure into the required hybrid or private cloud, and the OpenNebula concept provides features not present in other cloud software, as described by Borja Sotomayor, Rubén S. Montero, Ignacio M. Llorente and Ian Foster [5]. Data deduplication is a technique mainly used to reduce redundant data in storage systems, data that would otherwise unnecessarily consume bandwidth and network capacity; David Geer describes common techniques that compute a hash for each file so that the deduplication process can be simplified [6]. Deduplication is highly effective and widely used, but when applied across multiple users, cross-user deduplication tends to have serious privacy implications.

3. System Model: Hybrid Architecture for Secure Deduplication

At a high level, our setting of interest is an enterprise network consisting of a group of affiliated clients that use the S-CSP to store data with deduplication. Deduplication is frequently adopted in data backup and disaster recovery applications to reduce storage space. Such systems are widespread and are often more suitable for user file backup and synchronization applications than richer storage abstractions.

Figure 1: Architecture for Authorized Deduplication.


There are three entities defined in our system, as shown in Figure 1:

S-CSP in public cloud

Users

Private cloud

S-CSP: This is an entity that provides a data storage service in the public cloud. The S-CSP provides the data outsourcing service and stores data on behalf of the users. To reduce storage cost, the S-CSP eliminates redundant data via deduplication and keeps only unique data. In this paper, we assume that the S-CSP is always online and has abundant storage capacity and computation power.

Data Users: A user is an entity that wants to outsource data storage to the S-CSP and access the data later. In a storage system supporting deduplication, the user uploads only unique data and does not upload any duplicate data, to save upload bandwidth; the duplicate data may be owned by the same user or by different users. In the authorized deduplication system, each user is issued a set of privileges during system setup. Each file is protected with a convergent encryption key and privilege keys to realize authorized deduplication with differential privileges.

Private Cloud: Compared with the traditional deduplication architecture in cloud computing, this is a new entity introduced to facilitate users' secure usage of the cloud service [7]. Specifically, since computing resources at the data user/owner side are restricted and the public cloud is not fully trusted in practice, the private cloud provides the data user/owner with an execution environment and infrastructure, working as an interface between the user and the public cloud. The private keys for the privileges are managed by the private cloud, which answers file token requests from the users. The interface offered by the private cloud allows users to submit files and queries to be securely stored and computed, respectively.

4. Implementation

We implement a prototype of the proposed authorized deduplication system, in which we

model three entities as separate C++ programs. A Client program is used to model the data users to

carry out the file upload process. A Private Server program is used to model the private cloud

which manages the private keys and handles the file token computation. A Storage Server program

is used to model the S-CSP which stores and deduplicate files. Our implementation of the Client

provides the following function calls to support token generation and deduplication along the file

upload process.

FileTag(File) - It computes the SHA-1 hash of the file as the File Tag;

TokenReq(Tag, UserID) - It requests the Private Server to generate the File Token from the File Tag and User ID;

DupCheckReq(Token) - It requests the Storage Server to run a duplicate check on the file by sending the file token received from the Private Server;

ShareTokenReq(Tag, {Priv.}) - It requests the Private Server to generate the Share File Token from the File Tag and the target sharing privilege set;

FileEncrypt(File) - It encrypts the file with convergent encryption using the 256-bit AES algorithm in cipher block chaining (CBC) mode, where the convergent key is the SHA-256 hash of the file;

FileUploadReq(FileID, File, Token) - It uploads the file data to the Storage Server if the file is unique, and updates the stored File Token.

Our implementation of the Private Server includes the corresponding request handlers for token generation and maintains a key storage with a hash map:

TokenGen(Tag, UserID) - It loads the user's associated privilege keys and generates the token with the HMAC-SHA-1 algorithm; and

ShareTokenGen(Tag, {Priv.}) - It generates the share token with the corresponding privilege keys of the sharing privilege set, using the HMAC-SHA-1 algorithm.
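The client-side calls listed above can be tied together in a short sketch of the upload flow. The three entities are reduced to in-memory stand-ins (the PRIV_KEY constant and the storage dictionary are hypothetical), and the AES encryption step is elided, keeping only the convergent key derivation.

```python
import hashlib, hmac

PRIV_KEY = b"privilege-key"   # held by the Private Server (assumed)
storage = {}                  # Storage Server stand-in: token -> file record

def FileTag(file_data: bytes) -> bytes:
    return hashlib.sha1(file_data).digest()          # SHA-1 file tag

def TokenGen(tag: bytes, user_id: str) -> str:
    # Private Server: HMAC-SHA-1 token over the tag and user identity.
    return hmac.new(PRIV_KEY, tag + user_id.encode(), hashlib.sha1).hexdigest()

def DupCheckReq(token: str) -> bool:
    return token in storage                          # Storage Server check

def FileEncrypt(file_data: bytes):
    key = hashlib.sha256(file_data).digest()         # convergent key
    return key, file_data                            # ciphertext elided here

def FileUploadReq(file_id: str, file_data: bytes, token: str):
    if not DupCheckReq(token):                       # upload only unique files
        storage[token] = (file_id, FileEncrypt(file_data)[1])

doc = b"report contents"
token = TokenGen(FileTag(doc), "alice")
FileUploadReq("f1", doc, token)
FileUploadReq("f2", doc, token)   # duplicate: no second copy is stored
print(len(storage))  # 1
```

The second upload of the same file is short-circuited by the duplicate check, which is the bandwidth and storage saving the prototype measures.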

Operations performed on Hybrid Cloud

File Uploading: When a user wants to upload a file to the public cloud, the user first encrypts the file with a symmetric key and sends the ciphertext to the public cloud. At the same time, the user generates the key for that file and sends it to the private cloud. In this way the user uploads the file to the public cloud.

File Downloading: When a user wants to download a file that has been uploaded to the public cloud, the user makes a request to the public cloud, which returns the list of files uploaded by its users. The user selects one file from the list and chooses the download option. The private cloud then asks the user to enter the key that was generated for that file. The private cloud checks the entered key; only if the key is correct, meaning the user is valid, may the user download the file from the public cloud, and otherwise the download is refused. The downloaded file is in encrypted format, so the user decrypts it using the same symmetric key.
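The two operations above can be sketched with the public and private clouds as in-memory dictionaries (hypothetical names; the encryption itself is elided so the key-check logic stands out):

```python
public_cloud = {}    # file name -> ciphertext (encryption elided in sketch)
private_cloud = {}   # file name -> per-file symmetric key

def upload(name: str, data: bytes, key: bytes):
    public_cloud[name] = data      # ciphertext goes to the public cloud
    private_cloud[name] = key      # key is escrowed with the private cloud

def download(name: str, key_attempt: bytes):
    # The private cloud validates the key before the file is released.
    if private_cloud.get(name) != key_attempt:
        return None                # wrong key: download refused
    return public_cloud[name]

upload("notes.txt", b"payload", b"user-key")
assert download("notes.txt", b"user-key") == b"payload"
assert download("notes.txt", b"wrong-key") is None
```

The design choice here is that possession of the ciphertext alone is useless: the private cloud gates access, so a compromised public cloud reveals only encrypted data.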

5. Evaluation

Our evaluation focuses on comparing the overhead induced by the authorization steps, including file token generation and share token generation, against the convergent encryption and file upload steps. We evaluate the overhead by varying several factors: 1) File Size, 2) Number of Stored Files, 3) Deduplication Ratio, 4) Privilege Set Size. We break the upload process into 6 steps: 1) Tagging, 2) Token Generation, 3) Duplicate Check, 4) Share Token


Generation, 5) Encryption, 6) Transfer. For each step, we record its start and end time and thereby obtain the breakdown of the total time spent. We present the average time taken for each data set in the figures.

5.1 File Size

To evaluate the effect of file size on the time spent in each step, we upload 100 unique files (i.e., without any deduplication opportunity) of a particular size and record the time breakdown. Using unique files lets us evaluate the worst-case scenario, in which we have to upload all the file data. The average time of each step across test sets of different file sizes is plotted in Figure 2. The time spent on tagging, encryption, and upload increases linearly with the file size, since these operations involve the actual file data and incur file I/O over the whole file.

Figure 2: Time Breakdown for Different file size

5.2 Number of Stored Files

To evaluate the effect of the number of stored files in the system, we upload 10,000 unique 10MB files and record the breakdown for every file upload. As Figure 3 shows, the time of every step remains roughly constant as the number of stored files grows. Token checking is done with a hash table, and a linear search is carried out in case of collision.

Figure 3: Time Breakdown for Different number of stored files

5.3 Deduplication Ratio

To evaluate the effect of the deduplication ratio, we prepare two unique data sets, each consisting of 50 files of 100MB each. We first upload the first set as an initial upload. For the second upload, we pick a portion of the 50 files, according to the given deduplication ratio, from the initial set as duplicate files [8] and take the remaining files from the second set as unique files. The average time of uploading the second set is presented in Figure 4.


Figure 4: Time Breakdown for Different Deduplication Ratio

5.4 Privilege Set Size

To evaluate the effect of privilege set size, we upload 100 unique 10MB files while varying the sizes of the data owner's and the target share privilege sets. Figure 5 shows that the time taken by token generation, and also by the duplicate check, increases linearly as more keys are associated with the file. While the number of keys increases 100 times, from 1,000 to 100,000, the total time spent increases only 3.81 times. Note that the file size in this experiment is set at a small level (10MB); the effect would be less significant for larger files.

Figure 5: Time Breakdown for Different Privilege Set size

6. Conclusion

Hybrid cloud architecture provides many benefits through the combined use of public and private clouds. Nowadays most users store their data in the cloud, and the growing amount of data there is a major concern. To reduce the space used and to utilize it efficiently, data deduplication is employed. In this paper, the idea of authorized data deduplication was proposed to protect data securely by including differential privileges of users in the duplicate check. Security analysis demonstrates that our schemes are secure in terms of the insider and outsider attacks specified in the proposed security model. We showed that our authorized duplicate check scheme incurs minimal overhead compared to convergent encryption and network transfer.

7. References

[1] Jin Li, Yan Kit Li, Xiaofeng Chen, Patrick P. C. Lee, Wenjing Lou, "A Hybrid Cloud Approach for Secure Authorized Deduplication," IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, 2014.


[2] Gaurav Kakariya, Prof. Sonali Rangdale, "A Hybrid Cloud Approach for Secure Authorized Deduplication," International Journal of Computer Engineering and Applications, Volume VIII, Issue I, October 2014.

[3] Neal Leavitt, "Hybrid Clouds Move to the Forefront," 2013.

[4] Hui Zhang, Guofei Jiang, Kenji Yoshihira, Haifeng Chen, and Akhilesh Saxena, "A Hybrid Cloud Computing Model," 2009.

[5] Borja Sotomayor, Rubén S. Montero, Ignacio M. Llorente, and Ian Foster, "Virtual Infrastructure Management in Private and Hybrid Clouds," 2009.

[6] David Geer, "Reducing the Storage Burden via Data Deduplication," 2008, computer.org.

[7] S. Bugiel, S. Nurnberger, A. Sadeghi, and T. Schneider, "Twin Clouds: An Architecture for Secure Cloud Computing," in Workshop on Cryptography and Security in Clouds (WCSC 2011), 2011.

[8] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer, "Reclaiming Space from Duplicate Files in a Serverless Distributed File System," in ICDCS, pages 617-624, 2002.