[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage
TRANSCRIPT
PerfectDedup: Secure Data Deduplication
Pasquale [email protected]
SecludIT & EURECOM
Refik Molva (EURECOM), Melek Önen (EURECOM), Sergio Loureiro (SecludIT)
10th DPM International Workshop on Data Privacy Management, Vienna, Austria, September 21st 2015
Agenda
• Problem Statement
– Data Deduplication for Cloud Storage
– Convergent Encryption
• Our solution
– Data Popularity
– Perfect Hashing
– PerfectDedup: Secure Popularity Detection
– Security
– Performance Evaluation
Deduplication
• Storing duplicate data only once
• Cross-user + Client-side + Block-level
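As a rough illustration of the three properties on this slide, the following Python sketch splits each upload into fixed-size blocks, identifies every block by its SHA-256 digest, and lets the client skip blocks that the store already holds, whichever user uploaded them first. The `DedupStore` class, the 4-byte `BLOCK_SIZE`, and the function names are hypothetical choices made for this example and are not taken from the talk.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, hypothetical block size so the example is easy to follow


class DedupStore:
    """Toy cross-user block store: each distinct block is kept only once."""

    def __init__(self):
        self.blocks = {}  # block ID (SHA-256 hex digest) -> block bytes

    def has(self, block_id):
        return block_id in self.blocks

    def put(self, block_id, block):
        self.blocks[block_id] = block


def split_blocks(data, size=BLOCK_SIZE):
    """Block-level: cut the data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def client_upload(store, data):
    """Client-side: check each block ID first and send only missing blocks.

    Returns the file recipe (ordered block IDs) and the number of blocks sent.
    """
    recipe, sent = [], 0
    for block in split_blocks(data):
        block_id = hashlib.sha256(block).hexdigest()
        if not store.has(block_id):
            store.put(block_id, block)
            sent += 1
        recipe.append(block_id)
    return recipe, sent


def download(store, recipe):
    """Rebuild a file from its recipe."""
    return b"".join(store.blocks[block_id] for block_id in recipe)


store = DedupStore()
alice_recipe, alice_sent = client_upload(store, b"AAAABBBBCCCC")
bob_recipe, bob_sent = client_upload(store, b"AAAABBBBDDDD")  # cross-user reuse
```

In this run Alice sends three blocks, Bob sends only the one block that differs, and the store holds four blocks instead of six.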
Deduplication vs Encryption
… but it does not work on encrypted data!
[Figure: the same data D = "Hello World" is encrypted by two users, one with key K1 and one with key K2, and the two resulting ciphertexts differ.]
Convergent Encryption
• Data encryption key derived from the data: K = hash(Data)
• Deterministic & Symmetric Encryption
[Figure: the same data D = "Hello World" is encrypted twice with the key H(D), and the two resulting ciphertexts are identical.]
Douceur, John R., et al. "Reclaiming space from duplicate files in a serverless distributed file system." Distributed Computing Systems, 2002. Proceedings. 22nd International Conference on. IEEE, 2002.
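The contrast between independent keys and a key derived as K = hash(Data) can be sketched in a few lines of Python. The cipher below is a toy built from SHA-256 in counter mode, used only because it is deterministic and needs nothing beyond the standard library; it is an assumption made for this example, not the cipher used by the authors, and it should not be used to protect real data. All function names are hypothetical.

```python
import hashlib
import os


def keystream_xor(key, data):
    """Toy deterministic stream cipher: XOR data with SHA-256(key || counter).

    Illustrative stand-in for a real symmetric cipher; not for production use.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def convergent_encrypt(data):
    """Derive the key from the data itself, then encrypt deterministically."""
    key = hashlib.sha256(data).digest()  # K = hash(Data)
    return key, keystream_xor(key, data)


def convergent_decrypt(key, ciphertext):
    return keystream_xor(key, ciphertext)


data = b"Hello World"

# Two users with independent random keys K1 and K2: the ciphertexts differ,
# so the storage provider cannot tell that the plaintexts are equal.
c_k1 = keystream_xor(os.urandom(32), data)
c_k2 = keystream_xor(os.urandom(32), data)

# Two users with convergent encryption: same key, hence same ciphertext,
# which the provider can store once.
key_a, c_a = convergent_encrypt(data)
key_b, c_b = convergent_encrypt(data)
```

Because the key depends only on the content, any user who holds the data can recompute the key and decrypt the single stored copy.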
Convergent Encryption
[Figure: MISSING INFORMATION]
How to achieve safe Convergent Encryption in the Cloud?
Drew Perttula, Brian Warner, and Zooko Wilcox-O'Hearn, 2008-03-20. https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html
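The figure on this slide did not survive extraction. Assuming it refers to the "learn the remaining information" attack described in the cited Tahoe-LAFS note, the weakness can be sketched as follows: an observer who knows most of a file and sees its convergent ciphertext can enumerate the missing part and recompute the ciphertext for each guess until one matches. The message template, the 4-digit PIN, and the toy cipher are all hypothetical and chosen only to keep the example short.

```python
import hashlib


def convergent_encrypt(data):
    """Toy convergent encryption: key = SHA-256(data), XOR with a hash keystream.

    Stand-in for a real deterministic cipher; not for production use.
    """
    key = hashlib.sha256(data).digest()
    pad = b""
    counter = 0
    while len(pad) < len(data):
        pad += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ p for b, p in zip(data, pad))


# Hypothetical predictable document with one low-entropy secret field.
TEMPLATE = "Dear customer, your PIN is {:04d}."

# Ciphertext that an observer sees at the storage provider.
observed = convergent_encrypt(TEMPLATE.format(4821).encode())


def recover_missing_field(observed_ciphertext):
    """Try every possible PIN and compare the resulting ciphertexts."""
    for guess in range(10_000):
        candidate = TEMPLATE.format(guess).encode()
        if convergent_encrypt(candidate) == observed_ciphertext:
            return guess
    return None


recovered = recover_missing_field(observed)
```

The search needs at most 10,000 encryptions here, which is why predictable, low-entropy data is poorly protected by convergent encryption alone.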
Data Popularity
• Different protection based on data-segment popularity
• Popular data → Not confidential → To be deduplicated → Convergent Encryption
• Unpopular data → Confidential → To be protected → Semantically-Secure Encryption
Stanek, Jan, et al. "A secure data deduplication scheme for cloud storage." Financial Cryptography and Data Security. Springer Berlin Heidelberg, 2014. 99-118.
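The rule on this slide amounts to choosing the key according to popularity. The sketch below assumes that popularity is decided by comparing the number of distinct uploaders with a threshold; the value 3, the function name, and the use of a fresh random key as a stand-in for semantically-secure encryption are assumptions made for illustration only.

```python
import hashlib
import os

# Hypothetical: number of distinct uploaders that makes a block popular.
POPULARITY_THRESHOLD = 3


def choose_key(block, uploader_count, threshold=POPULARITY_THRESHOLD):
    """Return (scheme, key) for a block given how many users have uploaded it."""
    if uploader_count >= threshold:
        # Popular: key derived from the content, so equal blocks deduplicate.
        return "convergent", hashlib.sha256(block).digest()
    # Unpopular: fresh random key, so equal blocks yield unrelated ciphertexts.
    return "semantically-secure", os.urandom(32)


popular_a = choose_key(b"common OS library block", uploader_count=12)
popular_b = choose_key(b"common OS library block", uploader_count=12)
private_a = choose_key(b"my tax return block", uploader_count=1)
private_b = choose_key(b"my tax return block", uploader_count=1)
```

Two uploads of the popular block obtain the same key and can be deduplicated, while two uploads of the unpopular block obtain unrelated keys.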
How to securely detect popularity?
[Figure: the CLIENT asks the CSP "Is block B popular?" and the CSP answers YES / NO.]
• Block B must not be disclosed if it is unpopular (sensitive)
PHF-based Lookup
Belazzougui, Djamal, Fabiano C. Botelho, and Martin Dietzfelbinger. "Hash, displace, and compress." Algorithms-ESA 2009. Springer Berlin Heidelberg, 2009. 682-693.
PerfectDedup
• Based on «Secure» Perfect Hashing
– One-wayness
• Popular block IDs → Collision-free hash function (PHF)
• BENEFITS:
– Efficient (linear-time) generation of a new PHF (outsourced to the Cloud)
– Compact representation of the PHF
– Very efficient (constant-time) evaluation on a block ID
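To make the PHF idea concrete, here is a simplified hash-and-displace construction in the spirit of the Belazzougui et al. paper cited earlier. It is a sketch under stated assumptions: it is not the CMPH implementation, it leaves about 20% of the table empty instead of being minimal, it omits the compression step, and the names `build_phf`, `phf`, and `is_popular` are hypothetical. The lookup at the end assumes that the CSP returns the ID stored at index i and that the client compares it with its own block ID.

```python
import hashlib


def _h(seed, key, modulus):
    """Seeded hash of a byte string into range(modulus), built on SHA-256."""
    digest = hashlib.sha256(seed.to_bytes(4, "big") + key).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def build_phf(keys, bucket_size=4, load=0.8):
    """Simplified hash-and-displace. Returns (displacements, table)."""
    keys = sorted(set(keys))                    # duplicates could never be separated
    m = max(1, int(len(keys) / load) + 1)       # number of table slots
    r = max(1, len(keys) // bucket_size)        # number of buckets
    buckets = [[] for _ in range(r)]
    for key in keys:
        buckets[_h(0, key, r)].append(key)

    table = [None] * m
    displacements = [0] * r
    # Place the largest buckets first: they are the hardest to fit.
    for b in sorted(range(r), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        d = 1
        while True:
            slots = [_h(d, key, m) for key in buckets[b]]
            if len(set(slots)) == len(slots) and all(table[s] is None for s in slots):
                break                           # seed d places the whole bucket
            d += 1
        displacements[b] = d
        for key, s in zip(buckets[b], slots):
            table[s] = key
    return displacements, table


def phf(displacements, m, key):
    """Evaluate the PHF with two hash computations, whatever the set size."""
    d = displacements[_h(0, key, len(displacements))]
    return _h(d, key, m)


# Hypothetical popular block IDs.
popular_ids = [hashlib.sha256(b"popular-%d" % n).digest() for n in range(50)]
displacements, table = build_phf(popular_ids)


def is_popular(block_id):
    i = phf(displacements, len(table), block_id)  # client evaluates the PHF
    return table[i] == block_id                   # compare with the ID stored at i
```

Only the list of displacements has to reach the client, which is what keeps the representation compact; the table of IDs stays with the CSP.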
Security
[Figure: sets of POPULAR and UNPOPULAR block IDs and the table held by the CSP; PHF(ID) = i, and index i maps back to the same ID.]
• Block is popular: 1-to-1 mapping
• No confidentiality issue
Security
[Figure: sets of POPULAR and UNPOPULAR block IDs and the table held by the CSP; PHF(ID) = i, but index i maps to a different ID’.]
• Block is unpopular: collisions are well-distributed
• One-wayness property
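The claim that collisions are well-distributed can be illustrated with a small experiment. In the sketch below a plain seeded hash modulo the table size stands in for the PHF evaluated on IDs outside the popular set; this substitution, the table size of 64, and the 10,000 synthetic IDs are assumptions chosen for the example. The sketch only shows the spread of collisions and does not model the one-wayness property.

```python
import hashlib
from collections import Counter

TABLE_SIZE = 64        # hypothetical number of PHF output slots
N_UNPOPULAR = 10_000   # hypothetical number of distinct unpopular block IDs


def index_of(block_id, m=TABLE_SIZE):
    """Stand-in for evaluating the PHF on an ID outside the popular set."""
    digest = hashlib.sha256(b"phf-seed" + block_id).digest()
    return int.from_bytes(digest[:8], "big") % m


# Synthetic unpopular block IDs.
unpopular_ids = [hashlib.sha256(b"private-%d" % n).digest() for n in range(N_UNPOPULAR)]

# Count how many unpopular IDs land on each index the CSP could observe.
hits = Counter(index_of(block_id) for block_id in unpopular_ids)

expected = N_UNPOPULAR / TABLE_SIZE
smallest = min(hits[i] for i in range(TABLE_SIZE))
largest = max(hits.values())
```

Every index is shared by many unpopular IDs, so an index observed by the CSP is consistent with a large number of candidate blocks.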
PerfectDedup
[Figure: the CLIENT asks the CSP "Is block B popular?" and receives YES / NO. If NO, the INDEX SERVICE is asked "POPULARITY TRANSITION?" and answers YES / NO.]
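The message flow on this slide can be sketched as a small state machine. The sketch assumes that the index service counts distinct uploaders per block ID and signals a popularity transition when a threshold is reached; the threshold value, the class and method names, and the use of a plain set in place of the PHF-based lookup at the CSP are all hypothetical simplifications.

```python
import hashlib


class IndexService:
    """Tracks unpopular blocks and detects when one of them becomes popular."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.uploaders = {}  # block ID -> set of users who uploaded it

    def popularity_transition(self, block_id, user):
        users = self.uploaders.setdefault(block_id, set())
        users.add(user)
        return len(users) >= self.threshold


class Csp:
    """Cloud storage provider; a set stands in for the PHF-based table."""

    def __init__(self):
        self.popular_ids = set()

    def is_popular(self, block_id):
        return block_id in self.popular_ids

    def add_popular(self, block_id):
        # In the scheme presented, this is where a new PHF would be generated.
        self.popular_ids.add(block_id)


def upload(csp, index_service, user, block):
    """Return which path of the protocol the upload follows."""
    block_id = hashlib.sha256(block).hexdigest()
    if csp.is_popular(block_id):
        return "popular"          # deduplicated, convergent encryption
    if index_service.popularity_transition(block_id, user):
        csp.add_popular(block_id)
        return "transition"       # block becomes popular with this upload
    return "unpopular"            # stored under symmetric encryption


csp, index_service = Csp(), IndexService(threshold=3)
block = b"shared installer block"
results = [upload(csp, index_service, user, block)
           for user in ["alice", "alice", "bob", "carol", "dave"]]
```

A repeated upload by the same user does not advance the count, the third distinct user triggers the transition, and later uploads take the popular path.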
Prototype Implementation
[Figure: prototype architecture with CLIENT, INDEX SERVICE, and CSP; the CMPH library is shown twice in the diagram.]
Performance Evaluation
[Figure: chart of time (in seconds, axis from 0 to 10) for three scenarios: UNPOPULAR FILE, POPULARITY TRANSITION, POPULAR FILE. Legend: Client File Split; Client Convergent Encryption; Client Popularity Check; Client Symmetric Encryption; Idx Service Update; Cloud Generate PHF; Cloud Store Hash Table; Cloud Popularity Check; Cloud Upload Processing.]
Conclusions
• Popularity-based Deduplication
• Secure Perfect Hashing
• Secure & Lightweight for the client
• Costly tasks outsourced to the Cloud
• Low overhead
Future Work
• Optimization of PHF generation
• Deployment in real production environments