On Compression of Data Encrypted with Block Ciphers

Demijan Klinc*, Carmit Hazay†, Ashish Jagmohan**, Hugo Krawczyk**, Tal Rabin**

* Georgia Institute of Technology
** IBM T.J. Watson Research Labs
† Weizmann Institute and IDC

http://www.cs.technion.ac.il/~yanivca/cryptoday/10/abstracts.htm#l2



Traditional Model

Transmitting redundant data over an insecure and bandwidth-constrained channel
• Traditionally, data is first compressed and then encrypted

[Diagram: source X → compress → C(X) → encrypt with key k → Ek(C(X)); compress and encrypt together form the encoder]

Traditional Model

What if the encryptor and the compressor are two entities with different goals?
• E.g., a storage provider wants to compress data to minimize storage space but does not have access to the key

Can we reverse the order of these steps?

Compression and Encryption in Reverse Order

[Diagram: source X → encrypt with key k → Ek(X) → compress → C(Ek(X)); the compressor does not know k!]

Can we encrypt first and only then compress without knowing the key?

Compression and Encryption in Reverse Order

For a fixed key, an encryption scheme is a bijection, therefore the entropy is preserved
• It follows that it is theoretically possible to compress the source to the same level as before encryption

In practice, encrypted data appears to be random
• Conventional compression techniques do not yield desirable results
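A minimal sketch of the practical obstacle, assuming a byte-level XOR with a pseudorandom pad as a stand-in for encryption and `zlib` as the conventional compressor (neither choice comes from the slides, and the seeded `random` generator is for repeatability only, not security):

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable (NOT cryptographic)

# A highly redundant source: mostly zero bytes with occasional ones.
plaintext = bytes(1 if rng.random() < 0.05 else 0 for _ in range(4096))

# Stand-in for encryption: XOR every byte with a uniformly chosen pad byte.
pad = bytes(rng.randrange(256) for _ in range(len(plaintext)))
ciphertext = bytes(x ^ k for x, k in zip(plaintext, pad))

compressed_plain = zlib.compress(plaintext, 9)
compressed_cipher = zlib.compress(ciphertext, 9)

# The plaintext shrinks a lot; the ciphertext does not shrink at all.
print(len(plaintext), len(compressed_plain), len(compressed_cipher))
```

The XOR is a bijection for the fixed pad, so no information is lost, yet the conventional compressor finds no structure to exploit in the ciphertext.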

Compression and Encryption in Reverse Order

Fully homomorphic encryption shows that one can compress optimally without decrypting
• Simply evaluate the compression algorithm homomorphically on the encrypted plaintext

Fully homomorphic encryption supports addition and multiplication:
E(m1), E(m2) → E(m1+m2)
E(m1), E(m2) → E(m1∙m2)

Stated differently:
C, E(m) → E(C(m))

Outline

• Preliminaries
• Source Coding with Side Information
• Compressing Stream Ciphers
• Compressing Block Ciphers
• Simulation Results
• Impossibility Result

Private Key Encryption

Triple of algorithms: (Gen, Enc, Dec)
• Same key for encryption and decryption

Security: CPA security (informally)
• It should be infeasible to distinguish an encryption of m from an encryption of m'

Private Key Encryption

Two categories:
• Stream ciphers: plaintext is encrypted one symbol at a time, typically by summing it with a key (XOR operation for binary alphabets), e.g., one-time pad
• Block ciphers: encryption is accomplished by means of nonlinear mappings on input blocks of fixed length, e.g., AES, DES

Binary Symmetric Channel

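[Diagram: binary symmetric channel. Input X ∈ {0, 1} maps to output Y ∈ {0, 1}; a bit is received unchanged with probability 1−p and flipped with probability p]

Communication model where each sent bit is flipped with probability p

Entropy is: H(p) = −(p log p + (1−p) log(1−p))

Pr(Y = 0 | X = 0) = 1−p
Pr(Y = 0 | X = 1) = p
Pr(Y = 1 | X = 0) = p
Pr(Y = 1 | X = 1) = 1−p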
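The entropy formula can be evaluated directly. The sketch below assumes base-2 logarithms (entropy in bits) and the usual convention H(0) = H(1) = 0; the function name `binary_entropy` is illustrative.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) in bits for a channel that flips each bit with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log(0) = 0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.0, 0.05, 0.11, 0.5):
    print(f"p = {p:.2f}  H(p) = {binary_entropy(p):.4f}")
```

H(p) is symmetric around p = 1/2, where it reaches its maximum of 1 bit; a small flip probability therefore means a small entropy, which is what later makes compression possible.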

Source Coding with Side Information

[Diagram: source X → compress → C(X) → decompress → X; Y is fed to the decompressor only]

X, Y: random variables over a finite alphabet with a joint probability distribution PXY

Goal: losslessly compress X with Y known only to the decoder

Source Coding with Side Information

For sufficiently large block length, this can be done at rates arbitrarily close to H[X|Y] [SlepianWolf73]
• Non-constructive theorem
• Practical coding schemes use constructions based on good linear error-correcting codes, e.g., LDPC codes [RichardsonUrbanke08]

Linear Error Correcting Codes

Error correcting codes:
• Communication is over a noisy channel
• Add redundancy to the source to correct errors

A linear code of length m and dimension r is an r-dimensional linear subspace of the vector space (F2)^m
• Encoding: using the generator matrix
• Decoding: using the parity check matrix

Linear Error Correcting Codes

Minimum distance:
• The weight of the lowest-weight nonzero codeword

In order to correct i errors, the minimum distance should be at least 2i+1

Linear Error Correcting Codes

Cosets:

Suppose that C is an [m, r] linear code over F2 and that a is any vector in (F2)^m
• Then the set a+C = {a+x | x ∈ C} is called a coset of C
• Every vector of (F2)^m is in some coset of C
• Every coset contains exactly 2^r vectors
• Two cosets are either disjoint or equal
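The coset properties can be checked by brute force on a small code. The sketch below uses the [3, 1] repetition code {000, 111} as an assumed example; the helper names `xor` and `cosets` are illustrative.

```python
from itertools import product

def xor(a, b):
    """Component-wise addition over F2."""
    return tuple(x ^ y for x, y in zip(a, b))

def cosets(code, m):
    """Return the distinct cosets a + C of a binary linear code C in (F2)^m."""
    seen = set()
    result = []
    for a in product((0, 1), repeat=m):
        coset = frozenset(xor(a, c) for c in code)
        if coset not in seen:  # equal cosets are reported once
            seen.add(coset)
            result.append(coset)
    return result

code = [(0, 0, 0), (1, 1, 1)]  # m = 3, r = 1
parts = cosets(code, 3)
for part in parts:
    print(sorted(part))
```

With m = 3 and r = 1 the space of 8 vectors splits into 2^(m-r) = 4 cosets of 2^r = 2 vectors each.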

Source Coding with Side Information

Example: assume Y is known to both encoder and decoder, and Ham(X,Y) ≤ 1

[Diagram: source X → compress → C(X) → decompress → X, with side information Y]

Source Coding with Side Information

Let X = 010, then Y ∈ {010, 011, 000, 110}

Goal: encode X⊕Y using fewer than 3 bits

How? Let e = X⊕Y, then e ∈ {000, 001, 010, 100}
• The encoder sends the index of the coset in which e occurs

Source Coding with Side Information

Let C = {000, 111} be a linear code with distance 3 that can correct one error

The space is partitioned into 4 cosets:
• Coset 1 = {000, 111}
• Coset 2 = {001, 110}
• Coset 3 = {010, 101}
• Coset 4 = {100, 011}

Recall: e ∈ {000, 001, 010, 100}, and each of these values is the leader of a different coset

Each index requires 2 bits

Decoding: output Y⊕e', where e' is the coset leader

Source Coding with Side Information

[Diagram: source X → compress → C(X) → decompress → X; Y is fed to the decompressor only]

Without Y the encoder cannot compute e!
• e = X⊕Y

Source Coding with Side Information

Still possible:
• Encode the coset in which X occurs

• Coset 1 = {000, 111}
• Coset 2 = {001, 110}
• Coset 3 = {010, 101}
• Coset 4 = {100, 011}

Each index requires 2 bits

Decoding: output the vector e' in the indicated coset whose Hamming distance to Y is smallest

Slepian-Wolf codes over finite block lengths have nonzero error, which implies that the decoder will sometimes fail
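The coset scheme above can be sketched in a few lines. This is a toy illustration for the 3-bit example only; the function names `compress` and `decompress` are assumptions, and the coset table is the one listed on the slide.

```python
COSETS = [
    ("000", "111"),
    ("001", "110"),
    ("010", "101"),
    ("100", "011"),
]

def hamming(a, b):
    """Number of positions in which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def compress(x):
    """Encoder (no access to Y): return the 2-bit index of the coset containing x."""
    for index, coset in enumerate(COSETS):
        if x in coset:
            return index
    raise ValueError("x must be a 3-bit string")

def decompress(index, y):
    """Decoder: pick the coset member closest to the side information y."""
    return min(COSETS[index], key=lambda v: hamming(v, y))

print(decompress(compress("010"), "011"))  # Ham(X, Y) = 1, so X is recovered
print(decompress(compress("000"), "011"))  # Ham(X, Y) = 2, so decoding fails
```

The two members of each coset are at distance 3 from each other, so whenever Ham(X,Y) ≤ 1 the true X is strictly closer to Y than the other member. The second call shows the failure mode: once the correlation assumption is violated, the decoder silently returns the wrong vector.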

Source Coding with Side Information

In practice:
1. Fix p and determine the compression rate of a Slepian-Wolf code that satisfies the target error
2. Pick a Slepian-Wolf code and determine the maximum p for which the target error is satisfied

Need to know the source statistics!

Compressing Stream Ciphers

This problem can be formulated as a Slepian-Wolf coding problem [JohnsonWagnerRamchandran04]

[Diagram: source X → encrypt with key k → Ek(X) → compress → C(Ek(X))]

The ciphertext is cast as the source

The shared key k is cast as the decoder-only side information

Compressing Stream Ciphers

• Compression is achievable due to correlation between the key K and the ciphertext C = X⊕K
• The joint distribution of the source and side information can be determined from the statistics of the source

[Diagram: source X → encrypt with key k → Ek(X) → compress → C(Ek(X))]

Compressing Stream Ciphers

[Diagram: C(Ek(X)) and key k → decoder (joint decryption and decompression) → X]

The decoder knows k and the source statistics

Compression rate H(Ek(X)|K) = H(X⊕K|K) = H(X) is asymptotically achievable
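The chain of equalities in the rate can be spelled out step by step. The derivation below assumes a one-time-pad style cipher with Ek(X) = X⊕K and a key K chosen independently of the source X; both assumptions are standard but are stated here explicitly.

```latex
\begin{align*}
H(E_k(X) \mid K) &= H(X \oplus K \mid K)
    && \text{since } E_k(X) = X \oplus K \\
  &= H(X \mid K)
    && \text{for each fixed } K,\ x \mapsto x \oplus K \text{ is a bijection} \\
  &= H(X)
    && X \text{ and } K \text{ are independent}
\end{align*}
```

The middle step is the same bijection argument used earlier for a fixed key: a bijection relabels outcomes without changing their probabilities, so the conditional entropy is unchanged.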

Efficiency

Encoding: finding the coset of Ek(X) can be done by multiplying Ek(X) with the parity check matrix
• I.e., Ek(X)∙H^T is the syndrome of Ek(X)

Decoding: exhaustive search through the coset of Ek(X)
• Improved by using LDPC codes, for which decoding is polynomial in the block length
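A toy end-to-end sketch of syndrome encoding and exhaustive-search decoding for a one-time pad. It assumes 3-bit blocks, the repetition code {000, 111} with the parity check matrix `H` shown in the code, and plaintexts of Hamming weight at most 1; all names are illustrative, and a practical scheme would use an LDPC code instead of exhaustive search.

```python
from itertools import product

H = [(1, 1, 0), (0, 1, 1)]  # a parity check matrix of the code {000, 111}

def syndrome(c):
    """Compute c * H^T over F2; the 2-bit result identifies the coset of c."""
    return tuple(sum(h * b for h, b in zip(row, c)) % 2 for row in H)

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

def decode(s, pad):
    """Joint decompression and decryption by exhaustive search of the coset."""
    candidates = [c for c in product((0, 1), repeat=3) if syndrome(c) == s]
    # Ciphertext and pad differ exactly by the plaintext (weight <= 1 by
    # assumption), so the right candidate is the one nearest the pad.
    ciphertext = min(candidates, key=lambda c: sum(xor(c, pad)))
    return xor(ciphertext, pad)

x, pad = (0, 1, 0), (1, 0, 1)
compressed = syndrome(xor(x, pad))  # encoder sees only the ciphertext
print(compressed, decode(compressed, pad))
```

The compressor never touches the pad: it only maps the 3-bit ciphertext to its 2-bit syndrome. All of the key-dependent work happens in the decoder.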

Security

Compression that operates on top of a one-time pad does not compromise the security of the encryption scheme
• The compressor does not know K

Compressing Block Ciphers

Block ciphers are widely used in practice

The correlation between the key and the ciphertext is more complex
• The previous approach is not directly applicable

Can data encrypted with block ciphers be compressed without access to the key?

Electronic Code Book (ECB) Mode

The simplest mode of operation, where each block is evaluated separately

[Diagram: ECB mode. Each plaintext block Xi (i = 1, …, n) is encrypted independently by the block cipher under key k to give Ek(Xi)]

Compression in this mode is theoretically possible; is it also practical?

The compression schemes that we present rely on the specifics of chaining operations

Cipher Block Chaining (CBC) Mode

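[Diagram: CBC mode. X1 is XORed with IV before entering the block cipher under key k; each later block Xi+1 is XORed with the previous ciphertext block Ek(Xi) before encryption. Output: IV, Ek(X1), Ek(X2), …, Ek(Xn)]

Correlation between the ciphertext block Ek(Xi) and the next block-cipher input X̃i+1 = Ek(Xi)⊕Xi+1 is easier to characterize and can be exploited for compression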

Compressing Block Ciphers

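[Diagram: IV, Ek(X1), …, Ek(Xn) → compressor → C(IV), C(Ek(X1)), …, C(Ek(Xn-1)), Ek(Xn)]

The last block is left uncompressed, while the IV is compressed

Recall that X̃i+1 = Ek(Xi)⊕Xi+1, where X̃i+1 is the input to the block cipher for block i+1
• Ek(Xi) is cast as the source
• X̃i+1 is cast as the side information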

Decoding

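[Diagram: decoding runs from the last block backwards. Ek(Xn) is decrypted with K to give X̃n; the Slepian-Wolf decoder takes C(Ek(Xn-1)) with X̃n as side information and outputs Ek(Xn-1), so that Xn = X̃n⊕Ek(Xn-1); Ek(Xn-1) is then decrypted to give X̃n-1, and the process repeats]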
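The backward decoding chain can be sketched on a toy instance. Everything below is an assumption made for illustration: 3-bit blocks, a "block cipher" that is just a key-selected permutation table (not secure), the repetition code {000, 111} as the Slepian-Wolf code, and plaintext blocks of Hamming weight at most 1.

```python
import random

M = 3  # toy block length in bits

def make_cipher(key):
    """Toy block cipher: a key-selected permutation of 3-bit blocks (NOT secure)."""
    table = list(range(2 ** M))
    random.Random(key).shuffle(table)
    inverse = [0] * len(table)
    for plain, cipher in enumerate(table):
        inverse[cipher] = plain
    return table, inverse

def syndrome(block):
    """2-bit coset index of a block with respect to the code {000, 111}."""
    b2, b1, b0 = (block >> 2) & 1, (block >> 1) & 1, block & 1
    return (b2 ^ b1, b1 ^ b0)

def cbc_encrypt(blocks, iv, enc):
    out, prev = [], iv
    for x in blocks:
        prev = enc[x ^ prev]  # ciphertext block = E_k(x XOR previous block)
        out.append(prev)
    return out

def compress(iv, ciphertext):
    """Keyless compressor: syndromes for IV and all but the last block."""
    return [syndrome(b) for b in [iv] + ciphertext[:-1]], ciphertext[-1]

def decompress(syndromes, last, dec):
    """Joint decoding with the key, from the last block backwards."""
    plaintext, current = [], last
    for s in reversed(syndromes):
        side = dec[current]  # block-cipher input, used as side information
        coset = [b for b in range(2 ** M) if syndrome(b) == s]
        previous = min(coset, key=lambda b: bin(b ^ side).count("1"))
        plaintext.append(side ^ previous)  # undo the chaining XOR
        current = previous
    return plaintext[::-1]

enc, dec = make_cipher(key=42)
blocks = [0, 1, 2, 4, 0, 4, 1]  # every block has Hamming weight <= 1
syndromes, last = compress(5, cbc_encrypt(blocks, 5, enc))
print(decompress(syndromes, last, dec) == blocks)
```

Each recovered ciphertext block becomes the input to the next decryption, which is why the last block must stay uncompressed: it is the only block the decoder can decrypt without first running a Slepian-Wolf decoding step.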

Compression Factor

Let {Cm,R, Dm,R} denote an order-m Slepian-Wolf code with compression rate R
• Compressor Cm,R: {0,1}^m → {0,1}^(mR)
• Decompressor Dm,R: {0,1}^(mR) × {0,1}^m → {0,1}^m

Compression factor: $\frac{(n+1) \cdot m}{n \cdot m \cdot R + m} \approx \frac{1}{R}$
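A short numeric check of the compression factor, read as the ratio of input bits (the IV plus n ciphertext blocks of m bits) to output bits (n blocks compressed at rate R plus one uncompressed block). The function name and the sample values of n are illustrative.

```python
def compression_factor(n: int, m: int, rate: float) -> float:
    """Input bits (IV + n blocks) over output bits (n compressed blocks + 1 raw block)."""
    input_bits = (n + 1) * m
    output_bits = n * m * rate + m
    return input_bits / output_bits

for n in (1, 10, 100, 10_000):
    print(n, round(compression_factor(n, 128, 0.5), 4))
```

The uncompressed last block costs a fixed m bits, so the factor approaches 1/R only as the number of blocks n grows; for short messages it is noticeably smaller.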

Compression Results

Irregular LDPC codes were used in our performance evaluation

Table: Attainable compression rates for m = 128 bits

Source Entropy   Compression Rate   Target Error   p
0.1739           0.50               10^-3          0.026
0.1301           0.50               10^-4          0.018
0.3584           0.75               10^-3          0.068
0.3032           0.75               10^-4          0.054

Compression Results

Irregular LDPC codes were used in our performance evaluation

Table: Attainable compression rates for m = 1024 bits

Source Entropy   Compression Rate   Target Error   p
0.3195           0.50               10^-3          0.058
0.2778           0.50               10^-4          0.048
0.5710           0.75               10^-3          0.134
0.5464           0.75               10^-4          0.126
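The "Source Entropy" column appears to be the binary entropy H(p) of the listed p. The sketch below recomputes it under that assumption, with base-2 logarithms; the (entropy, p) pairs are copied from the two tables, and since p is printed to three decimals only, small deviations are expected.

```python
import math

def binary_entropy(p):
    """H(p) in bits, for 0 < p < 1."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# (listed source entropy, p) pairs copied from the two tables
rows = [
    (0.1739, 0.026), (0.1301, 0.018), (0.3584, 0.068), (0.3032, 0.054),  # m = 128
    (0.3195, 0.058), (0.2778, 0.048), (0.5710, 0.134), (0.5464, 0.126),  # m = 1024
]
for listed, p in rows:
    print(f"p = {p:.3f}  H(p) = {binary_entropy(p):.4f}  listed = {listed:.4f}")
```

In every row the source entropy is well below the compression rate, and the gap narrows for m = 1024: longer blocks let the code operate closer to the entropy limit.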

Recall -- ECB Mode

[Diagram: ECB mode. Each block mi (i = 1, …, n) is encrypted independently by the block cipher under key K to give Ek(mi)]

Notable Observations

Exhaustive strategies are infeasible in most cases
• Except for very low-entropy plaintext distributions or compression ratios
• By truncating the ciphertext

For example, consider a plaintext distribution consisting of 1,000 uniformly distributed 128-bit values
• One can compress the output of a 128-bit block cipher by truncating the 128-bit ciphertext to 40 bits
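A rough estimate of why 40 bits suffice in this example. It assumes the decoder knows the key and the set of 1,000 candidate plaintexts, so it can encrypt each candidate and match the truncated ciphertext; decoding is ambiguous only if two candidates share the same 40-bit prefix. Modeling the ciphertexts as independent uniform strings (an idealization, not a statement from the slides), a union bound gives:

```python
from math import comb

def collision_bound(num_values: int, kept_bits: int) -> float:
    """Union (birthday) bound on any two truncated ciphertexts colliding."""
    return comb(num_values, 2) / 2 ** kept_bits

bound = collision_bound(1000, 40)
print(f"collision probability is at most about {bound:.2e}")
```

The decoder's work is an exhaustive pass over all candidate plaintexts, which is only feasible because the distribution is so small; this is the sense in which the strategy is exhaustive.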

Can we construct a better strategy?

Impossibility Result

There does not exist a generic (C,D) for block ciphers unless (C,D) is
• Either exhaustive or
• Computationally infeasible

There does not exist an efficient (C,D) for ECB mode!

The Public-Key Setting

Hybrid encryption
• Use a public-key scheme to encrypt a symmetric key, and then encrypt the data with this key

El Gamal encryption
• Similar technique when using XOR

Concluding Remarks

Data encrypted with block ciphers is practically compressible when chaining modes are employed

Notable compression factors were demonstrated with binary memoryless sources

Short block sizes limit the performance, but that could change in the future

Generic compression is impossible

Future Work

An interesting question is whether compression is possible without any preliminary knowledge of the data
• Can compression be achieved using algorithms that do not rely on the source statistics, i.e., universal algorithms?

The error:
• Can we consider a less limited setting where the error is not independent?

Thank You!
