proofs of retrievability via hardness amplification yevgeniy dodis, salil vadhan and daniel wichs

PROOFS OF RETRIEVABILITY VIA HARDNESS AMPLIFICATION

Yevgeniy Dodis, Salil Vadhan and Daniel Wichs

Remote Data Storage

Average Computer User: Bob

Remote Storage Server:

Lots of data (music, photos, e-mails, forms…)

Lots of devices (desktop, laptop, music player, phone, camera…)

Accessibility: Wants the ability to access all data at all times from all devices.

Reliability: Should never lose data.

Provides greater accessibility and reliability (for a cheap price).

Does all of my data still exist?

Is my data private?

Is it authentic?

Bob:


Remote Data Storage

Encrypt and MAC data before storing it remotely.

Proofs of Retrievability (PoR)

Introduced by [Juels, Kaliski 07]. An audit protocol between Bob and the server in which Bob checks that his data is still retrievable.

Formalized using the extraction paradigm (as in proofs of knowledge).

Naïve Protocol: To run an audit, Bob downloads all his data and verifies the signature. Too costly! Bob does not actually need the data at the time of an audit.

Goal: An audit protocol that has:

Low communication complexity.

Locality (the server accesses only a few locations of the data).

Direct-Product Scheme (One Audit)

[Figure: Enrollment. Bob's file F is encoded with an error-correcting code to produce the server file S. Bob stores t random blocks S[r1],…,S[rt].]


[Figure: Audit. Bob, who stored t random blocks S[r1],…,S[rt], sends the challenge e = r1,…,rt to the server. The server returns S[r1],…,S[rt], and Bob verifies that the received blocks are correct.]
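The enrollment and audit steps can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the server file S is taken to be already error-correction encoded (the ECC step is omitted), blocks are byte strings, and the function names `enroll`, `respond` and `verify` are hypothetical, not from the talk.

```python
import secrets

def enroll(server_file, t):
    """Bob picks t random positions and keeps the blocks stored there."""
    n = len(server_file)
    positions = [secrets.randbelow(n) for _ in range(t)]
    blocks = [server_file[r] for r in positions]
    return positions, blocks

def respond(server_file, challenge):
    """An honest server returns the challenged blocks S[r1],...,S[rt]."""
    return [server_file[r] for r in challenge]

def verify(stored_blocks, response):
    """Bob accepts only if every received block matches his local copy."""
    return response == stored_blocks

# Toy server file of 1000 four-byte blocks (assumed to be the ECC output).
S = [bytes([i % 251]) * 4 for i in range(1000)]
challenge, stored = enroll(S, t=8)
print(verify(stored, respond(S, challenge)))  # honest server passes
```

Because Bob reveals his stored positions in the challenge, this sketch supports a single audit, matching the "One Audit" label on the slide.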


Intuition for security:

If the server knows enough blocks of the server file S, then it can decode F.

If the server knows too few blocks of S, then it cannot pass an audit.

Unfortunately, this intuition does not translate into a proof, since the server does not give us blocks of S.

Question 1: Is this scheme secure in general?

Question 2: Is the tradeoff between server storage overhead, communication, and locality optimal?

[Figure: Server file S, with each block labeled "Know" or "Don't know".]


Arbitrary Adversarial Server:


Question 1: Is this scheme secure in general? How do we extract the file?

Question 2: Is the tradeoff between server storage, communication, and locality optimal?

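[Figure: The adversarial server is an arbitrary function C*. On challenge e = (r1,…,rt) it returns C*(e). It answers an ε fraction of challenges correctly, i.e. with C*(e) = (S[r1],…,S[rt]).]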

Prior Work

The “direct-product” scheme was introduced by [Naor, Rothblum 05] in the context of sublinear authenticators. PoR schemes were studied by [Juels, Kaliski 07], [Ateniese et al. 07], [Shacham, Waters 08].

Question 1: Is the direct-product scheme secure? Yes if…

[JK07]: Make simplifying assumptions on the behavior of the adversary.

[JK07, SW08]: Add MACs to authenticate the responses.

Good: gives us a “many-time” scheme + proof of security.

Bad: increased server storage overhead (and computation/communication).

Question 2: Is the tradeoff between server storage overhead, communication, and locality optimal?

An optimization to the direct-product scheme appears as part of an optimized MAC/Sig based scheme of [SW08]. Nearly optimal parameters required Random Oracles.

Direct-Product Protocol (One Audit)

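[Figure: MAC-based audit. Bob's local state: t random blocks S[r1],…,S[rt]; key k for a MAC. The server stores the file S together with tags σ[r] = MAC_k(S[r]). On challenge e = r1,…,rt the server returns C(e) = S[r1],…,S[rt] along with σ[r1],…,σ[rt], and Bob verifies that the received blocks are correct.]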
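A minimal sketch of the tagged variant, using HMAC-SHA256 from the Python standard library as the MAC. The slide writes σ[r] = MAC_k(S[r]); binding the position r into the tag is an assumption added in this sketch so that a valid block from a different position is rejected. All function names are hypothetical.

```python
import hashlib
import hmac

def _tag(key, r, block):
    # Assumption of this sketch: the index r is MACed along with the block.
    return hmac.new(key, r.to_bytes(8, "big") + block, hashlib.sha256).digest()

def tag_file(key, server_file):
    """Bob computes one tag per block before uploading S and the tags."""
    return [_tag(key, r, block) for r, block in enumerate(server_file)]

def respond(server_file, tags, challenge):
    """The server returns each challenged block with its stored tag."""
    return [(server_file[r], tags[r]) for r in challenge]

def verify(key, challenge, response):
    """Bob recomputes every tag using only the key k."""
    if len(response) != len(challenge):
        return False
    checks = [hmac.compare_digest(_tag(key, r, block), tag)
              for r, (block, tag) in zip(challenge, response)]
    return all(checks)
```

Bob now keeps only the key, and fresh challenges can be drawn for every audit, at the cost of the extra tags stored on the server.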

Prior Work


Question 2: Is the tradeoff between server storage overhead, communication, and locality optimal? No. For example, optimizations to communication complexity appear in [SW08], but they rely on Random Oracles to get nearly optimal parameters. Can we remove the Random Oracles? Further improvements?

Our Results

Introduce a new primitive called PoR codes.

Abstract the key component of PoR into a clean coding-theoretic problem.

Three ways to turn PoR codes into PoR schemes with various tradeoffs.

1. Security of PoR ⇔ efficient (list) decoding algorithms for such codes.

2. Efficiency of PoR ⇔ optimizing various parameters of PoR codes.

Construct nearly optimal PoR codes (and therefore PoR schemes). Along the way, answer questions 1, 2.

Answer 1: The direct-product scheme is secure.

First storage-efficient PoR scheme (optimization of [JK07]) with full proof of security.

First information-theoretically secure PoR.

Answer 2: Further optimize all previous schemes.

In particular, remove Random Oracles from [SW08].

Key Step: Connect (list) decoding of PoR codes to the seemingly unrelated area of hardness amplification.

Our abstraction: PoR Codes

Bob's file F → Server file S ∈ Π^n → PoR Codeword C ∈ Σ^N

Coordinate C[e] corresponds to the server's response on challenge e.

In particular, C can be exponential in size, as it is never stored explicitly.

Locality: C[e] can be computed from only a few positions in S.

Ignores how Bob decides whether responses are correct/incorrect.

[Figure: Bob sends challenge e to the storage server, which holds S and returns C[e].]

Direct Product PoR

[Figure: F is mapped to S by an ECC; the PoR codeword C has one coordinate C[e] for each of the t-tuples e.]

Decoding PoR Codes (Attempt)


Given oracle access to C* that is ε-close to C, decode F. But we cannot uniquely decode when ε ≤ ½.

[Figure: The decoder queries the incorrect codeword C* at position e and receives C*(e).]

Decoding PoR Codes: Two variants


Error List Decoding: Given oracle access to C* that is ε-close to C, produce a (short) list containing F. Corresponds to the “basic” scheme.

Erasure Decoding: Given oracle access to C* that is ε-close to C and satisfies C*[e] ∈ {C[e], ⊥}, recover F. Corresponds to the MAC-based scheme.

Efficiency: Run-time poly(|F|, 1/ε).
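To make the erasure setting concrete, here is a toy decoder in Python. It is a simplified sketch, not the decoder from the talk: the ECC is assumed to be a small systematic Reed-Solomon code over GF(257), the oracle returns `None` in place of ⊥, and the decoder simply samples challenges until it has learned k positions of S. The names `ecc_encode`, `lazy_server` and `erasure_decode` are hypothetical.

```python
import random

P = 257  # toy prime field; blocks are integers in [0, P), and n <= P

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through the given (xi, yi) points."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def ecc_encode(F, n):
    """Systematic Reed-Solomon: S[x] = f(x), where f(i) = F[i] for i < len(F)."""
    pts = list(enumerate(F))
    return [lagrange_eval(pts, x) for x in range(n)]

def lazy_server(S):
    """Erasure adversary: answers correctly on about 1/4 of the challenges."""
    def oracle(e):
        return [S[r] for r in e] if sum(e) % 4 == 0 else None
    return oracle

def erasure_decode(oracle, n, k, t, max_queries, rng):
    known = {}
    for _ in range(max_queries):
        e = [rng.randrange(n) for _ in range(t)]
        answer = oracle(e)
        if answer is not None:          # a non-erased answer is always correct
            known.update(zip(e, answer))
        if len(known) >= k:             # any k positions determine F
            pts = list(known.items())[:k]
            return [lagrange_eval(pts, x) for x in range(k)]
    return None

rng = random.Random(1)
F = [rng.randrange(P) for _ in range(10)]
S = ecc_encode(F, n=40)
print(erasure_decode(lazy_server(S), n=40, k=10, t=3, max_queries=1000, rng=rng) == F)
```

The sketch relies on every non-erased answer being correct, which is exactly what the erasure model guarantees; with errors instead of erasures the collected blocks could be wrong, which is why the basic scheme needs list decoding.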


PoR Schemes from PoR codes

Scheme 1: Bob stores (challenge, response) pairs locally.

Good: Information-theoretic security. Optimal server storage.

Bad: Bounded use. Large client storage.

Scheme 2: Offload storage to the server (encrypt/MAC).

Good: Optimal client storage. Small additive overhead to server storage.

Bad: Bounded use.

Scheme 3: Authenticate each block of the server file.

Good: Unbounded use. Optimal client storage.

Bad: Server storage roughly doubles.

Basic ideas of Schemes 1, 2, 3 come from [NR05], [JK07], [SW08]. Efficiency of all schemes is improved with optimized PoR codes. Security of Schemes 1 and 2 requires error list-decoding, which has not been known before (optimized or not).

List decoding “direct-product” codes

[Figure: Bob's file F is mapped to the server file S by an ECC; the codeword C consists of all t-tuples of S.]

Given oracle access to C* which is ε-close to C, output a small list containing F.

Hardness Amplification (direct-product theorems)

If S(r) is δ-hard then the direct-product function

C(e) = (S(r1),…,S(rt)), where e = (r1,…,rt),

is ε-hard, where ε ≪ δ.
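A quick calculation shows why ε can be so much smaller than δ in the simplest case. Assume, for illustration only, a server that stores exactly a δ fraction of the blocks of S and nothing else, that the positions r1,…,rt are independent and uniform, and that the server cannot guess a missing block. It then answers a challenge correctly only when all t positions fall among the blocks it kept:

```latex
\Pr_{e=(r_1,\dots,r_t)}\bigl[\text{server answers } e \text{ correctly}\bigr] = \delta^{t},
\qquad \text{e.g. } \delta = 0.9,\ t = 100:\quad 0.9^{100} \approx 2.7 \times 10^{-5}.
```

A direct-product theorem is needed precisely because a real adversary is an arbitrary function and need not behave like this simple server.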



Hardness Amplification (direct-product theorems)

∃ adversary computing C(e) = (S(r1),…,S(rt)) on an ε-fraction of tuples

⇒ ∃ adversary that computes S(r) on a δ-fraction of inputs.


Hardness Amplification (uniform direct product theorems)

[Trev05], [IJK06], [IJKW08]

Given oracle access to an adversary that computes C(e) = (S(r1),…,S(rt)) on an ε-fraction of tuples, construct a short list of adversaries, one of which computes S(r) on a δ-fraction of inputs.


Step 1: C* ⇒ short list containing S* which is δ-close to S.

Step 2: Short list containing S* ⇒ short list containing F.


Parameters of Direct-Product Codes

Tradeoff between locality and server storage is optimal.

Easy to show that challenge/response size must be O(λ).

Does the challenge/response size need to depend on t?

Parameters (security parameter λ):

Server storage = γ|F|, for any γ ≥ 1.

Locality t = O(λ/(γ − 1)).

Challenge size = t log(n).

Response size = t log(|Π|).
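The parameter formulas can be turned into a small calculator to get a feel for the numbers. This is a rough sketch: the constant hidden in the O(·) is unknown and is exposed here as a hypothetical parameter `c` (default 1), logarithms are base 2, and the block size in bits stands in for log(|Π|). The function name and the example figures are illustrative assumptions, not values from the talk.

```python
import math

def direct_product_parameters(file_bits, block_bits, gamma, lam, c=1.0):
    """Sizes for the direct-product code, all measured in bits.

    gamma: storage blow-up, server storage = gamma * |F| (must exceed 1 here)
    lam:   security parameter
    c:     assumed constant hidden in t = O(lam / (gamma - 1))
    """
    if gamma <= 1:
        raise ValueError("the locality bound needs gamma > 1")
    n = math.ceil(gamma * file_bits / block_bits)   # number of blocks in S
    t = math.ceil(c * lam / (gamma - 1))            # locality
    return {
        "blocks_n": n,
        "locality_t": t,
        "challenge_bits": t * math.ceil(math.log2(n)),
        "response_bits": t * block_bits,
    }

# Example: 1 GiB file, 1024-bit blocks, 25% storage overhead, lam = 80.
print(direct_product_parameters(2**33, 1024, gamma=1.25, lam=80))
```

With these assumed inputs the locality is 320 blocks, so both challenge and response grow linearly in t, which is what the two optimizations in the talk aim to avoid.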

[Figure: Bob's file F is mapped by an ECC to the server file S ∈ Π^n. The PoR codeword C ∈ (Π^t)^N ranges over all t-tuples: on challenge e = (r1,…,rt) the response is U = S[r1],…,S[rt].]

Two optimizations

Shorter Responses: Instead of sending the response U = (S[r1],…,S[rt]), ask the server to send a random position in an error-correcting encoding of U.

[SW08]: Implicitly uses Hadamard, which increases the challenge size. Can be replaced by Reed-Solomon.

Making this optimization work with the MAC-based scheme was a major contribution of [SW08].

Shorter Challenges: Use a randomness-efficient “hitter” to sample indices (r1,…,rt) with a shorter challenge.

Works for erasure decoding. Removes Random Oracles from [SW08].

Open for efficient error decoding (works for inefficient decoding).
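The shorter-response idea can be sketched with a Reed-Solomon encoding of U. The assumptions here are illustrative: blocks are integers below the prime P = 2^61 − 1, U is read as the coefficient vector of a polynomial, and position p of the encoding is that polynomial evaluated at p. The names `rs_position` and `respond` are hypothetical, and the randomness-efficient hitter for shorter challenges is not modeled.

```python
P = 2**61 - 1  # a Mersenne prime; blocks are assumed to be integers in [0, P)

def rs_position(U, p):
    """Position p of the Reed-Solomon encoding of U (Horner evaluation mod P)."""
    acc = 0
    for coeff in reversed(U):
        acc = (acc * p + coeff) % P
    return acc

def respond(server_file, challenge):
    """The server reads t blocks but sends back a single field element."""
    positions, p = challenge
    U = [server_file[r] for r in positions]
    return rs_position(U, p)

S = [(7 * i + 3) % P for i in range(1000)]
print(respond(S, ([5, 17, 400], 123456789)))
```

Two different tuples U of length t are polynomials of degree below t, so they agree on at most t − 1 positions; a server answering from wrong blocks is therefore caught at almost every position p, while the response shrinks from t blocks to one.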

[Figure: Bob sends the challenge e = (r1,…,rt), p. The storage server, holding S, computes U = S[r1],…,S[rt] and returns ECC(U)[p].]

Conclusions

Introduce PoR codes. Give nearly optimal constructions.

Prove security of storage-efficient PoR schemes.

First information-theoretic scheme.

Remove the use of Random Oracles from [SW08].

Open questions:

Can we show efficient list-decoding for optimized PoR codes with a hitter?

Do unbounded-use schemes require poor server storage overhead?
