
Private Information Retrieval

Contents

• What is Private Information Retrieval (PIR)?

• Reduction from Private Information Retrieval (PIR) to Smooth Codes

• Constructions (Achieving the Barrier $O(n^{1/(2k-1)})$)

• Construction (Breaking the Barrier $O(n^{1/(2k-1)})$)

Private Information Retrieval (PIR)

• Query a public database, without revealing the queried record.

• Example: A broker needs to query NASDAQ database about a stock, but doesn’t want anyone to know he is interested.

PIR

• The single-server case

• Chor et al. showed in their 1995 paper that, with a single server, it is necessary to send the entire contents of the database.

PIR

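• A $k$-server PIR scheme of one round, for database length $n$, consists of:

– Query functions $Q_1,\dots,Q_k : [n] \times \{0,1\}^{l_{rnd}} \to \{0,1\}^{l_q}$

– Answer functions $A_1,\dots,A_k : \{0,1\}^n \times \{0,1\}^{l_q} \to \{0,1\}^{l_a}$

– Reconstruction function $R : [n] \times \{0,1\}^{l_{rnd}} \times (\{0,1\}^{l_a})^k \to \{0,1\}$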

PIR – definition

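• These functions should satisfy:

– Correctness: for every $x \in \{0,1\}^n$, $i \in [n]$ and $r \in \{0,1\}^{l_{rnd}}$: $R(i, r, A_1(x, Q_1(i,r)), \dots, A_k(x, Q_k(i,r))) = x_i$

– Privacy: for every $i, j \in [n]$, $s \in [k]$ and $q \in \{0,1\}^{l_q}$: $\Pr_r[Q_s(i,r) = q] = \Pr_r[Q_s(j,r) = q]$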

Simple Construction of PIR

• 2 servers, one round

• Each server holds bits $x_1,\dots,x_n$.

• To request bit $i$, choose uniformly at random a subset $A$ of $[n]$.

• Send $A$ to the first server.

• Send the second server $A \oplus \{i\}$ (add $i$ to $A$ if it is not there, remove it if it is there).

• Each server returns the XOR of the bits at the requested indices.

• XOR the two answers to obtain $x_i$.
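The steps of the two-server scheme can be sketched in a few lines of Python. This is a minimal illustration, not a production protocol: the function names (`make_queries`, `answer`, `retrieve`) are hypothetical, indices are 0-based, and both "servers" are simulated in one process.

```python
import secrets

def make_queries(n, i):
    """User side: a uniformly random subset A of range(n), and A with i toggled."""
    a = {j for j in range(n) if secrets.randbelow(2)}
    b = a ^ {i}  # symmetric difference: adds i if absent, removes it if present
    return a, b

def answer(x, subset):
    """Server side: XOR of the database bits at the requested indices."""
    acc = 0
    for j in subset:
        acc ^= x[j]
    return acc

def retrieve(x, i):
    """Run both (simulated) servers and combine their one-bit answers."""
    a, b = make_queries(len(x), i)
    # Every index except i appears in both sets or in neither, so it cancels.
    return answer(x, a) ^ answer(x, b)
```

Each server on its own sees a uniformly random subset, which is why neither learns $i$; the cost is that the queries are $n$ bits long.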

Smoothly Decodable Code

$C : \{0,1\}^n \to \Sigma^m$ is a $(q, c, \epsilon)$ smoothly decodable code if there exists a probabilistic algorithm $A$ such that:

• $\forall x \in \{0,1\}^n$ and $\forall i \in \{1,\dots,n\}$: $\Pr[A(C(x), i) = x_i] > \tfrac{1}{2} + \epsilon$. The probability is over the coin tosses of $A$.

• $A$ reads at most $q$ indices of $y = C(x)$ (of its choice). Queries are not allowed to be adaptive.

• $\forall i \in \{1,\dots,n\}$ and $\forall j \in \{1,\dots,m\}$: $\Pr[A(\cdot, i) \text{ reads } j] \le c/m$.

$A$ has access to a non-corrupted codeword.

LDC is Smooth

• Claim: Every (q,δ,ε) LDC is a (q,q/ δ, ε) smooth code.

• Intuition – If the code is resilient against linear number of errors, then no bit of the output can be queried too often (or else adversary will choose it)

Smooth Code is LDC

• A bit can be reconstructed using $q$ uniformly distributed queries, with advantage $\epsilon$, when there are no errors.

• With probability at least $1 - q\delta$, all the queries are to non-corrupted indices.

Remember: the adversary does not know the decoding procedure’s random coins.
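The probability bound above is a union bound. A short derivation, assuming (as the slide does) that each of the $q$ queries is individually uniform over the $m$ codeword positions and that the adversary corrupts at most $\delta m$ of them:

```latex
\Pr[\text{some query hits a corrupted index}]
  \;\le\; \sum_{t=1}^{q} \Pr[\text{query } t \text{ hits a corrupted index}]
  \;\le\; q \cdot \frac{\delta m}{m}
  \;=\; q\delta
```

So all $q$ queries are clean with probability at least $1 - q\delta$, and the decoder's success probability drops by at most $q\delta$ compared with the error-free case.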

Reduction from PIR to SDC [Gol,Ka,Sch,Tr 02]

• A codeword is a Concatenation of all possible answers from the servers

• A query procedure is made of k queries to the codeword corresponding to the answers of k servers on the requested bit (for queries generated as in the PIR)

• From the PIR properties it follows that the distribution of queries to the indices of the codeword are independent of the requested bit

Reduction from PIR to SDC

• Let $a$ be the length of an answer from a server, $k$ the number of servers and $q$ the length of a query.

• Let $l = a k 2^q$ be the length of a codeword.

• Let $P_j$ be the probability of querying bit $j$. Note that $\sum_j P_j = k$.

• Set $N_j = \lceil l P_j \rceil$ and duplicate bit $j$ $N_j$ times. When querying for bit $j$, choose at random one of the $N_j$ copies.

Reduction from PIR to SDC

• The probability of accessing each bit is now at most $1/l$

• The new length of the encoding is less than $(k+1)l$

• We have a $(ka, k+1, 1/2)$ smoothly decodable code
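The duplication step can be checked numerically. The sketch below assumes the number of copies is $N_j = \lceil l P_j \rceil$, which is the choice that makes the two bounds on this slide work out; the function name `duplication_counts` and the sample probabilities are hypothetical. Exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from math import ceil

def duplication_counts(probs):
    """Number of copies N_j = ceil(l * P_j) for each codeword position j,
    where l = len(probs) is the original codeword length."""
    l = len(probs)
    return [ceil(l * p) for p in probs]

# Hypothetical query distribution over l = 4 positions with k = 2 queries,
# so the probabilities sum to k.
probs = [Fraction(3, 5), Fraction(2, 5), Fraction(1, 2), Fraction(1, 2)]
counts = duplication_counts(probs)

l, k = len(probs), sum(probs)
new_length = sum(counts)
# After picking one of the N_j copies at random, each copy is read
# with probability P_j / N_j.
per_copy = [p / n for p, n in zip(probs, counts) if n > 0]
```

Here `new_length` stays below $(k+1)l$ because each ceiling adds less than 1, and every entry of `per_copy` is at most $1/l$ because $N_j \ge l P_j$.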

Achieving the Barrier $O(n^{1/(2k-1)})$

• Ingredients:

• $x$ – the database string

• $E : [n] \to \{0,1\}^m$

• $P_x(Z_1,\dots,Z_m)$ – a polynomial in $m = O(n^{1/d})$ variables of degree $d$ such that $P_x(E(i)) = x_i$

• $Y_1,\dots,Y_k \in \{0,1\}^m$ such that $\sum_{j=1}^{k} Y_j = E(i)$

• The user generates the $Y_j$ and sends to server $j$ all $Y_q$ with $q \ne j$.

• We can view $P_x$ as a polynomial in the $km$ variables $Y_{j,l}$, where $Z_l = \sum_{j=1}^{k} Y_{j,l}$.

• Each server knows the value of $(k-1)m$ variables.

• Let $d = k-1$; hence each monomial of $P_x$ has at most $k-1$ different variables.

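• Each variable $Y_{j,l}$ is known to the $k-1$ servers other than server $j$. A monomial with at most $k-1$ variables is therefore missed by at most $k-1$ servers, so there exists a server who knows the values of all the variables in the monomial.

• Assign each monomial to one of the servers who know all its variables.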

• Each server calculates the XOR of the monomials assigned to it and sends it to the user.

• The user calculates the XOR of all the answers:

$$\bigoplus_{j \in [k]} \; \bigoplus_{M \text{ assigned to } j} M(Y) = P_x\Big(\sum_{j \in [k]} Y_j\Big) = P_x(Z) = P_x(E(i)) = x_i$$

• Security – each server receives $k-1$ vectors, which are independent random strings of length $m$.

• Communication complexity – each server receives $k-1$ vectors, each of length $m = O(n^{1/d}) = O(n^{1/(k-1)})$ by choice of $m$ and $d$.
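The $d = k-1$ scheme can be simulated end to end for small parameters. The sketch below makes several assumptions that are not fixed by the slides: $E(i)$ is taken to be the $i$-th weight-$d$ subset of $[m]$ in lexicographic order, $P_x(Z) = \sum_i x_i \prod_{l \in E(i)} Z_l$ over GF(2), and each monomial is assigned to the lowest-numbered server that knows all of its variables. All function names are hypothetical and indices are 0-based.

```python
import itertools
import math
import secrets

def encode_indices(n, m, d):
    """Assumed encoding: index i -> the i-th size-d subset of range(m)."""
    return list(itertools.islice(itertools.combinations(range(m), d), n))

def make_shares(support, m, k):
    """Split the 0/1 vector E(i) into k random shares whose XOR is E(i)."""
    last = [1 if l in support else 0 for l in range(m)]
    shares = [[secrets.randbelow(2) for _ in range(m)] for _ in range(k - 1)]
    for s in shares:
        last = [a ^ b for a, b in zip(last, s)]
    return shares + [last]

def server_answer(j, x, supports, known, k):
    """Server j XORs the monomials assigned to it; `known` holds Y_q for q != j."""
    acc = 0
    for bit, support in zip(x, supports):
        if not bit:
            continue
        # Expanding prod_l (sum_q Y[q][l]) gives one monomial per owner tuple.
        for owners in itertools.product(range(k), repeat=len(support)):
            # d = k-1 factors cannot use all k shares, so some server knows them all.
            assigned = min(set(range(k)) - set(owners))
            if assigned != j:
                continue
            term = 1
            for q, l in zip(owners, support):
                term &= known[q][l]
            acc ^= term
    return acc

def retrieve(x, i, k):
    d, n = k - 1, len(x)
    m = d
    while math.comb(m, d) < n:  # smallest m with enough weight-d strings
        m += 1
    supports = encode_indices(n, m, d)
    shares = make_shares(supports[i], m, k)
    result = 0
    for j in range(k):
        known = {q: shares[q] for q in range(k) if q != j}  # server j never sees Y_j
        result ^= server_answer(j, x, supports, known, k)
    return result
```

For example, `retrieve(x, i, 3)` with a 10-bit database uses $m = 5$ and sends each server two 5-bit shares. The servers here do exponential work in $d$; the sketch only illustrates correctness and what each server sees.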



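• Now take $d = 2k-1$.

• Each monomial has a server who misses at most 1 variable (otherwise all $k$ servers would miss at least 2 variables each, more than $d$ in total); assign the monomial to that server.

• Each server sends the 1-bit coefficients of the polynomial which is the sum of all monomials assigned to it; this polynomial has degree at most 1 in the variables the server is missing.

• The user evaluates the polynomial on the variables $Y$.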
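The pigeonhole step behind the choice $d = 2k-1$ can be verified by brute force for small $k$. In this sketch a monomial is represented only by the tuple of servers whose shares its variables come from (server $j$ "misses" exactly the variables taken from $Y_j$); the function names are hypothetical.

```python
import itertools
from collections import Counter

def least_missing_server(owners, k):
    """Server that misses the fewest variables of the monomial (ties: lowest index)."""
    counts = Counter(owners)
    return min(range(k), key=lambda j: counts[j])

def every_monomial_has_good_server(k, d):
    """True if every degree-d monomial has a server missing at most one variable."""
    for owners in itertools.product(range(k), repeat=d):
        j = least_missing_server(owners, k)
        if Counter(owners)[j] > 1:
            return False
    return True
```

With $k = 3$ the check succeeds for $d = 5 = 2k-1$ and fails for $d = 6$, where a monomial can take exactly two variables from each server's share.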



• The query complexity is the same: $O(n^{1/d})$

• The answer complexity is $k^2 m = O(n^{1/d})$ (for constant $k$)

• Total complexity: $O(n^{1/d}) = O(n^{1/(2k-1)})$ by choice of $d$

• The first idea that comes to mind is to try and increase the degree d even further.

• Unfortunately this does not work due to the increasing size of the polynomials the servers return.

• The novelty of the paper is how to go around this difficulty.

Breaking the Barrier $O(n^{1/(2k-1)})$

• Assume that each polynomial is known not to one server but to a group of servers.

• Now we do not need to receive the polynomials themselves but can use the PIR scheme (on those servers) to evaluate them on the required input.



• Suppose that we could write $P_x$ as a sum of polynomials $P_V$, where $V$ ranges over all subsets of the servers. The problem of evaluating $P_x$ reduces to evaluating each $P_V$, which (we hope) is of lower degree.

• On the other hand, the number of servers available for each $P_V$ is also smaller, which is a disadvantage.

• The paper shows how to find such $P_V$ with good properties.

• Define $k'$ to be a lower bound on the size of the sets $V$, and $\lambda$ the maximum number of variables a server in $V$ misses in $P_V$.

• Altogether, $V$ misses at most $\lambda|V|$ variables in $P_V$.

• We will choose an encoding $E$ such that the Hamming weight of $E(i)$ (and therefore the number of monomials) will be bounded by $d$ (the number of monomials is bounded by $2^d$).

• If we had $P_V$ as specified, then we could apply the PIR scheme recursively on all sets of size at least $k'$, with communication complexity:

$$C_P(n, k) = O\big(n^{1/d}\big) + \sum_{l=k'}^{k} \binom{k}{l}\, C_{P'}\big(n^{\lambda l/d},\, l\big)$$

• Let $E$ be an encoding to all strings of length $m$ and weight $d$.

• We can encode $\binom{m}{d}$ different values, thus $m = O(n^{1/d})$ is sufficient to encode $n$ values.

• Define $P_x(Z_1,\dots,Z_m) = \sum_{i=1}^{n} x_i \prod_{l : E(i)_l = 1} Z_l$; it holds that $P_x(E(i)) = x_i$.

• Define $V(M)$ to be all servers who miss at most $\lambda$ variables in $M$.
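A small sketch of the encoding and of the identity $P_x(E(i)) = x_i$. It assumes that $E(i)$ is the $i$-th weight-$d$ string in lexicographic order (any injective choice would do) and that arithmetic is over GF(2); the names `smallest_m`, `encoding` and `eval_px` are hypothetical.

```python
import itertools
import math

def smallest_m(n, d):
    """Smallest m with C(m, d) >= n, i.e. enough weight-d strings for n indices."""
    m = d
    while math.comb(m, d) < n:
        m += 1
    return m

def encoding(n, d):
    """Return m and, for each index i, the set of positions where E(i) is 1."""
    m = smallest_m(n, d)
    return m, list(itertools.islice(itertools.combinations(range(m), d), n))

def eval_px(x, supports, z):
    """P_x(z) = sum over i of x_i * prod_{l in E(i)} z_l, over GF(2)."""
    acc = 0
    for bit, support in zip(x, supports):
        if bit and all(z[l] for l in support):
            acc ^= 1
    return acc
```

Evaluating at $z = E(i)$ kills every monomial except the one for index $i$: two distinct weight-$d$ strings cannot have one support contained in the other, so only the term $x_i$ survives.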

• Lemma: for $\lambda, k' \le k$ and $d \le (\lambda+1)k - (\lambda-1)k' + (\lambda-2)$, and $M$ a monomial of degree $d$ in the $Y_{j,h}$, either there is a server who misses at most one variable or $|V(M)| \ge k'$.

• Proof: Counting argument.
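One way the counting argument can go, written out as a sketch. It assumes $\lambda \ge 1$ denotes the maximum number of variables a server in $V(M)$ may miss, that every variable of $M$ is missed by exactly one server, and it writes $\mathrm{miss}_j(M)$ (a name introduced here) for the number of variables of $M$ that server $j$ misses. Suppose, for contradiction, that every server misses at least 2 variables and $|V(M)| \le k'-1$:

```latex
\begin{align*}
d = \sum_{j \in [k]} \mathrm{miss}_j(M)
  &\ge 2\,|V(M)| + (\lambda+1)\,\bigl(k - |V(M)|\bigr)
     && \text{servers outside } V(M) \text{ miss} \ge \lambda+1 \\
  &= (\lambda+1)k - (\lambda-1)\,|V(M)| \\
  &\ge (\lambda+1)k - (\lambda-1)(k'-1)
     && |V(M)| \le k'-1 \\
  &= (\lambda+1)k - (\lambda-1)k' + (\lambda-1) \\
  &> (\lambda+1)k - (\lambda-1)k' + (\lambda-2) \;\ge\; d .
\end{align*}
```

This is a contradiction, so one of the two alternatives in the lemma must hold.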


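• Claim: Let $k, \lambda, k'$ be as before. Then there are polynomials $P_V$, $P_j$ for every $V \subseteq [k]$ with $|V| \ge k'$ and every $j \in [k]$ such that:

– $P_V$ is of degree $\lambda|V|$ and can be computed from $P_x$ and $\{Y_j\}_{j \notin V}$

– $P_j$ is of degree 1 and can be computed from $P_x$ and $\{Y_i\}_{i \ne j}$

– $P_x(z) = \sum_{V \subseteq [k],\, |V| \ge k'} P_V(z) + \sum_{j \in [k]} P_j(z)$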

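• Proof: It is sufficient to prove the claim for $P$ consisting of a single monomial; then we can sum over all monomials.

• For a monomial $M = Y_{j_1,1} Y_{j_2,2} \cdots Y_{j_d,d}$, denote $T(M) = \prod_{q : j_q \in V(M)} Z_q \cdot \prod_{q : j_q \notin V(M)} Y_{j_q,q}$.

• Define $\mu(M)$ to be the number of variables $Y_{j_q,q}$ in $M$ for which $j_q \in V(M)$.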

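• WLOG take $P_x(Z) = Z_1 Z_2 \cdots Z_d$.

• Define $Q_x(Y) = P_x\big(\sum_{j=1}^{k} Y_{j,1}, \dots, \sum_{j=1}^{k} Y_{j,m}\big)$, where $Y = (Y_{j,h})_{j \in [k],\, h \in [m]}$, a polynomial in $mk$ variables.

• $Q$ has $k^d$ monomials, each of the form $M = Y_{j_1,1} Y_{j_2,2} \cdots Y_{j_d,d}$.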

1. Set $Q' = Q$; for all $V$, set $P_V = 0$.

2. Find $V = V(M)$ for some monomial $M$ in $Q'$ such that $V$ is of maximal size; if $|V| < k'$, stop.

3. While there is $M'$ in $Q'$ such that $V(M') = V$:
• Pick $M'$ from $Q'$ which maximizes $\mu(M')$
• $P_V = P_V + T(M')$, $Q' = Q' - T(M')$

4. Go to 2.

• If the algorithm halts, then the $P_V$ are of the desired degree and their sum is equal to $P - Q'$, for $Q'$ at the end of the execution.

• Likewise, for each $M$ in $Q'$ there exists a server $j$ who misses at most one variable; add $M$ to $P_j$.

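• For $M = Y_{j_1,1} Y_{j_2,2} \cdots Y_{j_d,d}$ and $M' = Y_{j'_1,1} Y_{j'_2,2} \cdots Y_{j'_d,d}$, define $M \equiv M'$ if $V(M) = V(M')$ and for all $q \le d$ either $j_q, j'_q \in V(M)$ or $j_q = j'_q$.

• If $M'$ is a monomial in $T(M)$ then:

1. $V(M') \subseteq V(M)$

2. $\mu(M') \le \mu(M)$

3. Equality in 1 and 2 implies $M \equiv M'$

4. $M_1 \equiv M_2$ implies either both are in $T(M)$ or both aren't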

• Each time step 3 is applied, we either add to $Q'$ monomials $M'$ with smaller $V(M')$ or smaller $\mu(M')$, which will be dealt with later,

• or $M' \equiv M$, so it already exists in $Q'$ and is removed.

• Lemma: For all $i > 0$ and $k > (i-1)!$ there exists a PIR protocol $P_i$ with communication complexity $O(n^{2/(ik)})$.

• Corollary: there exists a PIR protocol with communication complexity $O\big(n^{c \log\log k/(k \log k)}\big)$.

Summary

• For every PIR scheme we have a related smooth code.

• The upper bound for PIR is improved to $O\big(n^{c \log\log k/(k \log k)}\big)$.

• Likewise, the upper bound for smooth codes is improved to $O\big(2^{n^{c \log\log k/(k \log k)}}\big)$.

Related Topics

• T-collusion PIR: the protocol must maintain security against collusions of T servers. General results appear in “Information-Theoretic Private Information Retrieval: A Unified Construction” [Beimel, Ishai].

• CPIR: computational PIR, in which the security definition is relaxed to a computational one.

• There exist polylogarithmic single-server CPIR protocols [Cachin, Micali, Stadler].
