Recovering Data in Presence of Malicious Errors
Atri RudraUniversity at Buffalo, SUNY
2
The setup
x → C(x) → (channel adds errors) → y = C(x) + error → decoder → x, or give up
Mapping C: an error-correcting code, or just a code
Encoding: x → C(x)
Decoding: y → x
C(x) is a codeword
3
Codes are useful!
Cellphones, satellite broadcast, deep-space communication, the Internet
CDs/DVDs, RAID, ECC memory, paper bar-codes
4
Redundancy vs. Error-correction
Repetition code: repeat every bit, say, 100 times
Good error-correcting properties, but too much redundancy
Parity code: add a single parity bit
Minimum amount of redundancy, but bad error-correcting properties: two errors go completely undetected
For example, 1 1 1 0 0 | 1 and 1 0 0 0 0 | 1 differ in two message bits yet carry the same parity bit
Neither of these codes is satisfactory
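The two codes above can be sketched in a few lines. This is an illustrative toy (a 3x repetition code rather than the slide's 100x, and the function names are ours), just to make the tradeoff concrete:

```python
# Toy sketch of the slide's two extremes: heavy redundancy that corrects
# errors vs. minimal redundancy that misses even-weight error patterns.

def repetition_encode(bits, r=3):
    """Repeat each bit r times: rate 1/r, corrects (r-1)//2 flips per block."""
    return [b for b in bits for _ in range(r)]

def repetition_decode(word, r=3):
    """Majority vote inside each block of r copies."""
    return [1 if 2 * sum(word[i:i + r]) > r else 0
            for i in range(0, len(word), r)]

def parity_encode(bits):
    """Append a single parity bit: minimal redundancy."""
    return bits + [sum(bits) % 2]

def parity_ok(word):
    """Parity check: an odd number of flips is caught, an even number is not."""
    return sum(word) % 2 == 0

msg = [1, 0, 1, 1]
noisy = repetition_encode(msg)
noisy[0] ^= 1                      # one flip per block is corrected
assert repetition_decode(noisy) == msg

cw = parity_encode(msg)
cw[0] ^= 1
cw[1] ^= 1                         # two flips go completely undetected
assert parity_ok(cw)
```
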
5
Two main challenges in coding theory
The problem with the parity example: messages map to codewords that do not differ in many places
Need to pick a large number of codewords that all differ a lot from each other
Efficient decoding: the naive algorithm checks the received word against every codeword, which takes exponential time
6
The fundamental tradeoff
Correct as many errors as possible with as little redundancy as possible
Can one achieve the "optimal" tradeoff with efficient encoding and decoding?
This talk: the answer is yes
7
Overview of the talk
Specify the setup: the model; what is the optimal tradeoff?
Previous work
Construction of a "good" code
High-level idea of why it works
Future directions and some recent progress
8
Error-correcting codes
x → C(x) → y → decoder → x, or give up
Mapping C: Σ^k → Σ^n
Message length k, code length n, with n ≥ k
Rate R = k/n ≤ 1
Decoding complexity: "efficient" means polynomial in n
9
Shannon's world [Claude E. Shannon]
Noise is probabilistic: the Binary Symmetric Channel
Every bit is flipped independently with probability p
A benign noise model: for example, it does not capture bursty errors
10
Hamming's world [Richard W. Hamming]
Errors are worst case: error locations and symbol changes are arbitrary
Only the total number of errors is limited
Much more powerful than Shannon's model; captures bursty errors
We will consider this channel model
11
A "low level" view
Think of each symbol in Σ as a packet
The setup: the sender wants to send k packets; after encoding, sends n packets; some packets get corrupted; the receiver needs to recover the original k packets
Packet size: ideally constant, but can grow with n
12
Decoding
C(x) sent, y received, with x ∈ Σ^k, y ∈ Σ^n, and R = k/n
How much of y must be correct to recover x?
At least k packets must be correct, so at most an (n-k)/n = 1-R fraction of errors
1-R is the information-theoretic limit
ρ: the fraction of errors the decoder can handle; the information-theoretic limit implies ρ ≤ 1-R
13
Can we get to the limit of ρ = 1-R?
Not if we always want to uniquely recover the original message
The limit for unique decoding is (1-R)/2: two codewords c1 and c2 can be as close as a 1-R fraction apart, so a received word y halfway between them is within (1-R)/2 of both, and the decoder cannot tell which was sent
14
List decoding [Elias 57, Wozencraft 58]
Always insisting on a unique codeword is restrictive: the "pathological" cases are rare
A "typical" received word can be decoded beyond (1-R)/2
Better error-recovery model: output a list of answers (list decoding); example: a spell checker
Beyond radius (1-R)/2 lies almost all of the space in high dimension: all but an exponentially small (in n) fraction
15
Advantages of list decoding
Typical received words have a unique closest codeword; list decoding will return a list of size one for such received words
Still deals with worst-case errors
How to deal with a list of size greater than one? Declare an error, or use some side information (as a spell checker does)
16
The list decoding problem
Given a code and an error parameter ρ, for any received word y:
Output all codewords c such that c and y disagree in at most a ρ fraction of places
Fundamental question: what is the best possible tradeoff between R and ρ, with "small" lists?
Can it approach the information-theoretic limit of 1-R?
17 (May 25, 2007, Ph.D. Final Exam)
Other applications of list decoding
Cryptography: cryptanalysis of certain block ciphers [Jakobsen 98]; efficient traitor tracing schemes [Silverberg, Staddon, Walker 03]
Complexity theory: hardcore predicates from one-way functions [Goldreich, Levin 89; Impagliazzo 97; Ta-Shma, Zuckerman 01]; worst-case vs. average-case hardness [Cai, Pavan, Sivakumar 99; Goldreich, Ron, Sudan 99; Sudan, Trevisan, Vadhan 99; Impagliazzo, Jaiswal, Kabanets 06]
Other algorithmic applications: IP traceback [Dean, Franklin, Stubblefield 01; Savage, Wetherall, Karlin, Anderson 00]; guessing secrets [Alon, Guruswami, Kaufman, Sudan 02; Chung, Graham, Leighton 01]
18
Overview of the talk
Specify the setup: the model; the optimal tradeoff between rate and fraction of errors
Previous work
Construction of a "good" code
High-level idea of why it works
Future directions and some recent progress
19
Information theoretic limit
ρ < 1 - R: the information-theoretic limit can handle twice as many errors as unique decoding
[Plot: fraction of errors (ρ) vs. rate (R), comparing the unique-decoding bound (1-R)/2 with the information-theoretic limit 1-R]
20
Achieving the information theoretic limit
There exist codes that achieve the information-theoretic limit ρ ≥ 1-R-o(1), by a random coding argument
Not a useful result: the codes are not explicit, and there are no efficient list decoding algorithms
We need explicit constructions of such codes, with polynomial-time list decodability
This requires the list size to be polynomial
21
The challenge
Explicit construction of codes with efficient list decoding algorithms up to the information-theoretic limit: for rate R, correct a 1-R fraction of errors
Shannon's work raised a similar challenge: explicit codes achieving the information-theoretic limit for stochastic models
That challenge has been met [Forney 66; Luby-Mitzenmacher-Shokrollahi-Spielman 01; Richardson-Urbanke 01]
Now do the same for the stronger adversarial model
22
Guruswami-Sudan
The best until 1998: correct a 1 - R^{1/2} fraction of errors
Reed-Solomon codes [Sudan 95, Guruswami-Sudan 98]
Better than unique decoding: at R = 0.8, unique decoding handles 10%, GS handles 10.56%, and the information-theoretic limit is 20%
[Plot: fraction of errors (ρ) vs. rate (R), showing unique decoding, the GS bound, and the information-theoretic limit]
Motivating question: close the gap between the GS bound and the information-theoretic limit with explicit, efficient codes
23
The best until 2005
Parvaresh-Vardy: correct a 1 - (sR)^{s/(s+1)} fraction of errors, for any s ≥ 1 (s = 2 in the plot)
Based on Reed-Solomon codes
Improves on GS for R < 1/16
[Plot: fraction of errors (ρ) vs. rate (R), showing unique decoding, Guruswami-Sudan, Parvaresh-Vardy, and the information-theoretic limit]
24
Our Result
Correct a 1 - R - ε fraction of errors, for any ε > 0: folded RS codes [Guruswami, R. 06]
[Plot: fraction of errors (ρ) vs. rate (R), showing unique decoding, Guruswami-Sudan, Parvaresh-Vardy, our work, and the information-theoretic limit]
25
Overview of the talk
Specify the setup: the model; the optimal tradeoff between rate and fraction of errors
Previous work
Our construction
High-level idea of why it works
Future directions and recent progress
26
The main result
A construction of an algebraic family of codes: for every rate R > 0 and every ε > 0, a list decoding algorithm that can correct a 1 - R - ε fraction of errors
Based on Reed-Solomon codes
27
Algebra terminology
F will denote a finite field; think of it as the integers mod some prime
Polynomials: coefficients come from F
A polynomial of degree 3 over Z7: f(X) = X^3 + 4X + 5
Evaluate polynomials at points in F: f(2) = (8 + 8 + 5) mod 7 = 21 mod 7 = 0
Irreducible polynomials: no non-trivial polynomial factors
X^2 + 1 is irreducible over Z7, while X^2 - 1 = (X-1)(X+1) is not
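The slide's examples over Z7 can be checked mechanically. A minimal sketch (function names are ours; for quadratics over a prime field, irreducible is the same as having no root):

```python
# The slide's algebra examples over F = Z_7, checked by brute force.
P = 7

def poly_eval(coeffs, x, p=P):
    """Evaluate a polynomial (lowest-degree coefficient first) at x mod p."""
    result = 0
    for c in reversed(coeffs):   # Horner's rule
        result = (result * x + c) % p
    return result

# f(X) = X^3 + 4X + 5, coefficients low to high: 5, 4, 0, 1
f = [5, 4, 0, 1]
assert poly_eval(f, 2) == 0      # matches (8 + 8 + 5) mod 7 = 0

def quadratic_is_irreducible(a, b, c, p=P):
    """aX^2 + bX + c is irreducible over Z_p iff it has no root in Z_p."""
    return all(poly_eval([c, b, a], x, p) != 0 for x in range(p))

assert quadratic_is_irreducible(1, 0, 1)       # X^2 + 1 over Z_7
assert not quadratic_is_irreducible(1, 0, -1)  # X^2 - 1 = (X-1)(X+1)
```
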
28
Reed-Solomon codes
Message: (m_0, m_1, …, m_{k-1}) ∈ F^k
View it as the polynomial f(X) = m_0 + m_1 X + … + m_{k-1} X^{k-1}
Encoding: RS(f) = ( f(α_1), f(α_2), …, f(α_n) ) for evaluation points α_1, …, α_n ∈ F
[Guruswami-Sudan] can correct up to a 1 - (k/n)^{1/2} fraction of errors in polynomial time
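The encoding map is just polynomial evaluation. A toy sketch over the integers mod a prime; the prime 929 and the evaluation points 1..n are illustrative choices, not parameters from the talk:

```python
# Toy Reed-Solomon encoder over Z_p, following the definition above.
PRIME = 929  # an arbitrary prime larger than n

def rs_encode(message, n, p=PRIME):
    """Encode (m_0, ..., m_{k-1}) as (f(1), f(2), ..., f(n)), where
    f(X) = m_0 + m_1 X + ... + m_{k-1} X^{k-1} over Z_p."""
    def f(x):
        return sum(m * pow(x, i, p) for i, m in enumerate(message)) % p
    return [f(a) for a in range(1, n + 1)]

# k = 3 message symbols stretched to n = 7 codeword symbols: rate R = 3/7.
codeword = rs_encode([3, 1, 4], n=7)
assert codeword == [8, 21, 42, 71, 108, 153, 206]
```

Any k correct symbols determine the degree-(k-1) polynomial f by interpolation, which is why the redundancy buys error tolerance.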
29
Parvaresh-Vardy codes (of order 2)
Send the pairs ( f(α_i), g(α_i) ) for i = 1, …, n, where g(X) = f(X)^q mod E(X)
The extra information from g(X) helps in decoding
Rate: R_PV = k/2n
[PV 05] PV codes can correct a 1 - (k/n)^{2/3} = 1 - (2 R_PV)^{2/3} fraction of errors in polynomial time
30
Towards our solution
Suppose g(X) = f(X)^q mod E(X) = f(γX) for some γ ∈ F
Let us look again at the PV codeword: each column ( f(α), g(α) ) = ( f(α), f(γα) ), so the second row is just the first row evaluated at shifted points
31
Folded Reed-Solomon codes
Suppose g(X) = f(X)^q mod E(X) = f(γX)
Then the values f(γα) already appear elsewhere in the codeword: don't send the redundant symbols
This reduces the length to n/2, so R = (k/2)/(n/2) = k/n
Using the PV result, the fraction of errors handled is 1 - (k/n)^{2/3} = 1 - R^{2/3}
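The mechanical part of the construction, the folding itself, is easy to sketch. This illustrative snippet shows only the regrouping step (the algebraic choice of evaluation points and the decoder are the hard parts the talk describes):

```python
# Sketch of the folding step: bundle m consecutive codeword symbols into
# one symbol over a larger alphabet. Length shrinks by m, the amount of
# information is unchanged, so the rate k/n is preserved.

def fold(codeword, m):
    """Fold a length-n codeword into n/m bundled symbols."""
    assert len(codeword) % m == 0
    return [tuple(codeword[i:i + m]) for i in range(0, len(codeword), m)]

folded = fold([1, 5, 2, 6, 3, 7, 4, 0], m=2)
assert folded == [(1, 5), (2, 6), (3, 7), (4, 0)]
# One corrupted bundled symbol now corresponds to a burst of m old symbols.
```
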
32
Getting to 1 - R - ε
We started with a PV code with s = 2 to get 1 - R^{2/3}
Starting with a PV code with general s gives 1 - R^{s/(s+1)}
Pick s "large" enough (roughly 1/ε) to approach 1 - R - ε
The decoding complexity increases over Parvaresh-Vardy, but is still polynomial
33
What we actually do
We show that for any generator γ of F \ {0}: g(X) = f(X)^q mod E(X) = f(γX)
We can achieve a similar compression by grouping the evaluation points into orbits of γ: with folding parameter m and m' ≈ n/m columns, the codeword is the m × m' array whose j-th column is ( f(γ^{jm}), f(γ^{jm+1}), …, f(γ^{jm+m-1}) )
Rate: R ≈ (k/m)/(n/m) = k/n
34
Proving f(X)^q mod E(X) = f(γX)
First use the fact that f(X)^q = f(X^q) over F
So we need to show f(X^q) mod E(X) = f(γX)
Proving X^q mod E(X) = γX suffices
Equivalently, E(X) divides X^q - γX = X · (X^{q-1} - γ)
E(X) = X^{q-1} - γ is irreducible (since γ is a generator)
35
Our Result
Correct a 1 - R - ε fraction of errors, for any ε > 0: folded RS codes [Guruswami, R. 06]
[Plot: fraction of errors (ρ) vs. rate (R), showing unique decoding, Guruswami-Sudan, Parvaresh-Vardy, our work, and the information-theoretic limit]
36
“Welcome” to the dark side…
37
Limitations of our work
To get to 1 - R - ε, we need s > 1/ε
Alphabet size = n^s > n^{1/ε}
Fortunately this can be reduced to 2^{poly(1/ε)} via concatenation + expanders [Guruswami-Indyk 02]; the lower bound is 2^{1/ε}
List size (and running time) > n^{1/ε}; it is an open question to bring this down
38
Time to wake up
39
Overview of the talk
List decoding primer
Previous work on list decoding
Codes over large alphabets: construction of a "good" code; high-level idea of why it works
Codes over small alphabets: the current best codes
Future directions and some (very) modest recent progress
40
Optimal tradeoff for list decoding
The best possible ρ is H_q^{-1}(1-R), where the q-ary entropy is
H_q(ρ) = ρ log_q(q-1) - ρ log_q ρ - (1-ρ) log_q(1-ρ)
There exists a (H_q^{-1}(1-R-ε), O(1/ε))-list-decodable code: a random code of rate R has the property with high probability
For any code, ρ > H_q^{-1}(1-R+ε) implies super-polynomial list size
For large q, H_q^{-1}(1-R) ≈ 1-R
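The limit H_q^{-1}(1-R) is easy to compute numerically. A sketch (the bisection inverse is our illustrative choice) showing both the binary case and the large-q behavior claimed above:

```python
import math

# Numeric sketch of the list-decoding limit rho = H_q^{-1}(1 - R).

def H_q(rho, q):
    """q-ary entropy for 0 < rho < 1."""
    lg = lambda x: math.log(x, q)
    return rho * lg(q - 1) - rho * lg(rho) - (1 - rho) * lg(1 - rho)

def H_q_inv(y, q, iters=80):
    """Invert H_q on [0, 1 - 1/q], where it increases from 0 to 1, by bisection."""
    lo, hi = 0.0, 1.0 - 1.0 / q
    for _ in range(iters):
        mid = (lo + hi) / 2
        if H_q(mid, q) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

R = 0.5
binary_limit = H_q_inv(1 - R, q=2)        # about 0.11, well below 1 - R
large_q_limit = H_q_inv(1 - R, q=2**100)  # much closer to 1 - R = 0.5
assert binary_limit < large_q_limit < 1 - R
```

This makes the slide's point concrete: over the binary alphabet the achievable fraction of errors at R = 1/2 is only about 0.11, while for large alphabets it approaches 1 - R.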
41
Our Results (q = 2)
Optimal tradeoff: H^{-1}(1-R)
"Zyablov" bound [Guruswami, R. 06]
Blokh-Zyablov bound [Guruswami, R. 07]
[Plot: fraction of errors vs. rate, showing the previous best, the Zyablov bound, the Blokh-Zyablov bound, and the optimal tradeoff]
42
How do we get binary codes?
Concatenation of codes [Forney 66]
C1: (GF(2^k))^K → (GF(2^k))^N (the "outer" code)
C2: GF(2)^k → (GF(2))^n (the "inner" code)
C1 ∘ C2: (GF(2))^{kK} → (GF(2))^{nN}
Typically k = O(log N), so brute-force decoding of the inner code is affordable
To encode a message m: first compute C1(m) = (w_1, …, w_N), then output ( C2(w_1), …, C2(w_N) )
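The two-stage encoding can be sketched directly. Both component codes below are illustrative stand-ins chosen for brevity (a one-symbol XOR-parity outer code and a bit-repetition inner code); the talk uses folded RS outer codes and brute-force-decodable inner codes:

```python
# Toy sketch of concatenation: every symbol of the outer codeword is
# re-encoded by a short inner binary code.

K_BITS = 4  # each outer symbol is a k-bit value; k = 4 here is arbitrary

def outer_encode(message):
    """Stand-in outer code over 2^k-ary symbols: append an XOR parity symbol."""
    parity = 0
    for s in message:
        parity ^= s
    return message + [parity]

def inner_encode(symbol, reps=3):
    """Stand-in inner code: the symbol's k bits, each repeated reps times."""
    bits = [(symbol >> i) & 1 for i in range(K_BITS)]
    return [b for b in bits for _ in range(reps)]

def concat_encode(message):
    """C1 o C2: outer-encode, then inner-encode every outer symbol."""
    return [inner_encode(w) for w in outer_encode(message)]

blocks = concat_encode([3, 5, 6])
assert len(blocks) == 4                    # outer codeword has N = 4 symbols
assert all(len(b) == K_BITS * 3 for b in blocks)
```
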
43
List decoding a concatenated code
C1 = folded RS code, C2 = a "suitably chosen" binary code
Natural decoding algorithm: divide the received word into blocks of length n; find the closest C2 codeword for each block; run the list decoding algorithm for C1
This loses information!
44
List decoding C2
Instead, list decode each block y_i ∈ GF(2)^n to a set S_i ⊆ GF(2)^k of nearby C2 codewords
But then: how do we "list decode" C1 from lists?
45
The list recovery problem
Given a code and an error parameter ρ: for any sets (lists) S_1, …, S_N with |S_i| ≤ s for every i,
output all codewords c such that c_i ∈ S_i for at least a 1-ρ fraction of the i's
List decoding is the special case with s = 1
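The condition on the output codewords can be stated as a one-line check. A brute-force sketch (function name is ours) just to make the definition concrete:

```python
# Brute-force check of the list-recovery condition: codeword c qualifies
# if c_i lands in the list S_i on at least a (1 - rho) fraction of positions.

def satisfies_list_recovery(c, lists, rho):
    hits = sum(1 for ci, Si in zip(c, lists) if ci in Si)
    return hits >= (1 - rho) * len(c)

lists = [{0, 1}, {1}, {0, 2}, {2}]       # |S_i| <= s = 2
assert satisfies_list_recovery([1, 1, 0, 2], lists, rho=0.0)   # all 4 positions hit
assert satisfies_list_recovery([1, 1, 1, 2], lists, rho=0.25)  # 3 of 4 hit
assert not satisfies_list_recovery([0, 0, 0, 0], lists, rho=0.25)
```

With singleton lists this reduces to ordinary list decoding: agreeing with the lists is then the same as agreeing with a single received word.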
46
List decoding C1 ∘ C2
Given the received blocks y_1, …, y_N: list decode C2 on each block to get lists S_1, …, S_N, then run the list recovering algorithm for C1 on those lists
47
Putting it together [Guruswami, R. 06]
If C1 can be list recovered from a ρ_1 fraction of errors and C2 can be list decoded from a ρ_2 fraction of errors, then C1 ∘ C2 can be list decoded from a ρ_1 ρ_2 fraction of errors
Folded RS of rate R is list recoverable from a 1-R fraction of errors
There exist inner codes of rate r that can be list decoded from an H^{-1}(1-r) fraction of errors; one can be found by "exhaustive" search
So C1 ∘ C2 is list decodable from a (1-R) H^{-1}(1-r) fraction of errors
48
Multilevel concatenated codes
C1: (GF(2^k))^K → (GF(2^k))^N ("outer" code 1)
C2: (GF(2^k))^L → (GF(2^k))^N ("outer" code 2)
Cin: GF(2)^{2k} → (GF(2))^n ("inner" code)
Encode message m with C1 to get (v_1, …, v_N) and message M with C2 to get (w_1, …, w_N); the final codeword is ( Cin(v_1, w_1), …, Cin(v_N, w_N) )
C1 and C2 are folded RS codes
49
Advantage over rate-rR concatenated codes
C1, C2, Cin have rates R1, R2, and r; the final rate is r(R1+R2)/2; choose R1 < R
Step 1: just recover m
List decode Cin up to an H^{-1}(1-r) fraction of errors, then list recover C1 up to a 1-R1 fraction of errors
Can handle a (1-R1) H^{-1}(1-r) > (1-R) H^{-1}(1-r) fraction of errors
50
Advantage over concatenated codes
Step 2: just recover M, given m
A subcode of Cin of rate r/2 acts on M
List decode the subcode up to an H^{-1}(1-r/2) fraction of errors, then list recover C2 up to a 1-R2 fraction of errors
Can handle a (1-R2) H^{-1}(1-r/2) fraction of errors
51
Wrapping it up
Total fraction of errors that can be handled: min{ (1-R1) H^{-1}(1-r), (1-R2) H^{-1}(1-r/2) }
This beats (1-R) H^{-1}(1-r): we need (R1+R2)/2 = R with R1 < R, and since H^{-1}(1-r/2) > H^{-1}(1-r) we can afford to choose R2 a bit larger than R
Optimize over the choices of r, R1, and R2
This needs nested list decodability of the inner code
The Blokh-Zyablov bound follows from using multiple outer codes
52
Our Results (q = 2)
Optimal tradeoff: H^{-1}(1-R)
"Zyablov" bound [Guruswami, R. 06]
Blokh-Zyablov bound [Guruswami, R. 07]
[Plot: fraction of errors vs. rate, showing the previous best, the Zyablov bound, the Blokh-Zyablov bound, and the optimal tradeoff]
53
How far can concatenated codes go?
Outer code: folded RS; random and independent inner codes, with a different inner code for each outer symbol
This can get to the information-theoretic limit ρ = H^{-1}(1-R) [Guruswami, R. 08]
54
To summarize
List decoding: a central coding-theory notion
It permits decoding up to the optimal fraction of adversarial errors
It bridges the adversarial and probabilistic approaches to information theory: Shannon's information-theoretic limit is p = H^{-1}(1-R); the list decoding information-theoretic limit is ρ = H^{-1}(1-R)
Efficient list decoding is possible for algebraic codes
55
Our Contributions
Folded RS codes are explicit codes that achieve the information-theoretic limit for list decoding
Better list decoding for binary codes: concatenated codes can get us to list decoding capacity
56
Open Questions
Reduce the decoding complexity of our algorithm
List decoding for binary codes: explicitly achieve the error bound ρ = H^{-1}(1-R); for erasures, decode when ρ = 1-R
Non-algebraic codes? Graph-based codes?
Other applications of these new codes: extractors [Guruswami, Umans, Vadhan 07]; approximating NP-witnesses [Guruswami, R. 08]
57
Thank You
Questions ?