Erasure Correcting Codes In The Real World Udi Wieder Incorporates presentations made by Michael Luby and Michael Mitzenmacher.

Post on 20-Dec-2015


Page 1

Erasure Correcting Codes

In The Real World

Udi Wieder

Incorporates presentations made by Michael Luby and Michael Mitzenmacher.

Page 2

Based On

Practical Loss-Resilient Codes
Michael Luby, Michael Mitzenmacher, Amin Shokrollahi, Dan Spielman, Volker Stemann
STOC ’97

Analysis of Random Processes via And-Or Tree Evaluation
Michael Luby, Michael Mitzenmacher, Amin Shokrollahi
SODA ’98

LT Codes
Michael Luby
FOCS 2002

Online Codes
Petar Maymounkov

Page 3

Probabilistic Channels

[Figure: the binary erasure channel. Each input bit (0 or 1) arrives unchanged with probability 1-p and is erased (output "?") with probability p.]

[Figure: the binary symmetric channel. Each input bit (0 or 1) arrives unchanged with probability 1-p and is flipped with probability p.]

Page 4

Erasure Codes

[Figure: Content (n) → Encoding (cn) → Transmission → Received (≥ n) → Decoding → Content (n).]

Page 5

Performance Measures

Time Overhead: the time to encode and decode, expressed as a multiple of the encoding length.

Reception Efficiency: the ratio of packets in the message to packets needed to decode. Optimal is 1.

Page 6

Known Codes

Random Linear Codes (Elias)

A linear code of minimum distance d is capable of correcting any pattern of d-1 or fewer erasures.

Achieves the capacity of the channel with high probability, i.e. can be used to transmit over the erasure channel at any rate R < 1-p.

Decoding time is $O(n^3)$. Unacceptable.

Reed-Solomon Codes

Optimal reception efficiency with probability 1.

Decoding and encoding in quadratic time (about one minute to encode 1 MB).

Page 7

Tornado Codes

Practical Loss-Resilient Codes
Michael Luby, Michael Mitzenmacher, Amin Shokrollahi, Dan Spielman, Volker Stemann (1997)

Analysis of Random Processes via And-Or Tree Evaluation
Michael Luby, Michael Mitzenmacher, Amin Shokrollahi (1998)

Page 8

Low Density Parity Check Codes

Introduced in the early 60’s by Gallager, and reinvented many times since.

[Figure: bipartite graph connecting message bits a, b, c, d, e, f, g, h, i, j, k, l to check bits. Each check bit is the XOR of the message bits adjacent to it, e.g. a ⊕ b ⊕ e.]

The time to encode is proportional to the number of edges.
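As a rough illustration of why encoding time is proportional to the number of edges, here is a minimal sketch in which each check bit is the XOR of its neighboring message bits. The function name `encode_checks`, the adjacency-list layout, and the toy graph are assumptions made for this example, not taken from the slides.

```python
from functools import reduce
from operator import xor


def encode_checks(message_bits, check_neighbors):
    """Compute each check bit as the XOR of its neighboring message bits.

    check_neighbors[c] lists the message-bit indices adjacent to check c,
    so the total work is one XOR per edge of the bipartite graph.
    """
    return [reduce(xor, (message_bits[i] for i in nbrs), 0)
            for nbrs in check_neighbors]


# Hypothetical toy graph: 6 message bits, 3 check bits, 9 edges.
message = [1, 0, 1, 1, 0, 0]
graph = [[0, 1, 4], [1, 2, 3], [3, 4, 5]]
checks = encode_checks(message, graph)
print(checks)
```

A sparse ("low density") graph keeps the edge count, and therefore the encoding cost, linear in the message length.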

Page 9

Encoding Process

[Figure: successive bipartite graphs followed by a standard loss-resilient code.]

Length of message: k

Check bits:

Rate: 1-

Page 10

Decoding Rule

Given the value of a check bit and all but one of the message bits on which it depends, set the missing message bit to be the XOR of the check bit and its known message bits.

XOR the recovered message bit into all of its neighboring check bits.

Delete from the graph the message bit and all edges incident to it.

Decoding ends (successfully) when all edges are deleted.
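The decoding rule can be sketched as a simple peeling loop. This is an illustrative version that rescans all check bits on every pass rather than the linear-time bookkeeping a real implementation would use; `peel_decode`, the use of `None` for an erased bit, and the toy graph are assumptions of this sketch, and all check bits are assumed to have been received.

```python
def peel_decode(message, checks, check_neighbors):
    """Erasure decoding by repeatedly applying the decoding rule.

    message: list of bits, with None marking an erased message bit.
    checks: list of check bits (assumed here to be received intact).
    check_neighbors[c]: indices of the message bits check c depends on.
    Returns the message with as many erasures filled in as possible.
    """
    message = list(message)
    progress = True
    while progress:
        progress = False
        for c, nbrs in enumerate(check_neighbors):
            missing = [i for i in nbrs if message[i] is None]
            if len(missing) == 1:
                # XOR the check bit with its known message bits.
                value = checks[c]
                for i in nbrs:
                    if message[i] is not None:
                        value ^= message[i]
                message[missing[0]] = value
                progress = True
    return message


# Hypothetical example: 8 message bits, 4 of them erased.
original = [1, 0, 1, 1, 0, 1, 0, 1]
graph = [[0, 1], [1, 2, 6], [3, 4, 6, 7], [5, 7]]
checks = [1, 1, 0, 0]  # XOR of each check's neighbors in `original`
received = [1, None, 1, 1, None, 1, None, None]
decoded = peel_decode(received, checks, graph)
print(decoded)
```

If no check bit has exactly one missing neighbor, the loop stops and the remaining erasures stay as `None`, which corresponds to decoding getting stuck.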

Page 11

Decoding Process

[Figure: decoding example with eight message bits a through h. Bits a, c, d, f are received; bits b, e, g, h are erased ("?"). The check bits are labeled b, g ⊕ b, h ⊕ g ⊕ e, and h ⊕ g ⊕ e ⊕ b.]

Page 12

Decoding Process

[Figure: the same example one step later. The check bit labeled b recovers message bit b, which is then XORed out of the other check bits: g ⊕ b becomes g, and h ⊕ g ⊕ e ⊕ b becomes h ⊕ g ⊕ e.]

Page 13

Regular Graphs

[Figure: a regular bipartite graph built from a random permutation of the edges; nodes on one side have degree 3 and nodes on the other side have degree 6.]

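Page 14

3-6 Regular Graph Analysis

[Figure: three levels of the decoding tree: left node, right node, left node.]

For a left (message) node one level down: $x = \Pr[\text{not recovered}]$

For a right (check) node with 5 other left neighbors: $\Pr[\text{all recovered}] = (1-x)^5$

For the left node above it, with 2 other right neighbors: $\Pr[\text{not recovered}] = \delta\,(1-(1-x)^5)^2$, where $\delta$ is the probability that the node itself was erased.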
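To see what this recursion implies, one can iterate it numerically. The sketch below assumes the leading factor is the channel erasure probability (called `delta` here) and starts from x = delta; the function name and the number of rounds are choices made for this example only.

```python
def residual_erasure(delta, rounds=2000):
    """Iterate x <- delta * (1 - (1 - x)**5)**2, starting from x = delta."""
    x = delta
    for _ in range(rounds):
        x = delta * (1.0 - (1.0 - x) ** 5) ** 2
    return x


# Below some erasure probability the unrecovered fraction shrinks to ~0;
# above it the iteration settles at a positive fixed point.
for delta in (0.40, 0.42, 0.44, 0.50):
    print(delta, residual_erasure(delta))
```

Under these assumptions the iteration collapses to zero for erasure probabilities up to roughly 0.43 and stalls above that, which is well short of the 0.5 that a rate 1/2 code could tolerate in principle.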

Page 15

Decoding to Completion (sketch)

Most message bits are roots of trees.

Concentration results (edge exposure martingale) prove that all but a small fraction of message bits are decoded with high probability.

The remaining bits are decoded due to expansion. (The original graph is a good expander on small sets.)

If a set of size s and average degree a has more than as/2 neighbors, then a unique neighbor exists and decoding continues.

Page 16

Efficiency

Encoding time (sec), 1k packets

Size     Reed-Solomon   Tornado
250 KB   4.6            0.06
500 KB   19             0.12
1 MB     93             0.26
2 MB     442            0.53
4 MB     1717           1.06
8 MB     6994           2.13
16 MB    30802          4.33

Decoding time (sec), 1k packets

Size     Reed-Solomon   Tornado
250 KB   2.06           0.06
500 KB   8.4            0.09
1 MB     40.5           0.14
2 MB     199            0.19
4 MB     800            0.40
8 MB     3166           0.87
16 MB    13829          1.75

Rate = 0.5

Erasure probability = 0.5

Implementation = ?

Page 17

LT Codes

LT Codes
Michael Luby (2002)

Page 18

‘Rateless’ Codes

A different model of transmission.

The sender sends an infinite sequence of encoding symbols.
Time complexity: average time for encoding a symbol.
Erasures are independent of content.

The receiver may decode once it has received enough symbols.
Reception efficiency.

‘Digital Fountain’ approach.

Page 19

Applications

Unreliable channels.
In Tornado codes a small rate implies big graphs and therefore a lot of memory (proportional to the size of the encoding).

Multi-source download.
Downloading from different servers requires no coordination.
Efficient exchange of data between users requires a small rate at the source.

Multicast without feedback (say, over the Internet).
Rateless codes are the natural notion.

Page 20

Trivial Examples - Repetition

Each time unit, send a random symbol of the code.

Advantage: encoding complexity O(1).

Disadvantage: need $k' = k \ln(k/\delta)$ code symbols to cover all k content symbols with failure probability at most $\delta$.

Example: $k = 100{,}000$, $\delta = 10^{-6}$

Reception overhead = 2400% (terrible)
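The arithmetic behind this example can be checked directly. The sketch assumes the failure probability is the quantity written `delta` below and that overhead is measured as the extra symbols beyond k, relative to k; the function name is made up for illustration.

```python
import math


def repetition_symbols_needed(k, delta):
    """k' = k * ln(k / delta), the coupon-collector style bound."""
    return k * math.log(k / delta)


k, delta = 100_000, 1e-6
needed = repetition_symbols_needed(k, delta)
overhead_percent = 100.0 * (needed - k) / k
print(round(needed), round(overhead_percent))
```

With these numbers the receiver needs about 25 times as many symbols as the content contains, which is where the figure of roughly 2400% comes from.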

Page 21

Trivial Examples - Reed Solomon

Each time unit, send an evaluation of the polynomial at a random point.

Advantage: decoding is possible as soon as k symbols are received.

Disadvantage: large time complexity for encoding and decoding.

Page 22

Parameters of LT Codes

Encoding time complexity: O(ln n) per symbol.

Decoding time complexity: O(n ln n).

Reception overhead: asymptotically zero (unlike Tornado codes).

Failure probability: very small (smaller than Tornado).

Page 23

LT encoding

[Figure: LT encoding with degree 2. Choose a degree from the degree distribution, choose 2 random content symbols, XOR the content symbols, insert a header, and send. The "Degree Dist." table lists degree 1 with probability 0.055 and degrees up to 100000; the other probabilities shown are 0.0004, 0.32, 0.13, 0.084.]

Page 24

LT encoding

[Figure: LT encoding with degree 1 and the same degree distribution. Choose a degree, choose 1 random content symbol, copy the content symbol, insert a header, and send.]

Page 25

LT encoding

[Figure: LT encoding with degree 4 and the same degree distribution. Choose a degree, choose 4 random content symbols, XOR the content symbols, insert a header, and send.]

Page 26

LT encoding properties

Encoding symbols are generated independently of each other.

Any number of encoding symbols can be generated on the fly.

Reception overhead is independent of loss patterns.
The success of the decoding process depends only on the degree distribution of the received encoding symbols.
The degree distribution of the received encoding symbols is the same as the degree distribution of the generated encoding symbols.
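The LT encoding steps (choose a degree, choose that many random content symbols, XOR them, attach a header) can be sketched as follows. This is only an illustration: `lt_encode_symbol`, the use of the neighbor index list as the header, and the small degree distribution are assumptions of this example, not the distribution shown on the slides.

```python
import random


def lt_encode_symbol(content, degree_dist, rng):
    """Generate one LT encoding symbol, independently of all others.

    content: list of equal-length bytes objects (the content symbols).
    degree_dist: dict mapping degree -> probability (assumed to sum to 1).
    Returns (indices, value); the index list plays the role of the header.
    """
    degrees = list(degree_dist)
    weights = [degree_dist[d] for d in degrees]
    degree = rng.choices(degrees, weights=weights, k=1)[0]
    indices = sorted(rng.sample(range(len(content)), degree))
    value = bytes(len(content[0]))  # all-zero symbol
    for i in indices:
        value = bytes(a ^ b for a, b in zip(value, content[i]))
    return indices, value


# Hypothetical content and degree distribution, for illustration only.
content = [b"ab", b"cd", b"ef", b"gh"]
dist = {1: 0.25, 2: 0.5, 4: 0.25}
rng = random.Random(0)
for _ in range(3):
    print(lt_encode_symbol(content, dist, rng))
```

Because each call draws its own degree and neighbors, any number of encoding symbols can be produced on demand, and dropping some of them does not change the degree distribution of those that arrive.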

Page 27

LT decoding

1. Collect enough encoding symbols and set up a graph between the encoding symbols and the content symbols to be recovered.

2. Identify an encoding symbol of degree 1. STOP if none exists.

3. Copy the value of the encoding symbol into its unique neighbor, XOR the value of the newly recovered content symbol into its encoding symbol neighbors, and delete the edges emanating from the content symbol.

4. Go to Step 2.

[Figure: bipartite graph between the received encoding symbols and the content symbols (unknown).]
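The four LT decoding steps can be sketched as a short loop. In this illustrative version symbol values are plain integers, each encoding symbol is given as a pair of neighbor indices and value, and the linear scan for a degree-1 symbol is chosen for brevity rather than speed; all of these are assumptions of the example.

```python
def lt_decode(k, encoding_symbols):
    """Recover content symbols by peeling degree-1 encoding symbols.

    k: number of content symbols.
    encoding_symbols: list of (indices, value) pairs with integer values.
    Returns a list of length k, with None for unrecovered content symbols.
    """
    # Step 1: set up the graph; each encoding symbol keeps its neighbor set.
    symbols = [[set(idx), val] for idx, val in encoding_symbols]
    content = [None] * k
    while True:
        # Step 2: identify an encoding symbol of degree 1, stop if none.
        released = next((s for s in symbols if len(s[0]) == 1), None)
        if released is None:
            return content
        # Step 3: copy its value into the unique neighbor, XOR that value
        # into every encoding symbol adjacent to it, and delete the edges.
        (i,) = released[0]
        content[i] = released[1]
        for s in symbols:
            if i in s[0]:
                s[0].remove(i)
                s[1] ^= content[i]
        # Step 4: go back to step 2.


# Hypothetical example with 4 content symbols: 5, 9, 12, 7.
received = [([0], 5), ([0, 1], 5 ^ 9), ([1, 2], 9 ^ 12), ([0, 2, 3], 5 ^ 12 ^ 7)]
print(lt_decode(4, received))
```

When no degree-1 encoding symbol is left before everything is recovered, the function returns with some entries still `None`, which is the failure case the ripple analysis is about.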

Page 28

Releasing an encoding symbol

[Figure: an encoding symbol of degree i among k content symbols. i-2 of its neighbors lie among the first x-1 recovered content symbols, one neighbor is the x-th recovered content symbol, which releases the encoding symbol, and its last neighbor lies among the k-x unrecovered content symbols; that content symbol can now be recovered by the encoding symbol.]

Page 29

The Ripple

Definition: At each decoding step, the ripple is the set of encoding symbols that have been released at some previous decoding step but whose one remaining content symbol has not yet been recovered.

[Figure: x recovered content symbols and k-x unrecovered content symbols, with the encoding symbols in the ripple attached to unrecovered content symbols; two ripple symbols attached to the same content symbol are marked as a collision.]

Page 30

Successful Decoding

Decoding succeeds iff the ripple never becomes empty.

Ripple small:
Small chance of encoding symbol collisions → small reception overhead.
Risk of the ripple becoming empty due to random fluctuations is large.

Ripple large:
Large chance of encoding symbol collisions → large reception overhead.
Risk of the ripple becoming empty due to random fluctuations is small.

Page 31

LT codes idea

Control the release of encoding symbols over the entire decoding process so that the ripple is never empty but never too large.

Very few encoding symbol collisions.
Very little reception overhead.

Page 32

Release probability

Definition: The release probability for degree i encoding symbols at decoding step x is q(i,x).

Proposition:

For i = 1: q(1,x) = 1 for x = 0, and q(1,x) = 0 for all x ≥ 1.

For i > 1: for x = i-1, …, k-1,

$$q(i,x) = \frac{i(i-1)\,(k-x)\,\prod_{j=1}^{i-2}(x-j)}{\prod_{j=0}^{i-1}(k-j)}$$

and q(i,x) = 0 for all other x.

Page 33

Release probability

[Figure: an encoding symbol is released at decoding step x when i-2 of its neighbors are among the first x-1 recovered content symbols, one neighbor is the x-th recovered content symbol, and the last neighbor is among the k-x unrecovered content symbols, which the encoding symbol can then recover.]

Page 34

Release distributions for specific degrees

[Figure: release distributions for degrees i = 2, 3, 4, 10, 20, with k = 1000.]

Page 35

Overall release probability

Definition: At each decoding step x, r(x) is the overall probability that an encoding symbol is released at decoding step x with respect to a specific degree distribution p(·).

Proposition:

$$r(x) = \sum_{i} p(i)\, q(i,x)$$

Page 36

Uniform release question

Question: Is there a degree distribution such that the overall release distribution is uniform over x?

Why interesting?
One encoding symbol is released for each content symbol decoded.
The ripple will tend to stay small → minimizes reception overhead.
The ripple will tend not to become empty → decoding will succeed.

Page 37

Uniform release answer: YES!

Ideal Soliton Distribution:

$$p(1) = \frac{1}{k}$$

$$\text{For all } i > 1:\quad p(i) = \frac{1}{i(i-1)}$$

Page 38

Ideal Soliton Distribution

[Figure: plot of the Ideal Soliton Distribution for k = 1000.]

Page 39

A simple way to choose from Ideal SD

Choose A uniformly from the interval [0,1).

If $A \ge 1/k$ then degree $= \lceil 1/A \rceil$.

Else degree = 1.

[Figure: the interval [0,1) of values of A, with breakpoints at 0, 1/k, …, 1/6, 1/5, 1/4, 1/3, 1/2, 1. Below 1/k the degree is 1; the intervals above correspond to degrees k, …, 6, 5, 4, 3, 2, with A in [1/2, 1) giving degree 2.]
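This sampling rule is short enough to write out. In the sketch below the function name `ideal_soliton_degree` is made up, and the `min(k, ...)` clamp is an added guard against floating-point rounding when A is almost exactly 1/k; neither is part of the slides.

```python
import math
import random


def ideal_soliton_degree(a, k):
    """Map a uniform draw a in [0, 1) to a degree in 1..k.

    a in [1/i, 1/(i-1)) gives degree i, an interval of length
    1/(i(i-1)); a below 1/k gives degree 1, an interval of length 1/k.
    """
    if a >= 1.0 / k:
        # The clamp only matters for rounding error at a == 1/k.
        return min(k, math.ceil(1.0 / a))
    return 1


rng = random.Random(0)
k = 1000
print([ideal_soliton_degree(rng.random(), k) for _ in range(10)])
```

One uniform draw and one division per symbol is enough, so choosing the degree adds only constant work to encoding.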

Page 40

Ideal SD theorem

Ideal SD Theorem: The overall release distribution is exactly uniform, i.e., r(x) = 1/k for all x = 0,…,k-1.
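The theorem can be checked numerically for small k with exact rational arithmetic. The sketch below assumes the release probability for i > 1 is q(i,x) = i(i-1)(k-x) ∏_{j=1}^{i-2}(x-j) / ∏_{j=0}^{i-1}(k-j) for x = i-1, …, k-1 and zero otherwise, together with p(1) = 1/k and p(i) = 1/(i(i-1)); the function names are chosen for this example.

```python
from fractions import Fraction


def ideal_soliton(i, k):
    """p(1) = 1/k and p(i) = 1/(i(i-1)) for i > 1."""
    return Fraction(1, k) if i == 1 else Fraction(1, i * (i - 1))


def release_probability(i, x, k):
    """q(i, x): probability that a degree-i symbol is released at step x."""
    if i == 1:
        return Fraction(1 if x == 0 else 0)
    if not (i - 1 <= x <= k - 1):
        return Fraction(0)
    num = i * (i - 1) * (k - x)
    for j in range(1, i - 1):  # j = 1 .. i-2
        num *= x - j
    den = 1
    for j in range(i):  # j = 0 .. i-1
        den *= k - j
    return Fraction(num, den)


def overall_release(x, k):
    """r(x) = sum over i of p(i) * q(i, x)."""
    return sum(ideal_soliton(i, k) * release_probability(i, x, k)
               for i in range(1, k + 1))


k = 20
print([overall_release(x, k) for x in range(k)])
```

Using `Fraction` avoids floating-point noise, so the check is an exact equality with 1/k at every decoding step rather than an approximation.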

Page 41

Overall release distribution for Ideal SD

[Figure: plot of the overall release distribution for the Ideal SD, k = 1000.]

Page 42

In expected value …

Optimal recovery with respect to Ideal SD:
Receive exactly k encoding symbols.
Exactly one encoding symbol is released before any decoding steps; it recovers one content symbol.
At each decoding step a content symbol is recovered; it releases exactly one new encoding symbol, which in turn recovers exactly one more content symbol.
The ripple size is always exactly 1.

Performance analysis:
No reception overhead.
Average degree:

$$\sum_{i} i\,p(i) = \frac{1}{k} + \sum_{i=2}^{k} \frac{1}{i-1} = H(k) \approx \ln(k)$$
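The average-degree identity can be confirmed with a few lines of exact arithmetic. This sketch assumes the Ideal Soliton Distribution p(1) = 1/k, p(i) = 1/(i(i-1)) for i = 2, …, k; the function names are illustrative only.

```python
import math
from fractions import Fraction


def ideal_soliton_average_degree(k):
    """Sum of i * p(i) over i = 1..k for the Ideal Soliton Distribution."""
    total = Fraction(1, k)  # i = 1 term: 1 * p(1)
    for i in range(2, k + 1):
        total += Fraction(i, i * (i - 1))  # i * p(i) = 1 / (i - 1)
    return total


def harmonic(k):
    """H(k) = 1 + 1/2 + ... + 1/k."""
    return sum(Fraction(1, j) for j in range(1, k + 1))


k = 1000
avg = ideal_soliton_average_degree(k)
print(float(avg), math.log(k))
```

For k = 1000 the average degree is about 7.5 against ln(k) of about 6.9, so the logarithmic estimate is accurate up to a small additive constant.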

Page 43

When taking into account random fluctuations …

The Ideal Soliton Distribution fails miserably.
Expected behavior is not equal to actual behavior because of variance.
The ripple is very likely to become empty.
Fails with very high probability (even with high reception overhead).

Page 44

Robust Soliton Distribution design

Need to ensure that the ripple never empties.

At the beginning of the decoding process:
ISD: the ripple is not large enough to withstand random fluctuations.
RSD: boost $p(1)$ to $c/\sqrt{k}$ so that the expected ripple size at the beginning is $c\sqrt{k}$.

At the end of the decoding process:
ISD: the expected rate of adding to the ripple is not large enough to compensate for collisions towards the end of the decoding process, when the ripple is large relative to the number of unrecovered content symbols.
RSD: boost p(i) for higher degrees i so that the expected ripple growth at the end of the decoding process is higher.

Page 45

LT Codes - Bottom line

Using the Robust Soliton Distribution:

The number of symbols needed to recover the data with probability $1-\delta$ is $k + O(\sqrt{k}\,\ln^2(k/\delta))$.

The average degree of an encoding symbol is $O(\ln(k/\delta))$.

Page 46

Online Codes

Petar Maymounkov

We are out of time