

Approaching Shannon
Ruediger Urbanke, EPFL

    Summer School @ USC, August 6th, 2010

Many thanks to Dan Costello, Shrinivas Kudekar, Alon Orlitsky, and Thomas Riedel for their help with these slides.


    Storing Shannon


    Networking Shannon


    Completing Shannon


    Compressing Shannon


    Reading Shannon



    Coding


    Disclaimer

Technical slides do not contain references. These are all summarized at the end of each section.


    Classes of Codes

(linear) block codes
convolutional codes
sparse graph codes
polar codes


    How Do We Compare?

block error probability P_N(R, C)

R: rate
C: capacity
N: block length

complexity


    How We Compare: Error Exponent

error exponent: E(R, C) = lim_{N→∞} −(1/N) log P_N(R, C)


    How We Compare: Finite-Length Scaling

[Figure: P_N(R, ε) versus channel quality ε, on a log scale from 10^−8 to 10^−1; as N increases the transition sharpens around the threshold ε*, and under z = N^{1/μ}(ε* − ε) the curves collapse onto the scaling function ("mother curve") f(z), with scaling exponent μ.]


    How We Compare: Finite-Length Scaling

[Figure: the same picture in terms of the rate: P_N(R, C) versus rate, on a log scale from 10^−8 to 10^−1, for increasing N; under z = N^{1/μ}(C − R) the curves collapse onto the mother curve f(z) around the threshold.]


    Finite-Length Scaling

f(z): scaling function ("mother curve")
μ > 0: scaling exponent

lim_{N→∞: N^{1/μ}(C−R)=z} P_N(R, C) = f(z)


    Finite-Length Scaling -- References

V. Privman, Finite-size scaling theory, in Finite Size Scaling and Numerical Simulation of Statistical Systems, V. Privman, ed., World Scientific Publ., Singapore, 1990, pp. 1-98.


    Complexity

δ = C − R: gap to capacity

    exponential versus polynomial

    linear -- but look at prefactor


    Block Codes


    Error Exponent of Block Codes under MAP

error exponent: E(R, C) = lim_{N→∞} −(1/N) log P_N(R, C)

[Figure ("borrowed"): error exponent of block codes under MAP decoding; the exponent vanishes quadratically as R → C.]


    Error Exponent -- References

R. Gallager, Information Theory and Reliable Communication, Wiley, 1968.

A. Barg and G. D. Forney, Jr., Random codes: Minimum distances and error exponents, IEEE Transactions on Information Theory, Sept. 2002.


    Scaling of Block Codes under MAP -- BEC

[Figure: P_N versus erasure fraction ε; for a ``perfect code'' P_N jumps from 0 to 1 at the threshold ε* = 1 − R.]

distribution of erasures: E[E] = Nε, E[(E − E[E])^2] = Nε(1 − ε); E is approximately Gaussian with mean Nε and variance Nε(1 − ε)

P_N ≈ Q( (N(1 − R) − Nε) / sqrt(Nε(1 − ε)) )
    = Q( sqrt(N)((1 − R) − ε) / sqrt(ε(1 − ε)) )
    = Q( z / sqrt(ε(1 − ε)) ),   where z = sqrt(N)((1 − R) − ε)
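A quick numeric sketch of this law (my own illustration, not from the slides): for a rate-1/2 code, evaluating Q(sqrt(N)((1 − R) − ε)/sqrt(ε(1 − ε))) at a few lengths shows the transition sharpening around ε* = 0.5; the rate and erasure values below are arbitrary choices.

```python
# Sketch: MAP scaling of a "perfect" block code on the BEC. All parameter
# values below are illustrative, not taken from the slides.
from math import erfc, sqrt

def Q(x: float) -> float:
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def P_N(N: int, R: float, eps: float) -> float:
    z = sqrt(N) * ((1.0 - R) - eps)           # z = sqrt(N)((1 - R) - eps)
    return Q(z / sqrt(eps * (1.0 - eps)))     # P_N ~ Q(z / sqrt(eps(1 - eps)))

for N in (100, 1000, 10000):
    print(N, [round(P_N(N, 0.5, e), 4) for e in (0.45, 0.49, 0.50, 0.51)])
# The transition around eps* = 1 - R = 0.5 sharpens as N grows (mu = 2).
```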


    Scaling of Block Codes under MAP -- BEC

random linear block codes are almost perfect

[Figure: a square binary random matrix of dimension n.]

probability that it has full rank:

∏_{i=0}^{n−1} (2^n − 2^i) / 2^n = ∏_{i=0}^{n−1} (1 − 2^{i−n}) → 0.28878809508... as n → ∞

if we have k rows less, the probability decays by roughly 2^{−(k+1 choose 2)}

hence for random linear block codes the transition is of constant (on an absolute scale) width
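The infinite product converges very quickly; here is a small check (mine, not from the slides) of the constant quoted above.

```python
# prod_{i=1}^{infinity} (1 - 2^{-i}) -- the probability that a large square
# binary random matrix has full rank.
prob = 1.0
for i in range(1, 200):        # 200 factors are far more than needed
    prob *= 1.0 - 2.0 ** (-i)
print(prob)                    # 0.28878809508...
```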


    Scaling of Block Codes under MAP

log A(N, P) = NC − sqrt(NV) Q^{−1}(P) + O(log N)

P: error probability
N: block length
A(N, P): size of the largest such code

i(x, y) = log ( dp(y|x) / dp(y) )
C = E[i(x, y)]
V = Var[i(x, y)]
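As a sketch of what this normal approximation says (my illustration, not from the slides), one can instantiate it for the BEC: there the information density is 1 bit with probability 1 − ε and 0 bits on an erasure, so C = 1 − ε and V = ε(1 − ε); the channel and the target error probability below are arbitrary choices.

```python
# Normal approximation log2 A(N, P) ~ N*C - sqrt(N*V) * Q^{-1}(P),
# instantiated for the BEC (C = 1 - eps bits, V = eps(1 - eps) bits^2).
from math import sqrt
from statistics import NormalDist

def qinv(p: float) -> float:
    """Inverse Gaussian tail function Q^{-1}(p)."""
    return NormalDist().inv_cdf(1.0 - p)

def best_rate(N: int, P: float, eps: float) -> float:
    C = 1.0 - eps              # capacity of BEC(eps), bits/channel use
    V = eps * (1.0 - eps)      # dispersion of BEC(eps), bits^2
    return (N * C - sqrt(N * V) * qinv(P)) / N

for N in (100, 1000, 10000, 100000):
    print(N, round(best_rate(N, 1e-3, 0.5), 4))
# The achievable rate backs off from C = 0.5 like 1/sqrt(N).
```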


    Finite-Length Scaling -- References

G. Landsberg, Über eine Anzahlbestimmung und eine damit zusammenhängende Reihe, J. Reine Angew. Math., vol. 111, pp. 87-88, 1893.
A. Feinstein, A new basic theorem of information theory, IRE Trans. Inform. Theory, vol. PGIT-4, pp. 2-22, 1954.
V. Strassen, Asymptotische Abschätzungen in Shannons Informationstheorie, Trans. Third Prague Conf. Information Theory, pp. 689-723, 1962.
Y. Polyanskiy, H. V. Poor, and S. Verdú, "Dispersion of Gaussian Channels," 2009 IEEE Int. Symposium on Information Theory, Seoul, Korea, June 28-July 3, 2009.
Y. Polyanskiy, H. V. Poor, and S. Verdú, "Dispersion of the Gilbert-Elliott Channel," 2009 IEEE Int. Symposium on Information Theory, Seoul, Korea, June 28-July 3, 2009.

For a very simple proof of the previous result ask Thomas Riedel, UIUC.


    Convolutional Codes



    Convolutional Codes

[Figures ``borrowed'': error exponent of convolutional codes; in contrast to the quadratic behavior of block codes, it is affine near capacity.]


Finite-Length Scaling of Convolutional Codes -- BEC

K: constraint length

scaling behavior?


    Convolutional Codes -- Some References

    Big bang:

P. Elias, Coding for noisy channels, in IRE International Convention Record, Mar. 1955, pp. 37-46.

Algorithms and error exponents:

J. M. Wozencraft, Sequential decoding for reliable communication, Research Lab. of Electron. Tech. Rept. 325, MIT, Cambridge, MA, USA, 1957.
R. M. Fano, A heuristic discussion of probabilistic decoding, IEEE Trans. Information Theory, vol. IT-9, pp. 64-74, Apr. 1963.
A. J. Viterbi, Error bounds of convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Inform. Theory, 13 (1967), pp. 260-269.
H. L. Yudkin, Channel state testing in information decoding, Sc.D. thesis, Dept. of Elec. Engg., M.I.T., 1964.
J. K. Omura, On the Viterbi decoding algorithm, IEEE Trans. Inform. Theory, 15 (1969), pp. 177-179.
G. D. Forney, Jr., The Viterbi algorithm, Proc. IEEE, 61 (1973), pp. 268-278.
K. S. Zigangirov, Time-invariant convolutional codes: Reliability function, in Proc. 2nd Joint Soviet-Swedish Workshop Information Theory, Gränna, Sweden, Apr. 1985.
N. Shulman and M. Feder, Improved Error Exponent for Time-Invariant and Periodically Time-Variant Convolutional Codes, IEEE Trans. Inform. Theory, 46 (2000), pp. 97-103.
G. D. Forney, Jr., The Viterbi algorithm: A personal history. E-print: cond-mat/0104079, 2005.

Overview:

A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding, McGraw-Hill, New York, NY, USA, 1979.
S. Lin and D. J. Costello, Jr., Error Control Coding, Prentice Hall, Englewood Cliffs, NJ, USA, 2nd ed., 2004.
R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional Coding, IEEE Press, Piscataway, NJ, USA, 1999.


    Some Open Questions

Scaling behavior?


digrams such as TH, ED, etc. In the second-order approximation, digram structure is introduced. After a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one. This requires a table of digram frequencies p_i(j). In the third-order approximation, trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two letters.

3. THE SERIES OF APPROXIMATIONS TO ENGLISH

To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27-symbol alphabet, the 26 letters and a space.

1. Zero-order approximation (symbols independent and equiprobable).

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

2. First-order approximation (symbols independent but with frequencies of English text).

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.

3. Second-order approximation (digram structure as in English).

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.

4. Third-order approximation (trigram structure as in English).

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

5. First-order word approximation. Rather than continue with tetragram, ..., n-gram structure it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words "attack on an English writer that the character of this" is not at all unreasonable. It appears then that a sufficiently complex stochastic process will give a satisfactory representation of a discrete source.

The first two samples were constructed by the use of a book of random numbers in conjunction with (for example 2) a table of letter frequencies. This method might have been continued for (3), (4) and (5), since digram, trigram and word frequency tables are available, but a simpler equivalent method was used.

    LDPC UDREVOIGRES


    Sparse Graph Codes



LDPC Ensemble

[Figure: Tanner graph with variable nodes, check nodes, and a permutation connecting the two sides.]

H x = 0, with H sparse; each check node enforces a parity constraint, e.g. x1 + x4 + x8 = 0


Asymptotic Analysis -- BEC

density evolution: x_{t+1} = ε (1 − (1 − x_t)^5)^3
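The recursion can be iterated directly; below is a sketch (mine, not from the slides) that brackets the BP threshold by bisection. Reading the exponents as d_r − 1 = 5 and d_l − 1 = 3, the recursion matches a (4, 6)-regular ensemble, but that identification is my assumption.

```python
# Density evolution on the BEC for the recursion shown on this slide,
# x_{t+1} = eps * (1 - (1 - x_t)^5)^3, plus bisection for the threshold.
def de_converges(eps: float, iters: int = 5000, tol: float = 1e-12) -> bool:
    """True if the erasure probability is driven (essentially) to zero."""
    x = eps
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** 5) ** 3
        if x < tol:
            return True
    return False

lo, hi = 0.0, 1.0
for _ in range(40):                       # bisection on eps
    mid = (lo + hi) / 2.0
    lo, hi = (mid, hi) if de_converges(mid) else (lo, mid)
print("eps^BP ~", round(lo, 4))
```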


MAP versus BP

x_{t+1} = ε (1 − (1 − x_t)^5)^3


    Capacity Achieving -- BEC


    Capacity Approaching -- BMS


    Error Exponent of LDPC Codes -- BP

G: graph,  ε: channel parameter,  ℓ: number of iterations

concentration: P{ |P_N(G, ε, ℓ) − E[P_N(G, ε, ℓ)]| > δ } ≤ e^{−K δ^2 N}

If E[P_N(G, ε, ℓ)] converges to zero for large ℓ, and if the code has an error-correcting radius, then we can prove that the code has an error exponent under iterative decoding.

Simplest sufficient condition: the code has expansion at least 3/4, which is true whp if the left degree is at least 5 (less restrictive conditions are known but they are more complicated); codes used in ``practice'' do not have error exponents.


    Expansion

Take a set V of variable nodes; its set C of check-node neighbors has size at most d_l |V|. The expansion is the smallest ratio |C| / (d_l |V|) over all ``small'' sets V.

A (d_l, d_r)-regular ensemble cannot have expansion beyond (d_l − 1)/d_l; remarkably, random graphs essentially achieve this bound whp.


    Finite-Length Scaling of LDPC Codes -- BEC

scaling parameters computable

P_N = Q(z/α) (1 + O(N^{−1/3})),   z = sqrt(N) (ε^BP − β N^{−2/3} − ε)

[Figure: P_N versus ε ∈ [0.3, 0.5] on a log scale (10^−5 … 10^−1); simulation versus the scaling law.]

λ(x) = (1/6) x + (5/6) x^3,   ρ(x) = x^5,   R = 3/7
ε^BP = 0.4828,   α = 0.5791,   β = 0.6887

(we ignore the error floor here!)
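To see what the law predicts numerically, here is a sketch (mine) that evaluates it with the parameters as I read them off this slide; the erasure values are arbitrary.

```python
# P_N ~ Q(z / alpha), z = sqrt(N) (eps^BP - beta * N^{-2/3} - eps),
# with eps^BP = 0.4828, alpha = 0.5791, beta = 0.6887 (read off the slide).
from math import erfc, sqrt

def Q(x: float) -> float:
    return 0.5 * erfc(x / sqrt(2.0))

def P_N(N: int, eps: float, eps_bp: float = 0.4828,
        alpha: float = 0.5791, beta: float = 0.6887) -> float:
    z = sqrt(N) * (eps_bp - beta * N ** (-2.0 / 3.0) - eps)
    return Q(z / alpha)

for N in (1000, 10000, 100000):
    print(N, [f"{P_N(N, e):.2e}" for e in (0.40, 0.44, 0.46)])
```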


    Optimization

[Figure: optimization of the degree distribution; P_N versus ε ∈ [0.2, 0.75] on a log scale (10^−6 … 10^−1); rate/capacity marked at 40.58% on a scale from 0.0 to 1.0; variable-node degrees 2-10 and check-node degrees 2-13; low-degree variable nodes dominate the contribution to the error floor.]


    Finite-Length Scaling of LDPC Codes -- BAWGNC

[Figures: the (3, 6) ensemble over the BSC and the (3, 4) ensemble over the BAWGNC.]

same form of scaling law; the parameters are computable, but there is no proof


    Gap To Threshold versus Length

lim_{N→∞: N^{1/μ}(C−R)=z} P_N(R, C) = f(z)

Fixing the error probability fixes z = N^{1/μ}(C − R), hence N = (z/δ)^μ, where δ = C − R is the additive gap.

μ = 2: halving the gap requires increasing the length by a factor of 4.
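The arithmetic, spelled out (my illustration; z = 1 is an arbitrary normalization):

```python
# N = (z / delta)^mu: with mu = 2, halving the additive gap delta = C - R
# quadruples the required block length.
mu, z = 2.0, 1.0
for delta in (0.10, 0.05, 0.025):
    print(delta, (z / delta) ** mu)       # 100.0, 400.0, 1600.0
```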


    Gap versus Complexity (per bit)

BEC/Threshold -- O(1); degrees are constant and we touch every edge at most once

BEC/Capacity -- O(log(1/δ)) for standard LDPC; degrees grow like log(1/δ) and we touch every edge once

BMS/Threshold -- ???

BMS/Capacity -- ???

BEC/Capacity -- O(1) for MN-type LDPC ensembles; degrees are constant and we touch every edge at most once


    Sparse Graph Codes -- Some References

Big bang:
R. G. Gallager, Low-density parity-check codes, IRE Trans. Inform. Theory, 8 (1962).
C. Berrou, A. Glavieux, and P. Thitimajshima, Near Shannon limit error-correcting coding and decoding, in Proc. of ICC, Geneva, Switzerland, May 1993.

Analysis:
M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. A. Spielman, Analysis of low density codes and improved designs using irregular graphs, in Proc. of the 30th Annual ACM Symposium on Theory of Computing, 1998, pp. 249-258.
M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. A. Spielman, Efficient erasure correcting codes, IEEE Trans. Inform. Theory, 47 (2001), pp. 569-584.
T. Richardson, A. Shokrollahi, and R. Urbanke, Design of capacity-approaching irregular low-density parity-check codes, IEEE Trans. Inform. Theory, 47 (2001), pp. 619-637.
T. Richardson and R. Urbanke, The capacity of low-density parity check codes under message-passing decoding, IEEE Trans. Inform. Theory, 47 (2001), pp. 599-618.
S.-Y. Chung, G. D. Forney, Jr., T. Richardson, and R. Urbanke, On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit, IEEE Commun. Lett., 5 (2001), pp. 58-60.

Error exponents:
D. Burshtein and G. Miller, Expander graph arguments for message-passing algorithms, IEEE Trans. Inform. Theory, 47 (2001), pp. 782-790.
O. Barak and D. Burshtein, Upper Bounds on the Error Exponents of LDPC Code Ensembles, IEEE International Symposium on Information Theory (ISIT 2006), Seattle, July 2006.


    Sparse Graph Codes -- Some References

Finite-length scaling:
A. Montanari, Finite-size scaling of good codes, in Proc. of the Allerton Conf. on Commun., Control, and Computing, Monticello, IL, USA, Oct. 2001.
A. Amraoui, A. Montanari, T. Richardson, and R. Urbanke, Finite-length scaling for iteratively decoded LDPC ensembles, in Proc. of the Allerton Conf. on Commun., Control, and Computing, Monticello, IL, USA, Oct. 2003.
J. Ezri, A. Montanari, and R. Urbanke, Finite-length scaling for Gallager A, in 44th Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2006.
A. Dembo and A. Montanari, Finite size scaling for the core of large random hyper-graphs. E-print: math.PR/0702007, 2007.
J. Ezri, A. Montanari, S. Oh, and R. Urbanke, The Slope Scaling Parameter for General Channels, Decoders, and Ensembles, in Proc. of the IEEE International Symposium on Information Theory, 2008.

Complexity:
A. Khandekar and R. J. McEliece, On the complexity of reliable communication on the erasure channel, in Proc. IEEE Int. Symp. Information Theory (ISIT 2001), Washington, DC, Jun. 2001, p. 1.
H. D. Pfister, I. Sason, and R. Urbanke, Capacity-achieving ensembles for the binary erasure channel with bounded complexity, IEEE Trans. Inform. Theory, vol. 51, issue 7, 2005, pp. 2352-2379.

Overviews:
D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge Univ. Press, 2003.
T. Richardson and R. Urbanke, Modern Coding Theory, Cambridge Univ. Press, 2008.


    Some Open Questions

    Simple design procedures?

    Can you achieve capacity on general BMS channels?

    Thresholds under LP decoding?

    Scaling for general BMS channels?

    Scaling under MAP?

    Scaling under LP decoding?

    Scaling under flipping decoding?

    Scaling to capacity?


    Polar Codes


[Figure: patterns.]


    Codes from Kronecker Product of G2
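A minimal sketch (mine, not from the slides) of this construction, using the standard 2x2 kernel G2 = [[1, 0], [1, 1]]; the input vector is an arbitrary example.

```python
# Encoding with the m-fold Kronecker power of G2 over GF(2).
import numpy as np

G2 = np.array([[1, 0], [1, 1]], dtype=np.uint8)

def kron_power(m: int) -> np.ndarray:
    """The m-fold Kronecker power of G2, a (2^m x 2^m) 0/1 matrix."""
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(m):
        G = np.kron(G, G2)
    return G

G8 = kron_power(3)                    # 8 x 8 generator matrix
u = np.array([0, 1, 0, 0, 1, 0, 1, 1], dtype=np.uint8)
x = (u @ G8) % 2                      # codeword over GF(2)
print(x)
```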


    Reed-Muller Codes

    choose rows of largest weight


    Polar Codes

    W -- BMS channel


    Channel Polarization

[Figure: after polarization, the inputs of the bad channels are frozen to 0.]


    Successive Decoding


    Successive Decoding

    Stefan Meier http://ipgdemos.epfl.ch/polarcodes/


    Channel Polarization

    threshold

    Stefan Meier http://ipgdemos.epfl.ch/polarcodes/


    How Do Channels Polarize?

Two uses of BEC(ε): X1 and X2 are sent, Y1 and Y2 observed. Set U1 = X1 + X2 and U2 = X2.

Decode U1 from (Y1, Y2): this is a parity-check node, so U1 is recovered only if neither observation is erased; it effectively sees BEC(1 − (1 − ε)^2) -- much worse.

Then decode U2 with U1 known: X2 = U2 and X1 = U1 + U2, so U2 is seen twice, a repetition code; it is lost only if both observations are erased and effectively sees BEC(ε^2) -- much better.

total capacity = (1 − ε)^2 + (1 − ε^2) = 2(1 − ε), i.e. the transform preserves capacity.
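The single step in a few lines of Python (my illustration; ε = 0.3 is an arbitrary choice):

```python
# One polarization step on BEC(eps): total capacity is preserved.
eps = 0.3
minus, plus = 1 - (1 - eps) ** 2, eps ** 2      # worse / better channel
print(minus, plus)                              # 0.51 0.09
print((1 - minus) + (1 - plus), 2 * (1 - eps))  # both 1.4
```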


    How Do Channels Polarize?

Erasure probabilities, starting from BEC(0.5) and applying Z → {Z^2, 1 − (1 − Z)^2} at each level:

level 0: 0.5
level 1: 0.25, 0.75
level 2: 0.0625, 0.4375, 0.5625, 0.9375
level 3: 0.0039, 0.1211, 0.1914, 0.3164, 0.6836, 0.8086, 0.8789, 0.9961
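The whole tree is a few lines of Python (my sketch, not from the slides); it reproduces the numbers above.

```python
# Polarization of erasure probabilities, starting from BEC(0.5).
def polarize(levels: int, eps: float = 0.5) -> list:
    zs = [eps]
    for _ in range(levels):
        # each channel splits into a better (z^2) and a worse (2z - z^2) one
        zs = [w for z in zs for w in (z * z, 1 - (1 - z) ** 2)]
    return zs

for m in range(4):
    print(m, sorted(round(z, 4) for z in polarize(m)))
# 0 [0.5]
# 1 [0.25, 0.75]
# 2 [0.0625, 0.4375, 0.5625, 0.9375]
# 3 [0.0039, 0.1211, 0.1914, 0.3164, 0.6836, 0.8086, 0.8789, 0.9961]
```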


    Polar Codes -- Some References

    Big bang:

    E. Arikan, Channel polarization: A method for constructing capacity-achieving codes for symmetricbinary-input memoryless channels, http://arxiv.org/pdf/0807.3917

    Exponent:

    E. Arikan and E. Telatar, On the Rate of Channel Polarization, http://arxiv.org/pdf/0807.3806

S. B. Korada, E. Sasoglu, and R. Urbanke, Polar Codes: Characterization of Exponent, Bounds, and Constructions, http://arxiv.org/pdf/0901.0536

    Source Coding:

N. Hussami, S. B. Korada, and R. Urbanke, Performance of Polar Codes for Channel and Source Coding, http://arxiv.org/pdf/0901.2370
S. B. Korada and R. Urbanke, Polar Codes are Optimal for Lossy Source Coding, http://arxiv.org/pdf/0903.0307
E. Arikan, Source Polarization, http://arxiv.org/pdf/1001.3087

    Non-symmetric and non-binary channels:

E. Sasoglu, E. Telatar, and E. Arikan, Polarization for arbitrary discrete memoryless channels, http://arxiv.org/pdf/0908.0302
R. Mori and T. Tanaka, Channel Polarization on q-ary Discrete Memoryless Channels by Arbitrary Kernels, http://arxiv.org/pdf/1001.2662
R. Mori and T. Tanaka, Non-Binary Polar Codes using Reed-Solomon Codes and Algebraic Geometry Codes, http://arxiv.org/pdf/1007.3661


    MAC channel:

E. Sasoglu, E. Telatar, and E. Yeh, Polar codes for the two-user multiple-access channel, http://arxiv.org/pdf/1006.4255
E. Abbe and E. Telatar, Polar Codes for the m-User MAC and Matroids, http://arxiv.org/pdf/1002.0777

    Compound channel:

S. H. Hassani, S. B. Korada, and R. Urbanke, The Compound Capacity of Polar Codes, http://arxiv.org/pdf/0907.3291

    Wire-tap channel and security:

H. Mahdavifar and A. Vardy, Achieving the Secrecy Capacity of Wiretap Channels Using Polar Codes, http://arxiv.org/pdf/1007.3568
E. Hof and S. Shamai, Secrecy-Achieving Polar-Coding for Binary-Input Memoryless Symmetric Wire-Tap Channels, http://arxiv.org/pdf/1005.2759
M. Andersson, V. Rathi, R. Thobaben, J. Kliewer, and M. Skoglund, Nested Polar Codes for Wiretap and Relay Channels, http://arxiv.org/pdf/1006.3573
O. O. Koyluoglu and H. El Gamal, Polar Coding for Secure Transmission and Key Agreement, http://arxiv.org/pdf/1003.1422

    Constructions:

R. Mori and T. Tanaka, Performance and Construction of Polar Codes on Symmetric Binary-Input Memoryless Channels, http://arxiv.org/pdf/0901.2207
M. Bakshi, S. Jaggi, and M. Effros, Concatenated Polar Codes, http://arxiv.org/pdf/1001.2545

    Scaling:

S. H. Hassani and R. Urbanke, On the scaling of Polar codes: I. The behavior of polarized channels, http://arxiv.org/pdf/1001.2766
T. Tanaka and R. Mori, Refined rate of channel polarization, http://arxiv.org/pdf/1001.2067
S. H. Hassani, K. Alishahi, and R. Urbanke, On the scaling of Polar Codes: II. The behavior of un-polarized channels, http://arxiv.org/pdf/1002.3187


Error Exponent of Polar Codes -- BEC: A First Guess

Assume that Z is already small, hence Y = −log2(Z) is large; let X = log2(Y).

Z → Z^2 w.p. 1/2,   Z → 2Z − Z^2 w.p. 1/2   (exactly: 1 − (1 − Z)^2)

Y → 2Y w.p. 1/2,   Y → Y − 1 w.p. 1/2   (for small Z, 2Z − Z^2 ≈ 2Z)

X → X + 1 w.p. 1/2,   X → X w.p. 1/2   (approximately, for large Y)


Error Exponent of Polar Codes -- BEC: A First Guess

X performs a random walk on the lattice with drift:
X → X + 1 w.p. 1/2,   X → X w.p. 1/2

After m steps we expect X to have value roughly m/2.
This means we expect Y to have value roughly 2^{m/2} = sqrt(N).
This means we expect Z to have value roughly 2^{−sqrt(N)}.
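A Monte Carlo check of this heuristic (my sketch, not from the slides). Z itself underflows double precision, so the code tracks Y = −log2(Z) directly; the starting channel BEC(0.5) and m = 20 are illustrative choices.

```python
# Z -> Z^2 corresponds to Y -> 2Y; Z -> 2Z - Z^2 to Y -> Y - log2(2 - Z).
import math
import random
from statistics import median

random.seed(0)
m = 20
ys = []
for _ in range(5000):
    y = 1.0                                # Z = 0.5, so Y = -log2(Z) = 1
    for _ in range(m):
        if random.random() < 0.5:
            y = 2.0 * y                    # Z -> Z^2
        else:
            z = 2.0 ** (-y)                # may underflow to 0.0, which is fine
            y -= math.log2(2.0 - z)        # Z -> 2Z - Z^2
    ys.append(y)

good = sorted(ys)[len(ys) // 2:]           # the half that polarizes to "good"
print(median(math.log2(y) for y in good), m / 2)
# log2(Y) concentrates around m/2 (up to O(sqrt(m)) fluctuations),
# i.e. Z ~ 2^{-sqrt(N)} with N = 2^m.
```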


    Error Exponent of Polar Codes

lim_{m→∞} P( Z_m ≤ 2^{−2^{m/2 + sqrt(m) Q^{−1}(R/C)/2 + o(sqrt(m))}} ) = R


    Finite-Length Scaling for Polar Codes (BEC)

Bhattacharyya process:  Z → Z^2 w.p. 1/2,   Z → 1 − (1 − Z)^2 w.p. 1/2

[Figure: distribution of Z on [0, 1]; for ε = 1/2 the distribution is symmetric around z = 1/2.]

Q_N(x) = (1/N) |{ i : x ≤ E(W_N^{(i)}) ≤ 1/4 }|

[Figure: the same picture under the change of variable z = 2x.]

(the general case follows in a similar manner)


    Finite-Length Scaling for Polar Codes -- BEC

scaling assumption:  Q(x) = lim_{N→∞} N^{1/μ} Q_N(x)

μ ≈ 3.62 for the BEC,   μ ≈ 4 for the BAWGNC

One polarization step gives

2 Q_{2N}(x) = Q_N(1/2 − sqrt(1/4 − x/2)) + (1 − 2·1{x ≤ 1/8}) Q_N(min{sqrt(x/2), 1/2 − sqrt(x/2)})

and hence, under the scaling assumption, the functional equation

Q(x) = 2^{1/μ − 1} [ Q(1/2 − sqrt(1/4 − x/2)) + (1 − 2·1{x ≤ 1/8}) Q(min{sqrt(x/2), 1/2 − sqrt(x/2)}) ]

Solve this functional equation (numerically); this gives Q(x) (up to scaling) and μ.


Finite-Length Scaling for Polar Codes -- BEC: Simulations versus Scaling

P_N(R, C) ≈ f(N^{1/μ}(C − R))

[Figure: log10 P_N(R, C) versus C − R for N = 2^23, 2^24, 2^25, 2^26; simulations versus the scaling prediction.]


Gap To Capacity versus Length

0.43 bits/channel use -- 86% of capacity

lim_{N→∞: N^{1/μ}(C−R)=z} P_N(R, C) = f(z)

Fixing the error probability fixes z = N^{1/μ}(C − R), hence N = (z/δ)^μ ≈ (z/δ)^4, where δ = C − R is the additive gap.

≈ 10 billion to get 1% close to capacity!
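Back-of-the-envelope (my illustration; z = 3 is an arbitrary choice that happens to reproduce the slide's figure):

```python
# N = (z / delta)^mu for a 1% additive gap; mu = 2 (MAP block codes) shown
# for contrast with mu ~ 4 (polar codes under successive decoding).
z, delta = 3.0, 0.01
for mu in (2.0, 4.0):
    print(mu, f"{(z / delta) ** mu:.1e}")   # 9.0e+04 and 8.1e+09 (~10 billion)
```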


    Gap to Capacity versus Complexity

complexity per bit = O(log(1/δ))


    Some Open Questions

    Variation on the theme that performs better at small lengths?

    Do RM codes achieve capacity?

    Make scaling conjecture under successive decoding rigorous.

    Scaling behavior under MAP decoding?

Find a reasonable channel where they do not work. :-)


    Message

sparse graph codes -- the best codes in ``practice''; we still miss some theory; the error floor region is tricky; their construction is still somewhat of an art

polar codes -- nice for theory; not (yet) ready for applications, but the field is young; how do we improve the finite-length performance?

scaling behavior is the next best thing to exact analysis, and probably a more meaningful characterization for the practical case than the error exponent