
Information and Communication Theory

Lecture 6

Channel Coding

Mário A. T. Figueiredo

DEEC, Instituto Superior Técnico, University of Lisbon, Portugal

2021


Channel Coding

Message W ∈ {1, ...,M}

Channel input alphabet X ; output alphabet Y.

Encoder: f : {1, ..., M} → X^n.

Decoder: g : Y^n → {1, ..., M}.

Message estimate: Ŵ = g(Y^n)

Memoryless channel model: (X , p(y|x),Y)

p(y1, ..., yn | x1, ..., xn) = p(y^n | x^n) = ∏_{i=1}^n p(yi | xi)

An (M,n) code: ({1, ...,M}, f, g)
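To make these objects concrete, here is a minimal Python sketch of the W → X^n → Y^n → Ŵ chain. The channel and code are assumptions for illustration (not from the slides): a BSC with crossover probability 0.1 and a toy (2, 3) repetition code.

```python
# Minimal sketch of the (M, n) coding chain over a memoryless channel.
# Assumed example: BSC with alpha = 0.1 and a (2, 3) repetition code.
import random

ALPHA = 0.1          # BSC crossover probability (assumed)
M, N = 2, 3          # an (M, n) = (2, 3) code

def encoder(w):
    """f : {1, ..., M} -> X^n  (message 1 -> 000, message 2 -> 111)."""
    return [w - 1] * N

def channel(xn):
    """Memoryless channel: each symbol flipped independently w.p. ALPHA,
    so p(y^n | x^n) factorizes as a product of p(y_i | x_i)."""
    return [x ^ (random.random() < ALPHA) for x in xn]

def decoder(yn):
    """g : Y^n -> {1, ..., M}  (majority vote for this toy code)."""
    return 2 if sum(yn) > N / 2 else 1

w = random.randint(1, M)      # message W
xn = encoder(w)               # codeword X^n = f(W)
yn = channel(xn)              # channel output Y^n
w_hat = decoder(yn)           # estimate W^ = g(Y^n)
print(w, xn, yn, w_hat)
```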


Channels

Memoryless channel model: (X , p(y|x),Y)

Channel matrix: |X| × |Y|, with P = [P_{i,j}], where P_{i,j} = P(Y = j | X = i).

Channel capacity: C = max_{p(x)} I(X;Y), the maximum mutual information over all input distributions.

Example: noiseless binary channel (Y = X):

I(X;Y ) = H(Y ) = H(X).

C = max_{p(x)} I(X;Y) = max_{p(x)} H(X) = 1 bit/symbol

A noiseless binary channel can transmit 1 bit/symbol.
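The maximization over p(x) can be carried out numerically for any finite channel matrix. The sketch below uses the Blahut-Arimoto iteration (an algorithm not covered in these slides) and checks the result on the noiseless binary channel (C = 1 bit/symbol) and on a BSC with an assumed crossover probability of 0.1.

```python
# Numerical capacity C = max_{p(x)} I(X;Y) of a finite channel via the
# Blahut-Arimoto iteration (a sketch; not an algorithm from the slides).
# Channel matrices are |X| x |Y| row-stochastic lists of lists.
from math import log2

def capacity(P, iters=2000):
    nx, ny = len(P), len(P[0])
    r = [1.0 / nx] * nx                       # input distribution p(x)
    I = 0.0
    for _ in range(iters):
        q = [sum(r[i] * P[i][j] for i in range(nx)) for j in range(ny)]
        # D[i] = D( p(y|x=i) || q ): how distinguishable input i is at the output
        D = [sum(p * log2(p / q[j]) for j, p in enumerate(P[i]) if p > 0)
             for i in range(nx)]
        I = sum(r[i] * D[i] for i in range(nx))   # current I(X;Y) estimate
        w = [r[i] * 2 ** D[i] for i in range(nx)]
        r = [wi / sum(w) for wi in w]             # re-weight the inputs
    return I, r

# Noiseless binary channel: C = 1 bit/symbol, uniform input.
print(capacity([[1.0, 0.0], [0.0, 1.0]]))
# BSC with alpha = 0.1 (assumed): C = 1 - H(0.1) ~= 0.531 bits/symbol.
print(capacity([[0.9, 0.1], [0.1, 0.9]]))
```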


Binary Symmetric Channel

Binary symmetric channel: (X, p(y|x), Y), with X = Y = {0, 1} and crossover probability P(Y ≠ X | X = x) = α.

H(Y |X = x) = H(α, 1− α), for x = 0 or x = 1.

H(Y |X) = H(α, 1− α)

I(X;Y ) = H(Y )−H(α, 1− α)

Capacity: let P(X = 0) = β

C = max_β [H(Y) − H(α, 1 − α)] = 1 − H(α, 1 − α) bits/symbol

...achieved for β = 1/2.

For α = 0 or α = 1, C = 1; for α = 1/2, C = 0.
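As a quick numerical check (with an assumed α = 0.2, not a value from this slide), one can sweep β = P(X = 0) and confirm that I(X;Y) peaks at β = 1/2 with value 1 − H(α, 1 − α):

```python
# Numerical check of the BSC capacity C = 1 - H(alpha, 1 - alpha) by
# sweeping the input distribution P(X=0) = beta. Sketch; alpha = 0.2 assumed.
from math import log2

def H2(p):
    """Binary entropy H(p, 1-p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def I_bsc(beta, alpha):
    p_y0 = beta * (1 - alpha) + (1 - beta) * alpha   # P(Y = 0)
    return H2(p_y0) - H2(alpha)                      # H(Y) - H(Y|X)

alpha = 0.2
best = max((I_bsc(b / 1000, alpha), b / 1000) for b in range(1001))
print(best)                      # maximum I(X;Y) and the maximizing beta (0.5)
print(1 - H2(alpha))             # closed form: 1 - H(alpha)
```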


Binary Erasure Channel

Binary erasure channel: (X, p(y|x), Y), with X = {0, 1}, Y = {0, 1, ∗}, and erasure probability δ.

H(X|Y = y) = 0, for y = 0 or y = 1; H(X|Y = ∗) = H(X)

H(X|Y) = δ H(X)

I(X;Y) = H(X) − H(X|Y) = (1 − δ) H(X)

Capacity: let P(X = 0) = β

C = max_β (1 − δ) H(X) = 1 − δ bits/symbol

...achieved for β = 1/2.

For δ = 0, C = 1; for δ = 1, C = 0.
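A similar sketch for the erasure channel, sweeping β with an assumed δ = 0.3 and using I(X;Y) = (1 − δ) H(X):

```python
# Numerical check of the BEC capacity C = 1 - delta by sweeping beta = P(X=0).
# Sketch; delta = 0.3 is an arbitrary assumed value.
from math import log2

def H2(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

delta = 0.3
best = max(((1 - delta) * H2(b / 1000), b / 1000) for b in range(1001))
print(best)          # ~ (0.7, 0.5): I(X;Y) is maximized at beta = 1/2
print(1 - delta)     # closed form
```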


Noisy Typewriter

Noisy typewriter: (X, p(y|x), Y), with X = Y = {1, 2, 3, 4}; each input is received either unchanged or as the next symbol, each with probability 1/2.

H(Y |X = x) = 1, for x = 1, 2, 3, 4. H(Y |X) = 1

I(X;Y ) = H(Y )−H(Y |X) = H(Y )− 1

Capacity:

C = max_{p(x)} [H(Y) − 1] = log 4 − 1 = 1 bit/symbol

...achieved for p(x) uniform.


Properties of Channel Capacity

Because I(X;Y ) ≥ 0, then C ≥ 0

Because I(X;Y ) = H(X)−H(X|Y ) ≤ H(X),

C = max_{p(x)} I(X;Y) ≤ max_{p(x)} H(X) ≤ log |X|

Because I(X;Y ) = H(Y )−H(Y |X) ≤ H(Y ),

C = max_{p(x)} I(X;Y) ≤ max_{p(x)} H(Y) ≤ log |Y|

Corollary: C ≤ min{log |X |, log |Y|}


Exercises

Compute the capacity of a series connection of two binary symmetric channels. Hint: consider the equivalent BSC.

Consider the parallel of two independent channels (X1, p1(y|x), Y1) and (X2, p2(y|x), Y2), i.e., the channel

(X1 × X2, p1(y1|x1) p2(y2|x2), Y1 × Y2).

What is the capacity of this channel?

Consider N channels with |X| = |Y| and non-maximal capacity, i.e., C < log |X|, connected in series. Show that the capacity of the resulting channel converges to zero as N goes to infinity. Hint: use the data processing inequality.


Exercises

Compute the capacity and the maximizing p(x) for the Z channel.

Consider a channel obtained by taking two conditionally independent looks, Y1 and Y2, at the output of a channel of capacity C, for each input X. Show that the resulting capacity C′ ≤ 2C. Hint: begin by showing that I(X;Y1, Y2) = 2I(X;Y1) − I(Y1;Y2).

A symmetric channel is one in which every row of the channel matrix is a permutation of every other row and every column is a permutation of every other column. Show that in this case the capacity is

C = log |Y| − H(any row of the channel matrix).

Show that the same result applies to weakly symmetric channels, where the columns are only required to sum to the same number.


Channel Coding

Conditional probability of error (for i ∈ {1, ...,M})

λ_i = P(g(Y^n) ≠ i | X^n = f(i)) = ∑_{y^n ∈ Y^n} P(Y^n = y^n | X^n = f(i)) · 1_{g(y^n) ≠ i}

where 1_A = 1 if A is true and 1_A = 0 if A is false.

Maximum probability of error: λ^(n) = max_{i ∈ {1,...,M}} λ_i

Average probability of error: P_e^(n) = (1/M) ∑_{i=1}^M λ_i

Probability of error: P(g(Y^n) ≠ W)

If the messages are equiprobable: P(g(Y^n) ≠ W) = P_e^(n)

Of course, P_e^(n) ≤ λ^(n) and P(g(Y^n) ≠ W) ≤ λ^(n)
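These quantities can be computed exactly for small codes by enumerating Y^n. The sketch below assumes a (2, 3) repetition code on a BSC with crossover probability 0.1 (neither is specified on this slide):

```python
# Exact lambda_i, lambda^(n), and P_e^(n) for a small (M, n) code on a BSC,
# by enumerating all y^n in Y^n. Assumed example: (2, 3) repetition code,
# alpha = 0.1.
from itertools import product

ALPHA = 0.1
codewords = {1: (0, 0, 0), 2: (1, 1, 1)}             # f(i)

def decode(yn):
    """g(y^n): majority vote."""
    return 2 if sum(yn) >= 2 else 1

def p_given(yn, xn):
    """p(y^n | x^n) = prod_i p(y_i | x_i) for the BSC."""
    p = 1.0
    for y, x in zip(yn, xn):
        p *= ALPHA if y != x else 1 - ALPHA
    return p

lam = {i: sum(p_given(yn, xn) for yn in product((0, 1), repeat=3)
              if decode(yn) != i)
       for i, xn in codewords.items()}
print(lam)                                   # conditional error probabilities
print(max(lam.values()))                     # lambda^(n)
print(sum(lam.values()) / len(lam))          # P_e^(n)
```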


Channel Coding

The rate of an (M,n) code

R = (log2 M) / n

A rate R is achievable if there is a sequence of (⌈2^{nR}⌉, n) codes such that

lim_{n→∞} λ^(n) = 0

The (operational) capacity of a channel is

C_oper = sup{R : R is achievable}

The channel coding theorem essentially states that

C_oper = C = max_{p(x)} I(X;Y)


Channel Coding: Examples

Consider a quaternary source: M = 4, thus log2 M = 2 bits.

Using a binary noiseless channel (C = 1 bit/transmission), we need n = 2 transmissions to send each symbol:

R = (log2 M) / n = 2/2 = 1 bit/transmission

Is this rate achievable? Yes, because the channel is noiseless.

What if C < 1? Rate 1 is no longer achievable!

Using a noiseless quaternary channel (|X| = |Y| = 4): we only need n = 1 transmission,

R = (log2 M) / 1 = 2 bits/transmission

Is this rate achievable? Yes, because the channel is noiseless: C = 2.


Channel Coding: Example

Consider a binary symmetric channel whose crossover probability α gives C = 1 − H(α, 1 − α) ≈ 0.21 bits/transmission.

Thus, R = 0.25 is not achievable; R = 0.2 is achievable.

Examples of sequences (⌈2^{nR}⌉, n), for R = 0.2:

(2^1, 5), (2^2, 10), (2^3, 15), ...; e.g., use 10-bit codewords to send 2 bits

Examples of sequences (⌈2^{nR}⌉, n), for R = 0.25:

(2^1, 4), (2^2, 8), (2^3, 12), ...; e.g., use 8-bit codewords to send 2 bits

The channel coding theorem states that

– There is a sequence of (⌈2^{0.2 n}⌉, n) codes s.t. lim_{n→∞} λ^(n) = 0.

– For any sequence of (⌈2^{0.25 n}⌉, n) codes, lim_{n→∞} λ^(n) ≠ 0.


Channel Coding: Example

The noisy typewriter is a simple example of the theorem.

Capacity: C = 1 bit/transmission.

Input and output alphabets X = Y = {1, 2, 3, 4}.

In this case, R = C = 1 is achievable.

Codes (2^n, n) have λ^(n) = 0, for any n.

Encoder (for n = 1, thus M = 2^n = 2): f(1) = 1; f(2) = 3.

Decoder: g(1) = g(2) = 1; g(3) = g(4) = 2.

Since λ^(n) = 0, C = 1 is the zero-error capacity.
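A short simulation of this zero-error code, assuming the usual noisy-typewriter model in which each input goes to itself or to the next symbol (mod 4) with probability 1/2:

```python
# The zero-error code from the slide: f(1)=1, f(2)=3, g(1)=g(2)=1, g(3)=g(4)=2.
import random

f = {1: 1, 2: 3}                       # encoder (n = 1)
g = {1: 1, 2: 1, 3: 2, 4: 2}           # decoder

def typewriter(x):
    """Noisy typewriter on {1, 2, 3, 4}: x -> x or its successor, w.p. 1/2."""
    return x if random.random() < 0.5 else x % 4 + 1

errors = 0
for _ in range(10000):
    w = random.randint(1, 2)
    if g[typewriter(f[w])] != w:
        errors += 1
print(errors)    # always 0: the two output "fans" {1,2} and {3,4} never overlap
```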


Asymptotic Equipartition: Motivation 1

Toss a fair coin 100 times: X1, ..., X100 ∈ {0, 1}.

...there are 2^100 ≈ 1.27×10^30 possible outcomes,

...each with probability 2^−100 ≈ 7.9×10^−31

The overwhelming majority has close to 50/50 heads/tails

How overwhelming? Let S = X1 + ··· + X100,

P(S ∈ {47, ..., 53}) = 2^−100 ∑_{j=47}^{53} C(100, j) ≈ 0.52

How many sequences are in this set?

|{(x1, ..., x100) : S ∈ {47, ..., 53}}| = ∑_{j=47}^{53} C(100, j) ≈ 6.54×10^29

...fraction of the total: ≈ 6.54×10^29 / 2^100 ≈ 0.52


Asymptotic Equipartition: Motivation 2

Unfair coin (P(heads) = P(Xi = 1) = 0.9), 100 tosses: X1, ..., X100.

...there are 2^100 ≈ 10^30 possible outcomes,

The overwhelming majority has close to 90/10 heads/tails

How overwhelming? Let S = X1 + ··· + X100,

P(S ∈ {87, ..., 93}) = ∑_{j=87}^{93} C(100, j) 0.9^j 0.1^{100−j} ≈ 0.76

How many sequences are in this set?

|{(x1, ..., x100) : S ∈ {87, ..., 93}}| = ∑_{j=87}^{93} C(100, j) ≈ 8.3×10^15

...fraction of the total: ≈ 8.3×10^15 / 2^100 ≈ 6.5×10^−15
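The counts on this slide and the previous one can be reproduced exactly with binomial coefficients; a small sketch:

```python
# Exact versions of the counts on the two motivation slides (100 tosses of a
# fair coin, and of a coin with P(heads) = 0.9).
from math import comb

total = 2 ** 100
fair_count = sum(comb(100, j) for j in range(47, 54))
print(fair_count, fair_count / total)          # ~6.54e29, P ~ 0.52

biased_count = sum(comb(100, j) for j in range(87, 94))
biased_prob = sum(comb(100, j) * 0.9 ** j * 0.1 ** (100 - j)
                  for j in range(87, 94))
print(biased_count, biased_count / total)      # ~8.3e15, fraction ~6.5e-15
print(biased_prob)                             # ~0.76
```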


Asymptotic Equipartition: Law of Large Numbers

Consider X1,...,Xn i.i.d. with E[Xi] = µ.

Weak law of large numbers (WLLN) (Bernoulli, 1713)

lim_{n→∞} (1/n)(X1 + ··· + Xn) = µ  (in probability)

Applying to log p(X1, ..., Xn),

−(1/n) log p(X1, ..., Xn) = −(1/n) ∑_{i=1}^n log p(Xi) → E[−log p(X)] = H(X), as n → ∞

This convergence is in probability: for any ε > 0,

lim_{n→∞} P[ |−(1/n) log p(X1, ..., Xn) − H(X)| < ε ] = 1


Asymptotic Equipartition Property (AEP)

Definition: for n i.i.d. samples x1, ..., xn of X ∈ X, the set of ε-typical sequences A_ε^(n) (called the typical set) is

A_ε^(n) = {(x1, ..., xn) : |−(1/n) log p(x1, ..., xn) − H(X)| ≤ ε}

The condition can also be written as

2^{−n(H(X)+ε)} ≤ p(x1, ..., xn) ≤ 2^{−n(H(X)−ε)}

AEP theorem: for any ε > 0 and n sufficiently large,

P[(x1, ..., xn) ∈ A_ε^(n)] ≥ 1 − ε

(1 − ε) 2^{n(H(X)−ε)} ≤ |A_ε^(n)| ≤ 2^{n(H(X)+ε)}


AEP Corollary

AEP theorem: for any ε > 0 and n sufficiently large,

P[(x1, ..., xn) ∈ A_ε^(n)] ≥ 1 − ε

(1 − ε) 2^{n(H(X)−ε)} ≤ |A_ε^(n)| ≤ 2^{n(H(X)+ε)}

Corollary: if H(X) < log |X|, then for a sufficiently small ε,

lim_{n→∞} |A_ε^(n)| / |X^n| ≤ lim_{n→∞} 2^{n(H(X)+ε−log |X|)} = 0,

if ε < log |X| − H(X).

Typical set: vanishingly small volume with arbitrarily high probability.


AEP: Example 1

Toss a fair coin n times: X1, ..., Xn ∈ {0, 1}; H(X) = 1

...there are 2^n possible outcomes,

...each with probability p(x1, ..., xn) = 2^−n

With ε = 0.02,

A_0.02^(n) = {(x1, ..., xn) : 2^{−1.02 n} ≤ p(x1, ..., xn) ≤ 2^{−0.98 n}} = {0, 1}^n

0.98 · 2^{0.98 n} ≤ |A_0.02^(n)| ≤ 2^{1.02 n}   (in fact, |A_0.02^(n)| = 2^n)

1 = P[(x1, ..., xn) ∈ A_0.02^(n)] ≥ 0.98

For maximum entropy (H(X) = 1), the AEP is uninformative.


AEP: Example 2

Toss an unfair coin (probability of heads 0.8) n times: X1, ..., Xn.

Entropy: H(X) ≈ 0.72

With ε = 0.02,

A_0.02^(n) = {(x1, ..., xn) : 2^{−0.74 n} ≤ p(x1, ..., xn) ≤ 2^{−0.70 n}}

|A_0.02^(n)| ≤ 2^{0.74 n}

P[(x1, ..., xn) ∈ A_0.02^(n)] ≥ 0.98, for n large enough

For non-maximum entropy, the AEP is very informative:

|A_0.02^(n)| / |{0, 1}^n| ≤ 2^{−0.26 n}   (e.g., for n = 100, 2^{−26} ≈ 10^−8)
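Because typicality for a binary i.i.d. source depends only on the number of ones, |A_ε^(n)| and P[A_ε^(n)] can be computed exactly from binomial terms. A sketch for this example (p = 0.8, ε = 0.02); note that the probability bound ≥ 1 − ε only kicks in for fairly large n:

```python
# Size and probability of the typical set for a Bernoulli(0.8) source with
# eps = 0.02: a sequence is typical iff its number of ones k satisfies
# |H_hat - H| <= eps, where H_hat = -(k/n)log2(p) - ((n-k)/n)log2(1-p).
# Log-domain arithmetic avoids underflow for large n.
from math import log2, lgamma, log, exp

p, eps = 0.8, 0.02
H = -p * log2(p) - (1 - p) * log2(1 - p)          # ~0.7219 bits

def log_comb(n, k):
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

for n in (100, 1000, 10000):
    ks = [k for k in range(n + 1)
          if abs(-(k / n) * log2(p) - ((n - k) / n) * log2(1 - p) - H) <= eps]
    # log2 |A|: factor out the largest binomial term (smallest k here).
    log2_size = log2(sum(exp(log_comb(n, k) - log_comb(n, ks[0]))
                         for k in ks)) + log_comb(n, ks[0]) / log(2)
    prob = sum(exp(log_comb(n, k) + k * log(p) + (n - k) * log(1 - p))
               for k in ks)
    # |A| stays below 2^{n(H+eps)}; P[A] approaches 1 only for large n.
    print(n, round(log2_size, 1), round(n * (H + eps), 1), round(prob, 3))
```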


Interlude: AEP and Source Coding

Source X ∈ X; order-n extension of the source: X^n = (X1, ..., Xn) ∈ X^n.

Coding method: given a sequence (x1, ..., xn),

– if (x1, ..., xn) ∈ A_ε^(n), code it using ⌈n(H(X) + ε)⌉ bits;

...enough, because |A_ε^(n)| ≤ 2^{n(H(X)+ε)}

– if (x1, ..., xn) ∉ A_ε^(n), code it using ⌈log |X^n|⌉ = ⌈n log |X|⌉ bits;

...enough, because there are at most |X^n| = |X|^n such sequences

– to distinguish the two cases, use a 1-bit prefix.

Length of this coding scheme:

l_C(x1, ..., xn) = 1 + ⌈n(H(X) + ε)⌉,  if (x1, ..., xn) ∈ A_ε^(n)
l_C(x1, ..., xn) = 1 + ⌈n log |X|⌉,    if (x1, ..., xn) ∉ A_ε^(n)


Interlude: AEP and Source Coding

Length of the coding scheme in the previous slide:

l_C(x1, ..., xn) < 2 + n(H(X) + ε),  if (x1, ..., xn) ∈ A_ε^(n)
l_C(x1, ..., xn) < 2 + n log |X|,    if (x1, ..., xn) ∉ A_ε^(n)

Expected length L[C] = E[l_C(X1, ..., Xn)] (in bits per n symbols): for 0 < ε ≪ 1 and sufficiently large n, using P[A_ε^(n)] ≤ 1 and 1 − P[A_ε^(n)] ≤ ε,

L[C] < P[A_ε^(n)] (2 + n(H(X) + ε)) + (1 − P[A_ε^(n)]) (2 + n log |X|)
     ≤ 2 + n(H(X) + ε) + ε (2 + n log |X|)
     = 2(1 + ε) + n (H(X) + ε + ε log |X|)

Normalizing to bits/symbol, i.e., dividing by n:

L[C]/n ≤ H(X) + ε + ε log |X| + 2(1 + ε)/n,

...that is, L[C]/n can be arbitrarily close to H(X).
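A Monte Carlo sketch of the average length of this two-part code for a Bernoulli(0.8) source (the source, ε = 0.05, and n = 2000 are assumptions for illustration, not values from the slide); only code lengths are measured, no index tables are built:

```python
# Average length (bits/symbol) of the two-part typical-set code for a
# Bernoulli(0.8) source. Assumed parameters: eps = 0.05, n = 2000.
import random
from math import log2, ceil

p, eps, n, trials = 0.8, 0.05, 2000, 200
H = -p * log2(p) - (1 - p) * log2(1 - p)

def code_length(xs):
    neg_log_p = sum(-log2(p) if x else -log2(1 - p) for x in xs)
    typical = abs(neg_log_p / n - H) <= eps
    if typical:
        return 1 + ceil(n * (H + eps))   # 1-bit flag + index into A_eps^(n)
    return 1 + n                         # 1-bit flag + raw n-bit description

avg = sum(code_length([random.random() < p for _ in range(n)])
          for _ in range(trials)) / trials
print(avg / n, H)    # bits/symbol vs. H(X): close to H(X) + eps for large n
```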


Channel Coding Theorem

Consider a discrete memoryless channel with capacity C. Then,

1) Any R < C is achievable: there exist sequences of (⌈2^{nR}⌉, n) codes such that lim_{n→∞} λ^(n) = 0.

2) Any sequence of (⌈2^{nR}⌉, n) codes with lim_{n→∞} λ^(n) = 0 must have R ≤ C.

Intuition: for large n, every channel looks like a noisy typewriter.


Channel Coding Theorem: Overview of the Proof

Take p*(x) = arg max_{p(x)} I(X;Y), and p(x^n) = ∏_{i=1}^n p*(xi)

For every x^n, consider H(Y^n|X^n = x^n); the conditional typical set is

A_ε^(n)(x^n) = {y^n : |−(1/n) log p(y^n|x^n) − H(Y|X)| ≤ ε}

For arbitrarily small ε and large n, the AEP states that

|A_ε^(n)(x^n)| ≈ 2^{n H(Y|X)}

The unconditional typical set is

A_ε^(n) = {y^n : |−(1/n) log p(y^n) − H(Y)| ≤ ε}

For arbitrarily small ε and large n, the AEP states that

|A_ε^(n)| ≈ 2^{n H(Y)}


Channel Coding Theorem: Overview of the Proof

To have (asymptotically as n→∞) error-free communication:

– the different A_ε^(n)(x^n) must be disjoint;

– all A_ε^(n)(x^n) must be inside A_ε^(n).

The maximum number of codewords we can have is thus

M = 2^{nR} ≤ |A_ε^(n)| / |A_ε^(n)(x^n)| ≈ 2^{n(H(Y)−H(Y|X))} = 2^{n I(X;Y)} ≤ 2^{nC}

...which leads to R ≤ C.

This was not a rigorous proof; if you're interested in the details, see the recommended reading.


Repetition Codes

Unlike for source coding (where we have Huffman codes), building capacity-approaching channel codes is harder.

Simplest code: repetition; e.g., (⌈2^{n/3}⌉, n) codes, rate R = 1/3.

For n = 3, we have (2, 3)-codes, thus M = 2 words, W ∈ {0, 1},

encoder f(0) = 000, f(1) = 111

decoder g(y^3) = arg min_{i∈{0,1}} dH(y^3, f(i)),

where dH is the Hamming distance (the number of bits in which the words differ): minimum distance decoding.

For higher n, we have (2^2, 6) codes (M = 4), ..., (2^5, 15) codes (M = 32), ...
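The word error probability of an r-fold repetition code with majority (minimum-distance) decoding can be written in closed form as P(Binomial(r, α) > r/2); a sketch with an assumed α = 0.1 shows that the error vanishes only as the rate R = 1/r goes to zero:

```python
# Exact error probability of an r-fold repetition code on a BSC with
# majority decoding. Assumed alpha = 0.1; odd r avoids ties.
from math import comb

alpha = 0.1
for r in (3, 5, 9, 15):
    p_err = sum(comb(r, k) * alpha ** k * (1 - alpha) ** (r - k)
                for k in range(r // 2 + 1, r + 1))
    print(r, 1 / r, p_err)       # repetitions, rate, error probability
```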


Error Correction and Error Detection

A binary encoder f : {1, ..., M} → {0, 1}^n defines a set of codewords:

{f(1), f(2), ..., f(M)}.

Minimum distance decoding of received word y^n:

g(y^n) = arg min_{i∈{1,...,M}} dH(y^n, f(i))

Minimum distance of the code:

dmin = min_{i≠j} dH(f(i), f(j))

Error correction: a code corrects up to ⌊(dmin − 1)/2⌋ errors.

Error detection: a code detects up to dmin − 1 errors.

Exercise: show that a repetition code corrects up to (1 − R)/(2R) errors and detects up to (1 − R)/R errors.
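A small sketch computing dmin and the resulting correction/detection guarantees for a given codebook (the 3-fold repetition code {000, 111} is used as an example):

```python
# Minimum distance of a codebook and the corresponding guarantees:
# corrects floor((dmin-1)/2) errors, detects dmin-1 errors.
from itertools import combinations

def d_hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

codebook = ["000", "111"]
dmin = min(d_hamming(a, b) for a, b in combinations(codebook, 2))
print(dmin, (dmin - 1) // 2, dmin - 1)   # dmin, correctable, detectable
```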


Hamming Codes

Binary linear codes are built on binary linear algebra.

Before proceeding, we need binary arithmetic: ({0, 1}, +, ×)

– addition: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0;

– multiplication: 0 × 0 = 0, 0 × 1 = 0, 1 × 0 = 0, 1 × 1 = 1;

– both are clearly commutative: a + b = b + a and a × b = b × a;

– both are associative: (a + b) + c = a + (b + c) and (a × b) × c = a × (b × c);

– distributive property: a × (b + c) = a × b + a × c.

In binary arithmetic, a+ b = a− b.

Based on binary arithmetic, we may build binary linear algebra, withbinary vectors and matrices.

Can be extended to other Galois fields GF(q); e.g., GF(3): ternary arithmetic, with modulo-3 addition and multiplication.


Hamming Codes

Generalizes the idea of parity check for error detection/correction.

A Hamming(n, k) code is (in the previous notation) a (2^k, n) code.

Rate of a Hamming(n, k) code: R = k/n.

Classical example: Hamming(7, 4) generator matrix:

G = [ 1 0 0 0 1 1 0
      0 1 0 0 1 0 1
      0 0 1 0 0 1 1
      0 0 0 1 1 1 1 ] = [ I4 | A ]

Generation of codeword x from message: example m = (1101):

x = mG = (1101)G = (1101100)

where the vector-matrix product is in binary arithmetic.
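A sketch of this GF(2) encoding, reproducing (1101) → (1101100) with the generator matrix above:

```python
# GF(2) encoding with the Hamming(7,4) generator matrix from the slide:
# x = mG with all arithmetic mod 2.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def encode(m):
    """Row vector times G over GF(2)."""
    return [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

print(encode([1, 1, 0, 1]))     # [1, 1, 0, 1, 1, 0, 0]
```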


Hamming Codes

Generation of codeword x from message: example m = (1101):

x = mG = (1101)G = (1101100)

Checking codewords: a parity-check matrix H such that

H G^T = 0  ⇒  H x^T = H (mG)^T = H G^T m^T = 0

For G = [ I4 | A ], take H = [ A^T | I3 ]; then

H G^T = A^T I4 + I3 A^T = A^T + A^T = 0

For the matrix G in the previous slide,

H = [ A^T | I3 ] = [ 1 1 0 1 1 0 0
                     1 0 1 1 0 1 0
                     0 1 1 1 0 0 1 ]

...the columns are the (2^3 − 1 = 7) 3-bit binary words, except (000).


Hamming Codes

Let x + e be a received codeword, with error vector e.

Checking: H(x + e)^T = H x^T + H e^T = H e^T   (since H x^T = 0)

No errors are detected if and only if H e^T = 0. Possible cases:

– zero errors: H e^T = 0;

– one error: H e^T ≠ 0; it is one of the columns of H;

– two errors: H e^T ≠ 0; it is the sum of two different columns of H, which is never zero because all columns are distinct.

Any two errors are detected, but three errors may go undetected, since the sum of any two columns equals another column.

Exercise: show that for a Hamming(7, 4) code, dmin = 3; thus it corrects 1 error and detects up to 2 errors.


Hamming Codes

Minimum distance of Hamming(7, 4) code is 3.

Thus it can correct 1 error; how?

Permute the columns of H (and similarly of G) into

H = [ 0 0 0 1 1 1 1
      0 1 1 0 0 1 1
      1 0 1 0 1 0 1 ]

Check x + e, assuming only one error in position, say 5,

(H(xT + eT ))T = eHT = (0000100)HT = (101)

...precisely the binary word for 5, the error position.
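A sketch of this single-error correction with the permuted H: the syndrome of the received word is read as a binary number giving the error position (the particular codeword used below is an arbitrary choice):

```python
# Single-error correction with the permuted parity-check matrix from the
# slide: the syndrome H e^T reads out the error position in binary.
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def syndrome(v):
    return [sum(h * vi for h, vi in zip(row, v)) % 2 for row in H]

# The 16 codewords of this (permuted) Hamming(7,4) code: zero-syndrome words.
codewords = [[(w >> b) & 1 for b in range(7)]
             for w in range(128)
             if not any(syndrome([(w >> b) & 1 for b in range(7)]))]

x = codewords[5]                        # any codeword
y = x.copy()
y[4] ^= 1                               # flip bit at (1-indexed) position 5
s = syndrome(y)
pos = s[0] * 4 + s[1] * 2 + s[2]        # read the syndrome as a binary number
print(s, pos)                           # [1, 0, 1] -> position 5
y[pos - 1] ^= 1                         # correct the error
print(y == x)                           # True
```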


Hamming Codes

General Hamming(n, k) codes.

For some r ≥ 2: n = 2^r − 1 and k = 2^r − r − 1.

The Hamming(7, 4) code: r = 3, n = 2^3 − 1 = 7, k = 2^3 − 3 − 1 = 4.

Columns of H: all n = 2^r − 1 nonzero binary words of r bits.

Put H in systematic form H = [ A^T | I_r ] and build G = [ I_k | A ].

Rate: R = k/n = (2^r − r − 1)/(2^r − 1)

Exercise: show that, for any r, dmin = 3.

Remarkably, lim_{r→∞} (2^r − r − 1)/(2^r − 1) = 1

The repetition code also has minimum distance 3, but R = 1/3.

Error-correcting codes are a huge R&D area, without which modern communications would not be possible.


Exercises

Show that the repetition code of rate R = 1/3 is a Hamming(n, k) code. Find r, n, k, and the matrices H and G.

Consider a Hamming(7, 4) code in systematic form. Decode the word (1011011).

A Hamming code is a particular case of the more general family of linear codes, i.e., codes whose codewords are generated as x = mG. Show that for any binary linear code,

a) the zero word is a valid codeword;

b) dmin equals the weight (number of 1s) of the minimum-weight nonzero codeword.

Assuming a Hamming(7, 4) code is used on a BSC with probability of error α, what is the probability of an erroneous decoding?


Gaussian Channel

Gaussian channel: X = Y = R, Y = X + Z, with Z ~ N(0, N) independent of X.

Mutual information:

I(X;Y) = h(Y) − h(Y|X)
       = h(Y) − h(X + Z|X)
       = h(Y) − h(Z)
       = h(Y) − (1/2) log(2πeN),

since differential entropy is shift-invariant, and Z is Gaussian and independent of X.

Since adding a constant to X does not affect I(X;Y), assume E[X] = 0; thus var[X] = E[X^2] is the power.


Gaussian Channel

Gaussian channel: X = Y = R, Y = X + Z.

Mutual information:

I(X;Y) = h(Y) − (1/2) log(2πeN)
       ≤ (1/2) log(2πe(N + E[X^2])) − (1/2) log(2πeN)
       = (1/2) log(1 + E[X^2]/N)

Without a constraint on E[X^2], I(X;Y) can be arbitrarily large.

With a power constraint E[X^2] ≤ P,

C = max_{f_X : E[X^2] ≤ P} (1/2) log(1 + E[X^2]/N) = (1/2) log(1 + P/N),

achieved for f_X = N(0, P). P/N = SNR, the signal-to-noise ratio.
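For reference, a few values of C = (1/2) log2(1 + P/N) in bits per channel use (the SNR values below are arbitrary examples, not from the slide):

```python
# Gaussian channel capacity C = (1/2) log2(1 + P/N) for a few assumed SNRs.
from math import log2

for snr_db in (0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)            # P/N on a linear scale
    print(snr_db, "dB ->", 0.5 * log2(1 + snr), "bits/use")
```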


Coding for a Gaussian Channel

An (M,n) code for a Gaussian channel, under power constraint P .

– a set of message indices W ∈ {1, ..., M};

– an encoder f : {1, ..., M} → R^n, i.e., f(i) = [f1(i), ..., fn(i)] ∈ R^n, with

‖f(i)‖^2 = ∑_{j=1}^n fj(i)^2 ≤ nP;

– a decoder g : R^n → {1, ..., M}.

Conditional, average, and maximum probability of error (λ_i, P_e^(n), λ^(n)) are defined as in the discrete channel.

Rate R is achievable if there exists a sequence of (⌈2^{nR}⌉, n) codes satisfying the power constraint P, such that lim_{n→∞} λ^(n) = 0.

The (operational) capacity is: C_oper = sup{R : R is achievable}.

The Gaussian channel theorem: C_oper = C.


Coding for a Gaussian Channel

Outline of the proof of the Gaussian channel theorem.

We know that Y^n = X^n + Z^n and that E[‖X^n‖^2] ≤ nP; thus E[‖Y^n‖^2] ≤ n(P + N).

All the received vectors lie, with high probability (w.h.p.), in a sphere of radius √(n(N + P)).

Each received vector lies, w.h.p., in a sphere of radius √(nN) around the transmitted codeword f(i).

The volume of a radius-r sphere in n dimensions is V(r) = c_n r^n.

The maximum number of (asymptotically) non-intersecting spheres is

M = 2^{nR} ≤ (n(N + P))^{n/2} / (nN)^{n/2} = 2^{(n/2) log((P+N)/N)} = 2^{(n/2) log(1 + P/N)}

...thus R ≤ C.


Sphere Packing

This picture becomes accurate for large n, since "in high dimensions, Gaussian distributions are soap bubbles." [1]

[1] www.inference.vc/high-dimensional-gaussian-distributions-are-soap-bubble/


Exercises

Consider the multi-path channel with two outputs Y1 = X + Z1 and Y2 = X + Z2 for the same input X, where the noises Z1 and Z2 follow a joint Gaussian probability density function with zero mean and covariance matrix

K = σ^2 [ 1 ρ
          ρ 1 ]

where σ^2 is the noise variance and ρ the correlation coefficient. Find the capacity of the channel. What is the capacity for ρ = 1, ρ = 0, ρ = −1? Interpret the results.

Continuous channel with discrete input: consider a channel with input X ∈ {0, 1} and output Y = X + Z, where Z is uniform on [0, a]. Assuming a > 1, find the capacity of the channel. Repeat for a < 1 and interpret the result.


Recommended Reading

T. Cover and J. Thomas, "Elements of Information Theory", John Wiley & Sons, 2006 (Sections 7.1 to 7.6, 7.11, 9.1).

https://en.wikipedia.org/wiki/Hamming_code
