information and communication theory lecture 6
TRANSCRIPT
Information and Communication Theory
Lecture 6
Channel Coding
Mario A. T. Figueiredo
DEEC, Instituto Superior Tecnico, University of Lisbon, Portugal
2021
Lecture 6 (Channels) Information and Communication Theory 2021 1/∞
Channel Coding
Message W ∈ {1, ...,M}
Channel input alphabet X ; output alphabet Y.
Encoder: f : {1, ..., M} → X^n.
Decoder: g : Y^n → {1, ..., M}.
Message estimate: Ŵ = g(Y^n)
Memoryless channel model: (X , p(y|x),Y)
p(y_1, ..., y_n | x_1, ..., x_n) = ∏_{i=1}^n p(y_i | x_i)
An (M,n) code: ({1, ...,M}, f, g)
Channels
Memoryless channel model: (X , p(y|x),Y)
Channel matrix: |X| × |Y|, with P = [P_{i,j}], P_{i,j} = P(Y = j | X = i).
Channel capacity:
C = max_{p(x)} I(X;Y)
the maximum mutual information over all input distributions.
Example: noiseless binary channel (Y = X):
I(X;Y ) = H(Y ) = H(X).
C = max_{p(x)} I(X;Y) = max_{p(x)} H(X) = 1 bit/symbol
A noiseless binary channel can transmit 1 bit/symbol.
Binary Symmetric Channel
Binary symmetric channel: (X , p(y|x),Y)
H(Y |X = x) = H(α, 1− α), for x = 0 or x = 1.
H(Y |X) = H(α, 1− α)
I(X;Y ) = H(Y )−H(α, 1− α)
Capacity: let P(X = 0) = β
C = max_β [H(Y) − H(α, 1−α)] = 1 − H(α, 1−α) bits/symbol
...achieved for β = 1/2.
For α = 0 or α = 1, C = 1; for α = 1/2, C = 0.
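As a quick numerical check of the formula C = 1 − H(α, 1−α) (a minimal Python sketch; the function names are illustrative, not from the lecture):

```python
from math import log2

def binary_entropy(a: float) -> float:
    """H(a, 1-a) in bits; H(0) = H(1) = 0 by convention."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * log2(a) - (1 - a) * log2(1 - a)

def bsc_capacity(alpha: float) -> float:
    """Capacity of a BSC with crossover probability alpha: C = 1 - H(alpha)."""
    return 1.0 - binary_entropy(alpha)

print(bsc_capacity(0.0))             # 1.0 : noiseless
print(bsc_capacity(0.5))             # 0.0 : pure noise
print(round(bsc_capacity(0.1), 3))   # 0.531
```

Note that the capacity is symmetric in α: flipping every bit (α and 1 − α) gives the same C.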
Binary Erasure Channel
Binary erasure channel: (X , p(y|x),Y)
H(X|Y = y) = 0, for y = 0 or y = 1; H(X|Y = ∗) = H(X), thus H(X|Y) = δ H(X)
I(X;Y) = H(X) − H(X|Y) = (1 − δ) H(X)
Capacity: let P(X = 0) = β
C = max_β (1 − δ) H(X) = 1 − δ bits/symbol
...achieved for β = 1/2.
For δ = 0, C = 1; for δ = 1, C = 0.
Noisy Typewriter
Noisy typewriter: (X , p(y|x),Y)
H(Y |X = x) = 1, for x = 1, 2, 3, 4. H(Y |X) = 1
I(X;Y ) = H(Y )−H(Y |X) = H(Y )− 1
Capacity:
C = max_{p(x)} [H(Y) − 1] = log 4 − 1 = 1 bit/symbol
...achieved for p(x) uniform.
Properties of Channel Capacity
Because I(X;Y) ≥ 0, C ≥ 0
Because I(X;Y) = H(X) − H(X|Y) ≤ H(X),
C = max_{p(x)} I(X;Y) ≤ max_{p(x)} H(X) ≤ log |X|
Because I(X;Y) = H(Y) − H(Y|X) ≤ H(Y),
C = max_{p(x)} I(X;Y) ≤ max_{p(x)} H(Y) ≤ log |Y|
Corollary: C ≤ min{log |X|, log |Y|}
Exercises
Compute the capacity of a series connection of two binary symmetric channels. Hint: consider the equivalent BSC.
Consider the parallel combination of two independent channels (X1, p1(y|x), Y1) and (X2, p2(y|x), Y2), i.e., the channel
(X1 × X2, p(y1|x1) p(y2|x2), Y1 × Y2).
What is the capacity of this channel?
Consider N channels with |X| = |Y| and non-maximal capacity, i.e., C < log |X|, connected in series. Show that the capacity of the resulting channel converges to zero as N goes to infinity. Hint: use the data processing inequality.
Exercises
Compute the capacity and the maximizing p(x) for the Z channel.
Consider a channel obtained by taking two conditionally independent looks, Y1 and Y2, at the output of a channel of capacity C, for each input. Show that the resulting capacity satisfies C′ ≤ 2C. Hint: begin by showing that I(X; Y1, Y2) = 2 I(X; Y1) − I(Y1; Y2).
A symmetric channel is one in which every row of the channel matrix is a permutation of every other row, and every column is a permutation of every other column. Show that in this case the capacity is
C = log |Y| − H(any row of the channel matrix).
Show that the same result applies to weakly symmetric channels, where the columns are only required to sum to the same number.
Channel Coding
Conditional probability of error (for i ∈ {1, ...,M})
λ_i = P(g(Y^n) ≠ i | X^n = f(i)) = Σ_{y^n ∈ Y^n} P(Y^n = y^n | X^n = f(i)) 1_{g(y^n) ≠ i}
where 1_A = 1, if A is true, and 1_A = 0, if A is false.
Maximum probability of error: λ^(n) = max_{i ∈ {1,...,M}} λ_i
Average probability of error: P_e^(n) = (1/M) Σ_{i=1}^M λ_i
Probability of error: P(g(Y^n) ≠ W)
If the message symbols are equiprobable: P(g(Y^n) ≠ W) = P_e^(n)
Of course, P_e^(n) ≤ λ^(n) and P(g(Y^n) ≠ W) ≤ λ^(n)
Channel Coding
The rate of an (M, n) code:
R = (log2 M) / n
A rate R is achievable if there is a sequence of (⌈2^{nR}⌉, n) codes, s.t.
lim_{n→∞} λ^(n) = 0
(Operational) capacity of a channel is
C_oper = sup{R : R is achievable}
The channel coding theorem essentially states that
C_oper = C = max_{p(x)} I(X;Y)
Channel Coding: Examples
Consider a quaternary source: M = 4, thus log2 M = 2 bits
Using a binary noiseless channel, C = 1 bit/transmission, we need n = 2 transmissions to send each symbol:
R = (log2 M) / n = 2/2 = 1 bit/transmission
Is this rate achievable? Yes, because the channel is noiseless.
What if C < 1? Rate 1 is no longer achievable!
Using a noiseless quaternary channel (|X| = |Y| = 4): we only need n = 1 transmission,
R = (log2 M) / 1 = 2 bits/transmission
Is this rate achievable? Yes, because the channel is noiseless: C = 2
Channel Coding: Example
Consider a binary symmetric channel with α ≈ 0.24, thus C = 1 − H(α, 1−α) ≈ 0.21
Thus, R = 0.25 is not achievable; R = 0.2 is achievable.
Examples of sequences (⌈2^{nR}⌉, n), for R = 0.2:
(2^1, 5), (2^2, 10), (2^3, 15), ... e.g., use 10-bit codewords to send 2 bits
Examples of sequences (⌈2^{nR}⌉, n), for R = 0.25:
(2^1, 4), (2^2, 8), (2^3, 12), ... e.g., use 8-bit codewords to send 2 bits
The channel coding theorem states that
X There is a sequence of (⌈2^{n·0.2}⌉, n) codes, s.t. lim_{n→∞} λ^(n) = 0.
X For any sequence of (⌈2^{n·0.25}⌉, n) codes, lim_{n→∞} λ^(n) ≠ 0.
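The code sequences on this slide can be generated directly from the definition (an illustrative sketch; `code_sequence` is a name invented here):

```python
from math import ceil

def code_sequence(R, ns):
    """(ceil(2^{nR}), n) pairs for a target rate R, at block lengths ns."""
    return [(ceil(2 ** (n * R)), n) for n in ns]

print(code_sequence(0.2,  [5, 10, 15]))   # [(2, 5), (4, 10), (8, 15)]
print(code_sequence(0.25, [4, 8, 12]))    # [(2, 4), (4, 8), (8, 12)]
```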
Channel Coding: Example
The noisy typewriter is a simple example of the theorem.
Capacity: C = 1 bit/transmission.
Input and output alphabets X = Y = {1, 2, 3, 4}.
In this case, R = C = 1 is achievable.
Codes (2^n, n) have λ^(n) = 0, for any n.
Encoder (for n = 1, thus M = 2^1 = 2): f(1) = 1; f(2) = 3.
Decoder: g(1) = g(2) = 1; g(3) = g(4) = 2.
Since λ(n) = 0, C = 1 is the zero-error capacity.
Asymptotic Equipartition: Motivation 1
Toss a fair coin 100 times: X1, ..., X100 ∈ {0, 1}.
...there are 2^100 ≈ 1.27 × 10^30 possible outcomes,
...each with probability 2^−100 ≈ 7.9 × 10^−31
The overwhelming majority has close to 50/50 heads/tails
How overwhelming? Let S = X_1 + ··· + X_100,
P(S ∈ {47, ..., 53}) = 2^−100 Σ_{j=47}^{53} C(100, j) ≈ 0.52
How many sequences are in this set?
|{(x_1, ..., x_100) : S ∈ {47, ..., 53}}| = Σ_{j=47}^{53} C(100, j) ≈ 6.54 × 10^29
...fraction of the total: ≈ 6.54 × 10^29 / 2^100 ≈ 0.52
Asymptotic Equipartition: Motivation 2
Unfair coin (P(heads) = P(X_i = 1) = 0.9), 100 tosses: X_1, ..., X_100.
...there are 2^100 ≈ 10^30 possible outcomes,
The overwhelming majority has close to 90/10 heads/tails
How overwhelming? Let S = X_1 + ··· + X_100,
P(S ∈ {87, ..., 93}) = Σ_{j=87}^{93} C(100, j) 0.9^j 0.1^{100−j} ≈ 0.76
How many sequences are in this set?
|{(x_1, ..., x_100) : S ∈ {87, ..., 93}}| = Σ_{j=87}^{93} C(100, j) ≈ 8.3 × 10^15
...fraction of the total: ≈ 8.3 × 10^15 / 2^100 ≈ 6.5 × 10^−15
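Both motivating computations can be reproduced exactly with integer arithmetic (a sketch using only the Python standard library; variable names are invented here):

```python
from math import comb

n = 100

# Fair coin: probability and count of outcomes with 47..53 heads.
count_fair = sum(comb(n, j) for j in range(47, 54))
prob_fair = count_fair / 2 ** n
print(round(prob_fair, 2))     # 0.52 -- also the fraction of all 2^100 outcomes

# Unfair coin, P(heads) = 0.9: outcomes with 87..93 heads.
count_unfair = sum(comb(n, j) for j in range(87, 94))
prob_unfair = sum(comb(n, j) * 0.9 ** j * 0.1 ** (n - j) for j in range(87, 94))
print(round(prob_unfair, 2))   # 0.76
print(count_unfair / 2 ** n)   # high probability, yet a vanishing fraction of outcomes
```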
Asymptotic Equipartition: Law of Large Numbers
Consider X_1, ..., X_n i.i.d. with E[X_i] = µ.
Weak law of large numbers (WLLN) (Bernoulli, 1713):
lim_{n→∞} (1/n)(X_1 + ··· + X_n) = µ (in probability)
Applying it to log p(X_1, ..., X_n),
−(1/n) log p(X_1, ..., X_n) = −(1/n) Σ_{i=1}^n log p(X_i) −→_{n→∞} E[−log p(X)] = H(X)
This convergence is in probability: for any ε > 0,
lim_{n→∞} P[ |−(1/n) log p(X_1, ..., X_n) − H(X)| < ε ] = 1
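This convergence is easy to observe by simulation (a sketch assuming a Bernoulli(0.8) source; the function name is illustrative):

```python
import random
from math import log2

random.seed(0)
p = 0.8                                        # assumed Bernoulli(0.8) source
H = -(p * log2(p) + (1 - p) * log2(1 - p))     # entropy, about 0.722 bits

def empirical_rate(n):
    """-(1/n) log2 p(X_1, ..., X_n) for one i.i.d. sample of length n."""
    xs = [random.random() < p for _ in range(n)]
    return -sum(log2(p) if x else log2(1 - p) for x in xs) / n

for n in (10, 100, 10000):
    print(n, round(empirical_rate(n), 3))      # approaches H as n grows
```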
Asymptotic Equipartition Property (AEP)
Definition: for n i.i.d. samples x_1, ..., x_n of X ∈ X, the set of ε-typical sequences A_ε^(n) (called typical set) is
A_ε^(n) = {(x_1, ..., x_n) : |−(1/n) log p(x_1, ..., x_n) − H(X)| ≤ ε}
The condition can also be written as
2^{−n(H(X)+ε)} ≤ p(x_1, ..., x_n) ≤ 2^{−n(H(X)−ε)}
AEP theorem: for any ε > 0 and n sufficiently large,
P[(x_1, ..., x_n) ∈ A_ε^(n)] ≥ 1 − ε
(1 − ε) 2^{n(H(X)−ε)} ≤ |A_ε^(n)| ≤ 2^{n(H(X)+ε)}
AEP Corollary
AEP theorem: for any ε > 0 and n sufficiently large,
P[(x_1, ..., x_n) ∈ A_ε^(n)] ≥ 1 − ε
(1 − ε) 2^{n(H(X)−ε)} ≤ |A_ε^(n)| ≤ 2^{n(H(X)+ε)}
Corollary: if H(X) < log |X|, then for any ε < log |X| − H(X),
lim_{n→∞} |A_ε^(n)| / |X^n| ≤ lim_{n→∞} 2^{n(H(X)+ε−log |X|)} = 0
Typical set: vanishingly small volume with arbitrarily high probability.
AEP: Example 1
Toss a fair coin n times: X_1, ..., X_n ∈ {0, 1}; H(X) = 1
...there are 2^n possible outcomes,
...each with probability p(x_1, ..., x_n) = 2^−n
With ε = 0.02,
A_{0.02}^(n) = {(x_1, ..., x_n) : 2^{−1.02 n} ≤ p(x_1, ..., x_n) ≤ 2^{−0.98 n}} = {0, 1}^n
0.98 · 2^{0.98 n} ≤ |A_{0.02}^(n)| ≤ 2^{1.02 n}   (in fact, |A_{0.02}^(n)| = 2^n)
1 = P[(x_1, ..., x_n) ∈ A_{0.02}^(n)] ≥ 0.98
For maximum entropy (H(X) = 1), the AEP is uninformative.
AEP: Example 2
Toss an unfair coin (probability of heads 0.8) n times: X_1, ..., X_n.
Entropy: H(X) ≈ 0.72
With ε = 0.02,
A_{0.02}^(n) = {(x_1, ..., x_n) : 2^{−0.74 n} ≤ p(x_1, ..., x_n) ≤ 2^{−0.70 n}}
|A_{0.02}^(n)| ≤ 2^{0.74 n}
P[(x_1, ..., x_n) ∈ A_{0.02}^(n)] ≥ 0.98, for n large enough
For non-maximum entropy, the AEP is very informative:
|A_{0.02}^(n)| / |{0, 1}^n| ≤ 2^{−0.26 n}   (e.g., for n = 100, 2^{−26} ≈ 10^−8)
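For a Bernoulli source, the size and probability of A_ε^(n) can be computed exactly by summing over the number of heads (an illustrative sketch; note that at n = 100 and ε = 0.02 the probability of the typical set is still far from 1, so the ≥ 1 − ε guarantee requires larger n):

```python
from math import comb, log2

def typical_set_stats(n, p, eps):
    """Size and probability of the typical set for an i.i.d. Bernoulli(p) source."""
    H = -(p * log2(p) + (1 - p) * log2(1 - p))
    size, prob = 0, 0.0
    for k in range(n + 1):                       # k = number of heads
        log_p = k * log2(p) + (n - k) * log2(1 - p)
        if abs(-log_p / n - H) <= eps + 1e-12:   # small tolerance for float rounding
            size += comb(n, k)
            prob += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return size, prob

size, prob = typical_set_stats(100, 0.8, 0.02)
print(size / 2 ** 100)   # below the 2^{-26} bound on the slide
print(prob)              # well below 0.98: n = 100 is not yet "large enough"
```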
Interlude: AEP and Source Coding
Source X ∈ X; order-n extension: X^n = (X_1, ..., X_n) ∈ X^n.
Coding method: given a sequence (x_1, ..., x_n),
X if (x_1, ..., x_n) ∈ A_ε^(n), code it using ⌈n(H(X) + ε)⌉ bits;
...enough, because |A_ε^(n)| ≤ 2^{n(H(X)+ε)}
X if (x_1, ..., x_n) ∉ A_ε^(n), code it using ⌈log |X^n|⌉ = ⌈n log |X|⌉ bits;
...enough, because |X^n| = |X|^n
X to distinguish the two cases, use a 1-bit prefix.
Length of this coding scheme:
l_C(x_1, ..., x_n) = { 1 + ⌈n(H(X) + ε)⌉, if (x_1, ..., x_n) ∈ A_ε^(n);  1 + ⌈n log |X|⌉, if (x_1, ..., x_n) ∉ A_ε^(n) }
Interlude: AEP and Source Coding
Length of the coding scheme in the previous slide:
l_C(x_1, ..., x_n) < { 2 + n(H(X) + ε), if (x_1, ..., x_n) ∈ A_ε^(n);  2 + n log |X|, if (x_1, ..., x_n) ∉ A_ε^(n) }
Expected length L[C] = E[l_C(X_1, ..., X_n)] (in bits per n symbols), for 0 < ε ≪ 1 and sufficiently large n:
L[C] < P[A_ε^(n)] (2 + n(H(X) + ε)) + (1 − P[A_ε^(n)]) (2 + n log |X|)
     ≤ 2 + n(H(X) + ε) + ε n log |X|
     = 2 + n(H(X) + ε + ε log |X|)
using P[A_ε^(n)] ≤ 1 and 1 − P[A_ε^(n)] ≤ ε.
Normalize to bits/symbol, dividing by n:
L[C]/n ≤ (2 + n(H(X) + ε + ε log |X|)) / n,
...that is, L[C]/n can be arbitrarily close to H(X).
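The per-symbol bound can be evaluated numerically to watch the 2/n overhead vanish (a sketch taking the Bernoulli(0.8) source of the earlier examples as an assumed input):

```python
from math import log2

H, eps, A = 0.722, 0.02, 2   # assumed: Bernoulli(0.8) source, binary alphabet |X| = 2

def bits_per_symbol_bound(n):
    """The slide's bound (2 + n(H + eps + eps*log|X|)) / n."""
    return (2 + n * (H + eps + eps * log2(A))) / n

for n in (10, 100, 1000, 100000):
    print(n, round(bits_per_symbol_bound(n), 4))  # decreases toward H + 2*eps = 0.762
```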
Channel Coding Theorem
Consider a discrete memoryless channel with capacity C. Then,
1) Any R < C is achievable: there exist sequences of (⌈2^{nR}⌉, n) codes such that lim_{n→∞} λ^(n) = 0.
2) Any sequence of (⌈2^{nR}⌉, n) codes with lim_{n→∞} λ^(n) = 0 must have R ≤ C.
Intuition: for large n, every channel looks like a noisy typewriter.
Channel Coding Theorem: Overview of the Proof
Take p*(x) = arg max_{p(x)} I(X;Y), and p(x^n) = ∏_{i=1}^n p*(x_i)
For every x^n, consider H(Y^n | X^n = x^n); conditional typical set:
A_ε^(n)(x^n) = {y^n : |−(1/n) log p(y^n | x^n) − H(Y|X)| ≤ ε}
For arbitrarily small ε and large n, the AEP states that
|A_ε^(n)(x^n)| ≈ 2^{n H(Y|X)}
The unconditional typical set is
A_ε^(n) = {y^n : |−(1/n) log p(y^n) − H(Y)| ≤ ε}
For arbitrarily small ε and large n, the AEP states that
|A_ε^(n)| ≈ 2^{n H(Y)}
Channel Coding Theorem: Overview of the Proof
To have (asymptotically as n → ∞) error-free communication:
X Different A_ε^(n)(x^n) must be disjoint;
X All A_ε^(n)(x^n) must be inside A_ε^(n).
The maximum number of codewords that we can have is thus
M = 2^{nR} ≤ |A_ε^(n)| / |A_ε^(n)(x^n)| ≈ 2^{n(H(Y)−H(Y|X))} = 2^{n I(X;Y)} ≤ 2^{nC}
...which leads to R ≤ C.
This was not a rigorous proof; if you're interested in the details, see the recommended reading.
Repetition Codes
Unlike for source coding (where we have Huffman codes), building capacity-approaching channel codes is harder.
Simplest code: repetition; e.g., (⌈2^{n/3}⌉, n) codes, rate R = 1/3.
For n = 3, we have (2, 3) codes, thus M = 2 words, W ∈ {0, 1},
encoder: f(0) = 000, f(1) = 111
decoder: g(y^3) = arg min_{i∈{0,1}} d_H(y^3, f(i)),
where d_H is the Hamming distance (number of bits in which the words differ): minimum distance decoding.
For higher n, we have (2^2, 6) codes (M = 4), ..., (2^5, 15) codes (M = 32), ...
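The n = 3 repetition code fits in a few lines (an illustrative sketch; for a repetition code, minimum Hamming distance decoding reduces to a majority vote):

```python
def rep_encode(bit, n=3):
    """Repetition encoder: f(0) = 000, f(1) = 111 (as lists of bits)."""
    return [bit] * n

def rep_decode(word):
    """Minimum-distance decoding = majority vote over the received bits."""
    return int(sum(word) > len(word) / 2)

print(rep_decode(rep_encode(1)))   # 1
print(rep_decode([1, 0, 1]))       # 1 : a single flipped bit is corrected
print(rep_decode([0, 1, 0]))       # 0
```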
Error Correction and Error Detection
A binary encoder f : {1, ..., M} → {0, 1}^n defines a set of codewords:
{f(1), f(2), ..., f(M)}.
Minimum distance decoding of received word y^n:
g(y^n) = arg min_{i∈{1,...,M}} d_H(y^n, f(i))
Minimum distance of the code:
d_min = min_{i≠j} d_H(f(i), f(j))
Error correction: a code corrects up to ⌊(d_min − 1)/2⌋ errors.
Error detection: a code detects up to d_min − 1 errors.
Exercise: show that a repetition code corrects up to (1 − R)/(2R) errors and detects up to (1 − R)/R errors.
Hamming Codes
Binary linear codes are built on binary linear algebra.
Before proceeding, we need binary arithmetic: ({0, 1}, +, ×)
X addition: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0.
X multiplication: 0 × 0 = 0, 0 × 1 = 0, 1 × 0 = 0, 1 × 1 = 1.
X both are clearly commutative: a + b = b + a and a × b = b × a.
X also associative: (a + b) + c = a + (b + c) and (a × b) × c = a × (b × c).
X distributive property: a × (b + c) = a × b + a × c.
In binary arithmetic, a + b = a − b.
Based on binary arithmetic, we may build binary linear algebra, with binary vectors and matrices.
Can be extended to other Galois fields GF(q); e.g., GF(3): ternary arithmetic, with modulo-3 addition and multiplication.
Hamming Codes
Generalizes the idea of parity check for error detection/correction.
A Hamming(n, k) code is (in the previous notation) a (2^k, n) code.
Rate of a Hamming(n, k) code: R = k/n.
Classical example: Hamming(7, 4) generator matrix:
G = [ 1 0 0 0 1 1 0
      0 1 0 0 1 0 1
      0 0 1 0 0 1 1
      0 0 0 1 1 1 1 ] = [I_4 | A]
Generation of codeword x from message: example m = (1101):
x = mG = (1101)G = (1101100)
where the vector-matrix product is in binary arithmetic.
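This codeword computation can be verified directly (a sketch; `hamming_encode` is a name invented here, and G is the matrix on the slide):

```python
# Generator matrix G = [I_4 | A] from the slide; all arithmetic is mod 2.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def hamming_encode(m):
    """x = mG over GF(2), for a 4-bit message m."""
    return [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

print(hamming_encode([1, 1, 0, 1]))   # [1, 1, 0, 1, 1, 0, 0] -- matches the slide
```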
Hamming Codes
Generation of codeword x from message m; example m = (1101):
x = mG = (1101)G = (1101100)
Checking codewords: parity-check matrix H such that
H G^T = 0  ⇒  H x^T = H (mG)^T = H G^T m^T = 0
For G = [I_4 | A], take H = [A^T | I_3]; then
H G^T = [A^T | I_3] [ I_4
                      A^T ] = A^T + A^T = 0
For the matrix G in the previous slide,
H = [A^T | I_3] = [ 1 1 0 1 1 0 0
                    1 0 1 1 0 1 0
                    0 1 1 1 0 0 1 ]
...the columns are the 2^3 − 1 = 7 nonzero 3-bit binary words.
Hamming Codes
Let x + e be a received word, with error vector e.
Checking: H(x + e)^T = H x^T + H e^T = H e^T, since H x^T = 0.
No errors are detected if and only if H e^T = 0. Cases:
X Zero errors: H e^T = 0.
X One error: H e^T ≠ 0; it is one of the columns of H.
X Two errors: H e^T ≠ 0; it is the sum of two columns of H, which are all different.
Any two errors are detected, but three errors may go undetected, since the sum of any two columns equals another column.
Exercise: show that for a Hamming(7, 4) code, d_min = 3, thus it corrects 1 error and detects up to 2 errors.
Hamming Codes
Minimum distance of Hamming(7, 4) code is 3.
Thus it can correct 1 error; how?
Permute the columns of H (and correspondingly of G) into
H = [ 0 0 0 1 1 1 1
      0 1 1 0 0 1 1
      1 0 1 0 1 0 1 ]
Check x + e, assuming a single error, say in position 5:
(H(x^T + e^T))^T = e H^T = (0000100) H^T = (101)
...precisely the binary word for 5, the error position.
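The single-error correction procedure can be sketched as follows (illustrative; the codeword x below is one valid word of the permuted code, chosen so that H x^T = 0):

```python
# Permuted parity-check matrix from the slide: column j (1-based) is j in binary.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(y):
    """H y^T over GF(2)."""
    return [sum(H[r][j] * y[j] for j in range(7)) % 2 for r in range(3)]

def correct_one_error(y):
    """Read the syndrome as a binary number: it is the error position (0 = no error)."""
    s = syndrome(y)
    pos = 4 * s[0] + 2 * s[1] + s[2]
    y = y[:]
    if pos:
        y[pos - 1] ^= 1          # flip the bit at the indicated position
    return y

x = [0, 1, 1, 0, 0, 1, 1]        # a valid codeword of this permuted code
y = x[:]; y[4] ^= 1              # corrupt position 5
print(syndrome(y))               # [1, 0, 1] -- binary for 5
print(correct_one_error(y) == x) # True
```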
Hamming Codes
General Hamming(n, k) codes.
For some r ≥ 2: n = 2^r − 1 and k = 2^r − r − 1.
The Hamming(7, 4) code: r = 3, n = 2^3 − 1 = 7, k = 2^3 − 3 − 1 = 4.
Columns of H: all n = 2^r − 1 binary words of r bits, except zero.
Put H in systematic form H = [A^T | I_r] and build G = [I_k | A].
Rate: R = k/n = (2^r − r − 1)/(2^r − 1)
Exercise: show that, for any r, d_min = 3.
Remarkably, lim_{r→∞} (2^r − r − 1)/(2^r − 1) = 1
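This limit can be seen numerically (a one-line sketch):

```python
# Rate of the Hamming(2^r - 1, 2^r - r - 1) code for growing r.
rates = {r: (2 ** r - r - 1) / (2 ** r - 1) for r in (3, 5, 8, 12)}
print(rates)   # climbs from 4/7 toward 1
```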
The repetition code with n = 3 also has minimum distance 3, but only R = 1/3.
Error-correcting codes are a huge R&D area, without which modern communications would not be possible.
Exercises
Show that the repetition code of R = 1/3 is a Hamming(n, k) code. Find r, n, k, and the matrices H and G.
Consider a Hamming(7, 4) code in systematic form. Decode the word (1011011).
A Hamming code is a particular case of the more general family of linear codes, i.e., codes whose codewords are generated as x = mG. Show that for any binary linear code,
a) the zero word is a valid codeword;
b) d_min equals the minimum weight (number of 1s) over all nonzero codewords.
Assuming a Hamming(7, 4) code is used on a BSC with probability of error α, what is the probability of an erroneous decoding?
Gaussian Channel
Gaussian channel: X = Y = ℝ, Y = X + Z.
Mutual information:
I(X;Y) = h(Y) − h(Y|X)
       = h(Y) − h(X + Z|X)
       = h(Y) − h(Z)
       = h(Y) − (1/2) log(2πeN)
since differential entropy is shift-invariant, and Z ~ N(0, N) is Gaussian and independent of X.
Since adding a constant to X does not affect I(X;Y), assume E[X] = 0, thus var[X] = E[X^2] = power.
Gaussian Channel
Gaussian channel: X = Y = ℝ, Y = X + Z.
Mutual information:
I(X;Y) = h(Y) − (1/2) log(2πeN)
       ≤ (1/2) log(2πe(N + E[X^2])) − (1/2) log(2πeN)
       = (1/2) log(1 + E[X^2]/N)
Without a constraint on E[X^2], I(X;Y) can be arbitrarily large.
With a power constraint E[X^2] ≤ P,
C = max_{f_X : E[X^2] ≤ P} (1/2) log(1 + E[X^2]/N) = (1/2) log(1 + P/N)
achieved for f_X = N(0, P). P/N = SNR, the signal-to-noise ratio.
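The capacity formula is trivial to evaluate (a sketch; the SNR values are chosen so the outputs are exact powers of two inside the log):

```python
from math import log2

def gaussian_capacity(snr):
    """C = (1/2) log2(1 + P/N) bits per channel use, with snr = P/N."""
    return 0.5 * log2(1 + snr)

for snr in (0, 1, 15, 1023):
    print(snr, gaussian_capacity(snr))   # 0.0, 0.5, 2.0, 5.0
```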
Coding for a Gaussian Channel
An (M, n) code for a Gaussian channel, under power constraint P:
X a set of message indices W ∈ {1, ..., M};
X an encoder f : {1, ..., M} → ℝ^n, i.e., f(i) = [f_1(i), ..., f_n(i)] ∈ ℝ^n, with
‖f(i)‖^2 = Σ_{j=1}^n f_j(i)^2 ≤ nP;
X a decoder g : ℝ^n → {1, ..., M}.
Conditional, average, and maximum probabilities of error, λ^(n), are defined as in the discrete channel.
Rate R is achievable if there exists a sequence of (2^{nR}, n) codes satisfying the power constraint P, such that lim_{n→∞} λ^(n) = 0.
The (operational) capacity is: C_oper = sup{R : R is achievable}.
The Gaussian channel theorem: C_oper = C.
Coding for a Gaussian Channel
Outline of the proof of the Gaussian channel theorem.
We know that Y^n = X^n + Z^n and E[‖X^n‖^2] ≤ nP, thus E[‖Y^n‖^2] ≤ n(P + N).
All the received vectors are, with high probability (w.h.p.), in a sphere of radius √(n(N + P)).
Each received vector is, w.h.p., in a sphere of radius √(nN) around f(i).
The volume of a radius-r sphere in ℝ^n is V(r) = C_n r^n.
The maximum number of (asymptotically) non-intersecting spheres is
M = 2^{nR} ≤ (n(N + P))^{n/2} / (nN)^{n/2} = 2^{(n/2) log((P+N)/N)} = 2^{(n/2) log(1 + P/N)}
...thus R ≤ C.
Sphere Packing
This picture becomes accurate for large n, since "in high dimensions, Gaussian distributions are soap bubbles."1
1www.inference.vc/high-dimensional-gaussian-distributions-are-soap-bubble/
Exercises
Consider the multi-path channel where the noises Z1 and Z2 follow a jointly Gaussian probability density function with zero mean and covariance matrix
K = [ σ^2   ρσ^2
      ρσ^2  σ^2 ],
where σ^2 is the noise variance and ρ the correlation coefficient. Find the capacity of the channel. What is the capacity for ρ = 1, ρ = 0, ρ = −1? Interpret the results.
Continuous channel with discrete input: consider a channel with input X ∈ {0, 1} and output Y = X + Z, where Z ∈ [0, a] with uniform density. Assuming a > 1, find the capacity of the channel. Repeat for a < 1 and interpret the result.
Recommended Reading
T. Cover and J. Thomas, "Elements of Information Theory", John Wiley & Sons, 2006 (Sections 7.1 to 7.6, 7.11, 9.1).
https://en.wikipedia.org/wiki/Hamming_code