TRANSCRIPT
Lecture 7: Lossy Source Coding
Lossy Source Coding Theorem for Memoryless Sources; Proof of the Coding Theorem
I-Hsiang Wang
Department of Electrical Engineering, National Taiwan University
December 2, 2015
1 / 39 I-Hsiang Wang IT Lecture 7
The Block-to-Block Source Coding Problem
[Block diagram: Source → s[1:N] → Source Encoder → b[1:K] → Source Decoder → ŝ[1:N] → Destination]
Recall: in Lecture 03, we investigated the fundamental limit of (almost) lossless block-to-block (or fixed-to-fixed) source coding.

The recovery criterion is vanishing probability of error:

  lim_{N→∞} P{ Ŝ^N ≠ S^N } = 0.

The minimum compression ratio to fulfill lossless reconstruction is the entropy rate of the source:

  R* = H({S_i}), for stationary and ergodic {S_i}.
In this lecture, we turn our focus to lossy block-to-block source coding, where the setting is the same as before, except:

The recovery criterion is reconstruction to within a given distortion D:

  lim sup_{N→∞} E[d(S^N, Ŝ^N)] ≤ D.

The minimum compression ratio to fulfill reconstruction to within a given distortion D is the rate-distortion function:

  R(D) = min_{p_{Ŝ|S} : E[d(S,Ŝ)] ≤ D} I(S; Ŝ), for a DMS {S_i}.
Why lossy source coding?
Sometimes it might be too expensive to reconstruct the source in a lossless way.

Sometimes it is impossible to reconstruct the source losslessly. For example, if the source is continuous-valued, the entropy rate of the source is usually infinite!

Lossy source coding has a wide range of applications, including quantization/digitization of continuous-valued signals, image/video/audio compression, etc.

In this lecture, we first focus on discrete memoryless sources (DMS). Then, we employ the discretization technique to extend the coding theorems from the discrete-source case to the continuous-source case. In particular, Gaussian sources will be our main focus.
Lossless vs. Lossy Source Coding
The general lossy source coding problem involves quantizing all possible source sequences s^N ∈ S^N into 2^K reconstruction sequences ŝ^N ∈ Ŝ^N, which can be represented by K bits. The goal is to design the correspondence between s^N and ŝ^N so that the distortion (quantization error) is below a prescribed level D.

Lossy source coding has a couple of notable differences from lossless source coding:

The source alphabet S and the reconstruction alphabet Ŝ could be different in general.
Performance is determined by the chosen distortion measure.
Outline

1 Lossy Source Coding Theorem for Memoryless Sources
  Lossy Source Coding Theorem
  Rate Distortion Function

2 Proof of the Coding Theorem
  Converse Proof
  Achievability
Distortion Measures

We begin with the definition of the distortion measure per symbol.

Definition 1 (Distortion Measure)
A per-symbol distortion measure is a mapping d(s, ŝ) from S × Ŝ to [0, ∞), understood as the cost of representing s by ŝ. For two length-N sequences s^N and ŝ^N, the distortion between them is defined as the average of the per-symbol distortion:

  d(s^N, ŝ^N) ≜ (1/N) Σ_{i=1}^N d(s_i, ŝ_i).

Examples: below are two widely used distortion measures:
Hamming distortion: Ŝ = S, d(s, ŝ) ≜ 1{s ≠ ŝ}.
Squared-error distortion: S = Ŝ = ℝ, d(s, ŝ) ≜ (s − ŝ)².
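The two per-symbol measures and the length-N block average above can be sketched directly in code. A minimal illustration (my own, not from the lecture):

```python
def hamming(s, s_hat):
    """Hamming distortion: 1 if the symbols differ, else 0."""
    return 0.0 if s == s_hat else 1.0

def squared_error(s, s_hat):
    """Squared-error distortion (s - s_hat)^2 for real alphabets."""
    return (s - s_hat) ** 2

def block_distortion(sN, sN_hat, d):
    """Average per-symbol distortion d(s^N, s_hat^N) over a block."""
    assert len(sN) == len(sN_hat)
    return sum(d(s, sh) for s, sh in zip(sN, sN_hat)) / len(sN)

print(block_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming))    # 0.25
print(block_distortion([1.0, 2.0], [1.5, 2.0], squared_error))  # 0.125
```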
Lossy Source Coding: Problem Setup
1 A (2^{NR}, N) source code consists of
  an encoding function (encoder) enc_N : S^N → {0,1}^K that maps each source sequence s^N to a bit sequence b^K, where K ≜ ⌊NR⌋;
  a decoding function (decoder) dec_N : {0,1}^K → Ŝ^N that maps each bit sequence b^K to a reconstructed source sequence ŝ^N.

2 The expected distortion of the code is D^(N) ≜ E[d(S^N, Ŝ^N)].

3 A rate-distortion pair (R, D) is said to be achievable if there exists a sequence of (2^{NR}, N) codes such that lim sup_{N→∞} D^(N) ≤ D.

The optimal compression rate R(D) ≜ inf{R : (R, D) is achievable}.
Rate Distortion Trade-off
D_min ≜ min_{ŝ(·)} E[d(S, ŝ(S))]

D_min denotes the minimum possible target distortion so that the rate is finite. Even if the decoder knows the entire s^N and finds a best representative ŝ^N(s^N), the expected distortion is still D_min.

D_max ≜ min_{ŝ} E[d(S, ŝ)]

[Figure: the rate-distortion curve R(D) on the (D, R) plane, decreasing from R(D_min) ≤ H(S) at D = D_min to 0 at D = D_max; the region above the curve is achievable, the region below is not.]

Let ŝ* ≜ arg min_{ŝ} E[d(S, ŝ)]. Then for target distortion D ≥ D_max, we can use a single representative ŝ^N = (ŝ*, ŝ*, ..., ŝ*) to reconstruct all s^N ∈ S^N (the rate is 0!), and

  D^(N) = E[d(S^N, ŝ^N)] = (1/N) Σ_{i=1}^N E[d(S_i, ŝ*)] = D_max ≤ D.
Hence, R(D) = 0 for all D ≥ D_max.
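The two endpoints D_min and D_max can be computed directly from their definitions for a finite-alphabet source. A minimal sketch (mine, not from the lecture), instantiated for a Ber(0.3) source with Hamming distortion:

```python
def d_min(p_s, d):
    # The decoder may pick, for each s, the best representative s_hat.
    return sum(p * min(d[s].values()) for s, p in p_s.items())

def d_max(p_s, d):
    # One single representative s_hat for every source symbol (rate 0).
    s_hats = next(iter(d.values())).keys()
    return min(sum(p_s[s] * d[s][sh] for s in p_s) for sh in s_hats)

# Ber(0.3) source with Hamming distortion d[s][s_hat]:
p_s = {0: 0.7, 1: 0.3}
d = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
print(d_min(p_s, d))  # 0.0
print(d_max(p_s, d))  # 0.3, i.e. min(p, 1 - p)
```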
Lossy Source Coding Theorem
Theorem 1 (A Lossy Source Coding Theorem for DMS)
For a discrete memoryless source {S_i | i ∈ ℕ},

  R(D) = min_{p_{Ŝ|S} : E[d(S,Ŝ)] ≤ D} I(S; Ŝ).   (1)

Interpretation:

  H(S) − H(S | Ŝ) = I(S; Ŝ)

the uncertainty of the source S, minus the uncertainty of S after learning Ŝ, equals the rate used in compressing S to Ŝ.
Properties of Rate Distortion Function
[Figure: the rate-distortion curve R(D), as on slide 10.]

A rate distortion function R(D) satisfies the following properties:
1 Nonnegative.
2 Non-increasing in D.
3 Convex in D.
4 Continuous in D.
5 R(D_min) ≤ H(S).
6 R(D) = 0 if D ≥ D_max.

These properties are all quite intuitive. Below we sketch the proof of these properties.
Monotonicity: Clear from the definition.

Convexity: The goal is to prove that, for D_1, D_2 ≥ D_min and λ ∈ (0, 1),

  R(λD_1 + (1−λ)D_2) ≤ λR(D_1) + (1−λ)R(D_2).

Let p_i(ŝ|s) ≜ arg min_{p_{Ŝ|S} : E[d(S,Ŝ)] ≤ D_i} I(S; Ŝ) be the optimizing conditional distribution that achieves distortion D_i, for i = 1, 2, and let p_λ ≜ λp_1 + (1−λ)p_2. Under p_λ(ŝ|s), the expected distortion between S and Ŝ is at most λD_1 + (1−λ)D_2, since

  E_{p(s) p_λ(ŝ|s)}[d(S, Ŝ)] = Σ_s Σ_ŝ p(s) [λp_1(ŝ|s) + (1−λ)p_2(ŝ|s)] d(s, ŝ) ≤ λD_1 + (1−λ)D_2.

The proof is complete since I(S; Ŝ) is convex in p_{Ŝ|S} for a fixed p_S:

  R(λD_1 + (1−λ)D_2) ≤ I(S; Ŝ)_{p_λ} ≤ λ I(S; Ŝ)_{p_1} + (1−λ) I(S; Ŝ)_{p_2} = λR(D_1) + (1−λ)R(D_2).
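A quick numeric sanity check of convexity (my own sketch, not from the lecture), using the closed-form Bernoulli rate-distortion function R(D) = Hb(p) − Hb(D) derived in Example 1 below:

```python
from math import log2

def Hb(q):
    """Binary entropy function in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

def R(D, p=0.3):
    """Bernoulli(p) rate-distortion function under Hamming distortion."""
    return Hb(p) - Hb(D) if D <= min(p, 1 - p) else 0.0

D1, D2, lam = 0.05, 0.25, 0.4
mid = R(lam * D1 + (1 - lam) * D2)
chord = lam * R(D1) + (1 - lam) * R(D2)
print(mid <= chord)  # True: the curve lies on or below every chord
```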
Nonnegativity: Clear from the definition.

Continuity: It is well known that convexity on an open interval implies continuity on that open interval.

[Figures: an R(D) curve with a jump in the interior of (D_min, D_max) would violate convexity; hence a discontinuity is possible only at the boundary.]

Hence, the only point where R(D) might be discontinuous is at the boundary D = D_min. The proof is technical and can be found in Gallager [2].
Example: Bernoulli Source with Hamming Distortion
Source (binary): S_i ∈ S = {0, 1}, and S_i i.i.d. ~ Ber(p) for all i.

Distortion (Hamming): d(s, ŝ) = 1{s ≠ ŝ}.

Example 1
Derive the rate distortion function of the Bernoulli-p source with Hamming distortion and show that it is given by

  R(D) = Hb(p) − Hb(D),  for 0 ≤ D ≤ min(p, 1−p),
  R(D) = 0,              for D > min(p, 1−p).

This is the first example of how to compute the rate distortion function, that is, how to solve (1) in the lossy source coding theorem.
Sol. The first step is to identify D_min and D_max.

D_min = 0 because one can choose ŝ(s) = s.
D_max = min(p, 1−p) because one can choose ŝ = 0 if p ≤ 1/2, and ŝ = 1 if p ≥ 1/2.

The next step is to lower bound I(S; Ŝ) = H(S) − H(S|Ŝ). This is equivalent to upper bounding H(S|Ŝ):

  H(S|Ŝ) = H(S ⊕ Ŝ | Ŝ) ≤ H(S ⊕ Ŝ) = Hb(q),

where we assume that S ⊕ Ŝ ~ Ber(q) for some q ∈ [0, 1].

Observe that d(S, Ŝ) ≡ S ⊕ Ŝ. Hence, E[d(S, Ŝ)] ≤ D ⟹ q ≤ D. Since D ≤ D_max ≤ 1/2, we see that Hb(q) is maximized when q = D.

Hence, I(S; Ŝ) ≥ Hb(p) − Hb(D).
Final step: show that the lower bound Hb(p) − Hb(D) can be attained. The goal is to find a probability transition matrix p(ŝ|s) such that

  Ŝ ⊥⊥ (S ⊕ Ŝ), so that H(S ⊕ Ŝ | Ŝ) = H(S ⊕ Ŝ), and P{S ⊕ Ŝ = 1} = D.

At first glance this looks hard. The difficulty can be resolved via an auxiliary reverse channel.

Consider a channel with input Ŝ, output S, and additive noise Z ~ Ber(D), Z ⊥⊥ Ŝ:

  S = Ŝ ⊕ Z ⟹ Z = S ⊕ Ŝ.

The reverse channel specifies the joint distribution p(s, ŝ) and hence p(ŝ|s)!

[Figure: binary reverse channel from Ŝ ~ Ber(α) to S ~ Ber(p) with crossover probability D. Matching the output marginal requires p = (1 − α)D + α(1 − D), i.e., α = (p − D)/(1 − 2D).]
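The reverse-channel construction can be verified numerically. A minimal check (my own sketch, not from the lecture) that Ŝ ~ Ber(α) with α = (p − D)/(1 − 2D), passed through a BSC(D), reproduces the Ber(p) source marginal at distortion exactly D:

```python
p, D = 0.3, 0.1
alpha = (p - D) / (1 - 2 * D)   # P(S_hat = 1), from matching the marginal

# Output marginal P(S = 1) = (1 - alpha)*D + alpha*(1 - D):
p_out = (1 - alpha) * D + alpha * (1 - D)

# Expected Hamming distortion P(S != S_hat) = P(Z = 1) = D by construction:
distortion = (1 - alpha) * D + alpha * D

print(round(alpha, 4))       # 0.25
print(round(p_out, 4))       # 0.3
print(round(distortion, 4))  # 0.1
```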
Example: Gaussian Source with Squared Error Distortion
Source (Gaussian): S_i ∈ S = ℝ, and S_i i.i.d. ~ N(µ, σ²) for all i.

Distortion (Squared Error): d(s, ŝ) = |s − ŝ|².

Example 2
Derive the rate distortion function of the Gaussian source with squared error distortion and show that it is given by

  R(D) = (1/2) log(σ²/D),  for 0 ≤ D ≤ σ²,
  R(D) = 0,                for D > σ².

Remark: Although the source is continuous, one can use weak typicality or the discretization method used in channel coding to extend the lossy source coding theorem from discrete memoryless sources to continuous ones.

Note: In particular, R(0) = ∞, which is quite intuitive!
Sol. First step: identify D_min and D_max.

D_min = 0 because one can choose ŝ(s) = s.
D_max = σ² because one can choose ŝ = µ, the mean of S.

Next step: lower bound I(S; Ŝ) = h(S) − h(S|Ŝ). This is equivalent to upper bounding h(S|Ŝ):

  h(S|Ŝ) = h(S − Ŝ | Ŝ) ≤ h(S − Ŝ) ≤ (1/2) log(2πe D),

where the last inequality holds since Var[S − Ŝ] ≤ E[|S − Ŝ|²] ≤ D.

Hence, I(S; Ŝ) ≥ (1/2) log(2πeσ²) − (1/2) log(2πe D) = (1/2) log(σ²/D).
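The closed form is easy to tabulate. A small sketch (mine, not from the lecture) of R(D) = (1/2) log₂(σ²/D) in bits per sample; note that each additional bit of rate reduces the achievable distortion by a factor of 4 (about 6 dB):

```python
from math import log2

def gaussian_rd(D, sigma2=1.0):
    """Rate-distortion function of a N(mu, sigma2) source, in bits."""
    if D <= 0:
        return float("inf")   # R(0) = infinity: lossless is impossible
    if D >= sigma2:
        return 0.0            # rate 0: reconstruct with the mean
    return 0.5 * log2(sigma2 / D)

print(gaussian_rd(0.25))      # 1.0
print(gaussian_rd(1.0 / 16))  # 2.0
print(gaussian_rd(2.0))       # 0.0
```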
Final step: show that the lower bound (1/2) log(σ²/D) can be attained. The goal is to find a conditional distribution p(ŝ|s) such that

  Ŝ ⊥⊥ (S − Ŝ), so that h(S − Ŝ | Ŝ) = h(S − Ŝ), and (S − Ŝ) ~ N(0, D).

Again, this can be done via an auxiliary reverse channel. Consider a channel with input Ŝ, output S, and additive noise Z ~ N(0, D), Z ⊥⊥ Ŝ:

  S = Ŝ + Z ⟹ Z = S − Ŝ.

The reverse channel specifies the joint distribution p(s, ŝ) and hence p(ŝ|s)!

[Figure: additive reverse channel with Ŝ ~ N(µ, σ² − D), Z ~ N(0, D), and output S = Ŝ + Z ~ N(µ, σ²).]
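A Monte Carlo sanity check (my own sketch, not from the lecture) of the Gaussian reverse channel: draw Ŝ ~ N(µ, σ² − D) and Z ~ N(0, D) independently, set S = Ŝ + Z, and verify that Var(S) ≈ σ² while E[(S − Ŝ)²] ≈ D:

```python
import random

random.seed(0)
mu, sigma2, D, n = 1.0, 2.0, 0.5, 200_000

s_hat = [random.gauss(mu, (sigma2 - D) ** 0.5) for _ in range(n)]
z = [random.gauss(0.0, D ** 0.5) for _ in range(n)]
s = [a + b for a, b in zip(s_hat, z)]

mean_s = sum(s) / n
var_s = sum((x - mean_s) ** 2 for x in s) / n
mse = sum((a - b) ** 2 for a, b in zip(s, s_hat)) / n

print(round(var_s, 1))  # ≈ 2.0 (= sigma2)
print(round(mse, 1))    # ≈ 0.5 (= D)
```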
Example: Source Alphabet = Reconstruction Alphabet
Source (ternary): S_i ∈ S = {0, ∗, 1}, and S_i i.i.d. ~ p_S for all i, where p_S(0) = p_S(1) = ε ≤ 1/2.

Reconstruction (binary): Ŝ = {0, 1}.

Distortion: d(s, ŝ) = 1 if s ≠ ∗ and ŝ ≠ s, and d(s, ŝ) = 0 if s = ∗ or ŝ = s.

In other words, there is a don't-care symbol ∗, and Ŝ ≠ S.

Example 3 (HW5)
Derive the rate distortion function and show that it is given by

  R(D) = 2ε(1 − Hb(D/(2ε))),  for 0 ≤ D ≤ ε,
  R(D) = 0,                   for D > ε.
Proof of the Converse of Theorem 1
We aim to show that for any sequence of (2^{NR}, N) source codes with lim sup_{N→∞} D^(N) ≤ D, the rate R must satisfy R ≥ R(D) (defined in (1)).

We begin with similar steps as in lossless source coding (cf. Lecture 03).

Pf: Note that B^K is a random variable because it is generated by another random variable, S^N.

  NR ≥ K ≥ H(B^K) ≥ I(B^K; S^N)
     (a)≥ I(S^N; Ŝ^N)
     (b)= Σ_{i=1}^N I(S_i; Ŝ^N | S^{i−1})
     (c)= Σ_{i=1}^N I(S_i; Ŝ^N, S^{i−1})
       ≥ Σ_{i=1}^N I(S_i; Ŝ_i).

(a) is due to the Markov chain S^N − B^K − Ŝ^N and the data processing inequality.
(b) is due to the chain rule. (c) is due to S_i ⊥⊥ S^{i−1} (memoryless source).

So far, we have not yet used the condition on distortion.
Further working on the inequality:

  NR ≥ Σ_{i=1}^N I(S_i; Ŝ_i)
     (d)≥ Σ_{i=1}^N R(E[d(S_i, Ŝ_i)])
        = N Σ_{i=1}^N (1/N) R(E[d(S_i, Ŝ_i)])
     (e)≥ N R(Σ_{i=1}^N (1/N) E[d(S_i, Ŝ_i)])
        = N R(E[(1/N) Σ_{i=1}^N d(S_i, Ŝ_i)])
        = N R(E[d(S^N, Ŝ^N)])
        = N R(D^(N)).

(d) is due to the definition of R(D) in (1).
(e) is due to the convexity of R(D) and Jensen's inequality.

Hence, R ≥ lim sup_{N→∞} R(D^(N)) (f)≥ R(lim sup_{N→∞} D^(N)) (g)≥ R(D).

(f) is due to the continuity of R(D).
(g) is due to lim sup_{N→∞} D^(N) ≤ D and R(D) being non-increasing.
Remarks
You might note that in the previous proof of the converse, we do not make use of lower bounds on error probability such as Fano's inequality. This is because in our formulation of the lossy source coding problem, the reconstruction criterion is placed on the expected distortion.

Instead of the criterion lim sup_{N→∞} D^(N) ≤ D, where D^(N) ≜ E[d(S^N, Ŝ^N)], we could use a stronger criterion as follows:

  P_e^(N,δ) ≜ P{d(S^N, Ŝ^N) > D + δ}, δ > 0   (Probability of Error)
  lim_{N→∞} P_e^(N,δ) = 0, ∀ δ > 0            (Reconstruction Criterion)

Under this stronger criterion, we can then give a new operational definition of the rate distortion function. It turns out Theorem 1 remains the same! (The converse is implied by our converse.)
Idea of Constructing Good Source Code
Key in source coding:
1 Find a good set of representatives (quantization codewords).
2 For each source sequence, determine which codeword to use.

Main tools we have used so far in developing achievability of coding theorems:
1 Random coding: construct the codebook randomly and show that at least one realization can achieve the desired target performance.
2 Typicality: helps give bounds in the performance analysis.

In the following, we prove the achievability part of Theorem 1 by:
1 Random coding: show existence of a good quantization codebook.
2 Typicality encoding: determine which codeword to use.
[Figure: the space of source sequences S^N is covered by reconstruction sequences Ŝ^N; the codebook is built by random coding, and each source sequence is mapped to a codeword by typicality encoding.]
Proof Program
1 Random Codebook Generation: generate a random ensemble of quantization codebooks, each of which contains 2^K codewords.

2 Analysis of Expected Distortion:
Goal: show that lim sup_{N→∞} E_{C,S^N}[d(S^N, Ŝ^N)] ≤ D, and conclude that there must exist a codebook c such that the expected distortion satisfies lim sup_{N→∞} E_{S^N}[d(S^N, Ŝ^N)] ≤ D.

Note that for a source sequence s^N, the optimal encoder chooses an index w ∈ [1 : 2^K], that is, a codeword ŝ^N(w) in the codebook, so that d(s^N, ŝ^N(w)) is minimized.

However, similar to ML decoding in channel coding, such an optimal encoder is hard to analyze. To simplify the analysis, we shall introduce a suboptimal encoder based on typicality.
Random Codebook Generation
Fix the conditional p.m.f. that attains R(D/(1+ε)):

  q_{Ŝ|S} = arg min_{p_{Ŝ|S} : E[d(S,Ŝ)] ≤ D/(1+ε)} I(S; Ŝ)   (2)

Based on the chosen q_{Ŝ|S} and the source distribution p_S, calculate p_Ŝ, the marginal distribution of the reconstruction Ŝ.

Generate 2^K codewords {ŝ^N(w) | w = 1, 2, ..., 2^K}, i.i.d. according to p(ŝ^N) = Π_{i=1}^N p_Ŝ(ŝ_i).

In other words, if we think of the quantization codebook as a 2^K × N matrix C, the elements of C will be i.i.d. according to p_Ŝ.

Remark: observe the resemblance with the channel coding achievability.
Encoding and Decoding
Encoding: unlike channel coding, the encoding process in a source coding problem is usually much more involved. We use typicality encoding (resembling typicality decoding in channel coding):

Given a source sequence s^N, find an index w ∈ [1 : 2^K] such that (s^N, ŝ^N(w)) ∈ T_ε^(N)(p_{S,Ŝ}). Recall the joint distribution p_{S,Ŝ} = p_S × q_{Ŝ|S} as defined in (2). If there is no such index, or more than one, pick one w ∈ [1 : 2^K] at random. Send out the bit sequence that represents the chosen w.

Decoding: upon receiving the bit sequence representing w, generate the reconstruction ŝ^N(w) by looking up the quantization codebook.
Analysis of Expected Distortion
Why a typicality encoder? The typical average lemma (Lemma 2, Lecture 04):

For any nonnegative function g(x) on X, if x^n ∈ T_ε^(n)(X), then

  (1 − ε) E[g(X)] ≤ (1/n) Σ_{i=1}^n g(x_i) ≤ (1 + ε) E[g(X)].

In analyzing E_{C,S^N}[d(S^N, Ŝ^N)], we can then distinguish two cases,

  E ≜ {(S^N, Ŝ^N) ∉ T_ε^(N)} and E^c ≜ {(S^N, Ŝ^N) ∈ T_ε^(N)}:

  P{E} E_{C,S^N}[d(S^N, Ŝ^N) | E] + P{E^c} E_{C,S^N}[d(S^N, Ŝ^N) | E^c]
    ≤ P{E} max_{s,ŝ} d(s, ŝ) + P{E^c} (1 + ε) · D/(1 + ε)
    ≤ P{E} max_{s,ŝ} d(s, ŝ) + D.

Hence, as long as P{E} vanishes as N → ∞, we are done.
Analysis of Expected Distortion → Analysis of P{E}

With typicality encoding, the analysis of expected distortion is made easy: we just need to control P{E}, where E ≜ {(S^N, Ŝ^N) ∉ T_ε^(N)}.

Let us look at the event E: it is the event that the reconstruction Ŝ^N is not jointly typical with S^N, which can only happen when none of the quantization codewords in the codebook is jointly typical with S^N.

Hence, E ⊆ ∩_{w=1}^{2^K} A_w^c, where A_w ≜ {(S^N, Ŝ^N(w)) ∈ T_ε^(N)}.

  ⟹ P{E} ≤ P{∩_{w=1}^{2^K} A_w^c}.

Unfortunately, the events {A_w^c | w = 1, ..., 2^K} may not be mutually independent, because they all involve a common random sequence S^N.

However, for a fixed s^N, the events A_w^c(s^N) ≜ {(s^N, Ŝ^N(w)) ∉ T_ε^(N)}, w = 1, ..., 2^K, are indeed mutually independent!
Analysis of P{E}, E ≜ {(S^N, Ŝ^N) ∉ T_ε^(N)}

Motivated by the above observation, we give an alternative upper bound:

  P{E} ≤ Σ_{s^N ∈ S^N} p(s^N) P{∩_{w=1}^{2^K} A_w^c(s^N)}
       = Σ_{s^N ∈ S^N} p(s^N) Π_{w=1}^{2^K} P{A_w^c(s^N)}
       = Σ_{s^N ∈ S^N} p(s^N) Π_{w=1}^{2^K} (1 − P{A_w(s^N)}).

Question: is there a way to lower bound

  P{A_w(s^N)} ≜ P{(s^N, Ŝ^N(w)) ∈ T_ε^(N)(p_{S,Ŝ})}?

Yes. As long as s^N ∈ T_{ε′}^(N)(p_S) for some ε′ < ε, Lemma 1 (next slide) guarantees that P{A_w(s^N)} ≥ 2^{−N(I(S;Ŝ)+δ(ε))} for sufficiently large N, where lim_{ε→0} δ(ε) = 0.
Joint Typicality Lemma
The following lemma formally states the bounds. (Proof is omitted; see Section 2.5 of El Gamal & Kim [6].)

Lemma 1 (Joint Typicality Lemma)
Consider a joint p.m.f. p_{X,Y} = p_X · p_{Y|X} = p_Y · p_{X|Y}. Then, there exists δ(ε) > 0 with lim_{ε→0} δ(ε) = 0 such that:

1 For an arbitrary sequence x^n and random Y^n ~ Π_{i=1}^n p_Y(y_i),

  P{(x^n, Y^n) ∈ T_ε^(n)(p_{X,Y})} ≤ 2^{−n(I(X;Y) − δ(ε))}.

2 For an ε′-typical sequence x^n ∈ T_{ε′}^(n)(p_X) with ε′ < ε, and random Y^n ~ Π_{i=1}^n p_Y(y_i), for sufficiently large n,

  P{(x^n, Y^n) ∈ T_ε^(n)(p_{X,Y})} ≥ 2^{−n(I(X;Y) + δ(ε))}.
Finalizing the Proof
Invoking Lemma 1, the additional condition that s^N ∈ T_{ε′}^(N)(p_S) for some ε′ < ε motivates us to split the upper bound on P{E} as follows:

  P{E} ≤ Σ_{s^N ∈ S^N} p(s^N) Π_{w=1}^{2^K} (1 − P{A_w(s^N)})
       ≤ Σ_{s^N ∉ T_{ε′}^(N)(p_S)} p(s^N) + Σ_{s^N ∈ T_{ε′}^(N)(p_S)} p(s^N) Π_{w=1}^{2^K} (1 − P{A_w(s^N)})
       ≤ P{S^N ∉ T_{ε′}^(N)(p_S)} + Σ_{s^N ∈ T_{ε′}^(N)(p_S)} p(s^N) (1 − 2^{−N(I(S;Ŝ)+δ(ε))})^{2^K}
       ≤ P{S^N ∉ T_{ε′}^(N)(p_S)} + (1 − 2^{−N(I(S;Ŝ)+δ(ε))})^{2^K}
       ≤ P{S^N ∉ T_{ε′}^(N)(p_S)} + exp(−2^K · 2^{−N(I(S;Ŝ)+δ(ε))}).

The last step is due to (1 − x)^r ≤ e^{−rx} for x ∈ [0, 1] and r ≥ 0.
We obtain a nice upper bound:

  P{E} ≤ P{S^N ∉ T_{ε′}^(N)(p_S)} + exp(−2^K · 2^{−N(I(S;Ŝ)+δ(ε))}).

The first term vanishes as N → ∞ due to the AEP. The second term vanishes as N → ∞ if

  R > I(S; Ŝ) + δ(ε) = R(D/(1+ε)) + δ(ε).

Hence, any R > R(D/(1+ε)) + δ(ε) can achieve average distortion ≤ D. Finally, by the continuity of the rate-distortion function, we take ε → 0 and complete the proof.
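The achievability argument can be seen in a toy experiment (my own sketch, not from the lecture): for the Ber(p)/Hamming example, draw a random codebook from the optimal reconstruction marginal Ber(α) and encode each block with the minimum-distortion rule (the typicality encoder in the proof is a suboptimal device used only to ease the analysis). At a rate above R(D), the empirical distortion already beats the rate-0 benchmark D_max = min(p, 1−p) at a modest blocklength, and approaches D as N grows:

```python
import random

random.seed(1)
p, D, N = 0.3, 0.1, 12
K = 7                            # rate K/N ≈ 0.58 > R(D) ≈ 0.41 bits
alpha = (p - D) / (1 - 2 * D)    # optimal reconstruction marginal Ber(alpha)

# Random codebook: 2^K codewords drawn i.i.d. from Ber(alpha)^N.
codebook = [[int(random.random() < alpha) for _ in range(N)]
            for _ in range(2 ** K)]

def encode(sN):
    # Minimum-Hamming-distortion encoding over the codebook.
    return min(codebook, key=lambda c: sum(a != b for a, b in zip(sN, c)))

trials, total = 1000, 0.0
for _ in range(trials):
    sN = [int(random.random() < p) for _ in range(N)]
    total += sum(a != b for a, b in zip(sN, encode(sN))) / N

avg = total / trials
print(avg < 0.3)  # True: beats the rate-0 code (D_max = 0.3); tends to D as N grows
```

At this short blocklength the empirical distortion sits above D = 0.1; the theorem only promises distortion approaching D as N → ∞.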