TRANSCRIPT
LOSSLESS SOURCE CODING ALGORITHMS
Frans M.J. Willems
INTRODUCTION
HUFFMAN-TUNSTALL
Binary IID Sources
Huffman Code
Tunstall Code
ENUMERATIVE CODING
Lexicographical Ordering
FV: Pascal-∆ Method
VF: Petry Code
ARITHMETIC CODING
Intervals
Universal Coding, Individual Redundancy
CONTEXT-TREE WEIGHTING
IID, unknown θ
Tree Sources
Context Trees
Coding Probs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION
Lossless Source Coding Algorithms
Frans M.J. Willems
Department of Electrical Engineering, Eindhoven University of Technology
ISIT 2013, Istanbul, Turkey
Outline
1 INTRODUCTION
2 HUFFMAN and TUNSTALL: Binary IID Sources; Huffman Code; Tunstall Code
3 ENUMERATIVE CODING: Lexicographical Ordering; FV: Pascal-∆ Method; VF: Petry Code
4 ARITHMETIC CODING: Intervals; Universal Coding, Individual Redundancy
5 CONTEXT-TREE WEIGHTING: IID, unknown θ; Binary Tree-Sources; Context Trees; Coding Probabilities
6 REPETITION TIMES: LZ77; Repetition Times, Kac; Repetition-Time Algorithm; Achieving Entropy
7 CONCLUSION
Choosing a Topic
POSSIBLE TOPICS:
Multi-user Information Theory (with Edward van der Meulen (KUL), Andries Hekstra)
Lossless Source Coding (with Tjalling Tjalkens, Yuri Shtarkov (IPPI), Paul Volf)
Watermarking, Embedding, and Semantic Coding (with Martin van Dijk, Ton Kalker (Philips Research))
Biometrics (with Tanya Ignatenko)
LOSSLESS SOURCE CODING ALGORITHMS
WHY?
Not many sessions at ISIT 2012! Is lossless source coding DEAD?
Lossless Source Coding is about UNDERSTANDING data. Universal Lossless Source Coding focuses on FINDING STRUCTURE in data. MDL principle [Rissanen].
ALGORITHMS are fun (Piet Schalkwijk).
Lecture Structure
TUTORIAL, binary case, my favorite algorithms, ...
REMARKS, open problems, ...
Binary Sources, Sequences, IID
Binary Source: x_1 x_2 · · · x_N

The binary source produces a sequence x_1^N = x_1 x_2 · · · x_N with components x_n ∈ {0, 1}, occurring with probability P(x_1^N).

Definition (Binary IID Source)
For an independent identically distributed (i.i.d.) source with parameter θ, where 0 ≤ θ ≤ 1,
P(x_1^N) = ∏_{n=1}^N P(x_n), where P(1) = θ and P(0) = 1 − θ.
A sequence x_1^N containing N − w zeros and w ones has probability
P(x_1^N) = (1 − θ)^{N−w} θ^w.

Entropy of the IID Source
The ENTROPY of this source is h(θ) ≜ (1 − θ) log_2(1/(1 − θ)) + θ log_2(1/θ) (bits).
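The entropy h(θ) is simple to evaluate numerically; a minimal sketch (the function name is mine, not from the slides):

```python
import math

def binary_entropy(theta: float) -> float:
    """h(theta) = (1 - theta) * log2(1/(1 - theta)) + theta * log2(1/theta), in bits."""
    if theta in (0.0, 1.0):
        return 0.0  # boundary convention: 0 * log2(1/0) is taken as 0
    return (1 - theta) * math.log2(1 / (1 - theta)) + theta * math.log2(1 / theta)

print(round(binary_entropy(0.3), 3))  # 0.881, the value used in later examples
```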
Fixed-to-Variable (FV) Length Codes
IDEA:
Give more probable sequences shorter codewords than less probable sequences.

Definition (FV-Length Code)
A FV-length code assigns to source sequence x_1^N a binary codeword c(x_1^N) of length L(x_1^N). The rate of a FV code is
R ≜ E[L(X_1^N)] / N (code-symbols/source-symbol).

GOAL:
We would like to find decodable FV-length codes that MINIMIZE this rate.
Prefix Codes
Definition (Prefix code)
In a prefix code no codeword is the prefix of any other codeword.
We focus on prefix codes. Codewords in a prefix code can be regarded as leaves in a rooted tree. Prefix codes lead to instantaneous decodability.

Example

x_1^N   c(x_1^N)   L(x_1^N)
00      0          1
01      10         2
10      110        3
11      111        3

[Code tree: root ∅ with leaves 0, 10, 110, 111.]
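Instantaneous decodability can be demonstrated mechanically: scanning the code stream bit by bit, a codeword is recognized the moment it is complete. A sketch using the example code above (helper names are mine):

```python
# The example prefix code: source block -> codeword.
code = {"00": "0", "01": "10", "10": "110", "11": "111"}
decode = {v: k for k, v in code.items()}  # invertible thanks to the prefix property

def encode(blocks):
    return "".join(code[b] for b in blocks)

def decode_stream(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode:          # a complete codeword is recognized instantly
            out.append(decode[buf])
            buf = ""
    assert buf == "", "stream ended in the middle of a codeword"
    return out

blocks = ["01", "11", "00", "10"]
print(decode_stream(encode(blocks)) == blocks)  # True
```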
Prefix Codes (cont.)

Theorem (Kraft, 1949)
(a) The lengths of the codewords in a prefix code satisfy Kraft's inequality
∑_{x_1^N ∈ X^N} 2^{−L(x_1^N)} ≤ 1.
(b) For codeword lengths satisfying Kraft's inequality there exists a prefix code with these lengths.

This leads to:

Theorem (Fano, 1961)
(a) Any prefix code satisfies
E[L(X_1^N)] ≥ H(X_1^N) = N h(θ),
or equivalently R ≥ h(θ). The minimum is achieved if and only if L(x_1^N) = −log_2 P(x_1^N) (ideal codeword length) for all x_1^N ∈ X^N with nonzero P(x_1^N).
(b) There exist prefix codes with
E[L(X_1^N)] < H(X_1^N) + 1 = N h(θ) + 1,
or equivalently R < h(θ) + 1/N.
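Part (a) of Kraft's theorem is easy to check numerically. For the earlier example prefix code {0, 10, 110, 111} the Kraft sum is exactly 1, i.e. the code tree is complete (a sketch):

```python
def kraft_sum(lengths):
    """Sum of 2**(-L) over all codeword lengths; at most 1 for any prefix code."""
    return sum(2.0 ** -L for L in lengths)

# Lengths of the example prefix code {0, 10, 110, 111}:
print(kraft_sum([1, 2, 3, 3]))  # 1.0 -- the inequality is tight, the tree is complete
```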
Huffman’s Code
Definition (Optimal FV-Length Code)
A code that minimizes the expected codeword length E[L(X_1^N)] (and thus the rate R) is called optimal.

Theorem (Huffman, 1952)
The Huffman construction leads to an optimal FV-length code.

CONSTRUCTION:
Consider the set of probabilities {P(x_1^N), x_1^N ∈ X^N}.
Replace the two smallest probabilities by a probability that is their sum. Label the branches from these two smallest probabilities to their sum with code-symbols "0" and "1".
Continue like this until only one probability (equal to 1) is left.

Obviously Huffman's code results in E[L(X_1^N)] < H(X_1^N) + 1 = Nh(θ) + 1 and therefore R < h(θ) + 1/N.
Huffman’s Construction
Example
Let N = 3 and θ = 0.3; then h(0.3) = 0.881.

[Huffman tree over the eight sequence probabilities .343, .147, .147, .147, .063, .063, .063, .027; the pairwise merges produce the intermediate sums .090, .126, .216, .294, .363, .637 and finally 1.00.]

Now E[L(X_1^N)] = 4(.027 + .063 + .063 + .063) + 3(.147 + .147) + 2(.147 + .343) = 2.726. Therefore R = 2.726/3 = 0.909.
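Huffman's construction can be sketched with a binary heap. Tie-breaking may yield a different tree than the one above, but every Huffman code attains the same minimal expected length (function and variable names are mine):

```python
import heapq
from itertools import product

def huffman_lengths(probs):
    """Run Huffman's construction; return the codeword length of each symbol."""
    # Heap entries: (probability, unique tie-breaker, symbol indices below this node).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p0, i0, s0 = heapq.heappop(heap)   # two smallest probabilities...
        p1, i1, s1 = heapq.heappop(heap)
        for s in s0 + s1:                  # ...are merged: every symbol below
            lengths[s] += 1                # the new node gets one bit deeper
        heapq.heappush(heap, (p0 + p1, min(i0, i1), s0 + s1))
    return lengths

theta, N = 0.3, 3
seqs = list(product((0, 1), repeat=N))
probs = [(1 - theta) ** (N - sum(x)) * theta ** sum(x) for x in seqs]
lengths = huffman_lengths(probs)
E_L = sum(p * L for p, L in zip(probs, lengths))
print(round(E_L, 3), round(E_L / N, 3))  # 2.726 0.909, matching the slide
```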
Remarks: Huffman Code
Note that R ↓ h(θ) as N → ∞.
Always E[L(X_1^N)] ≥ 1. For θ ≈ 0 a Huffman code has expected codeword length E[L(X_1^N)] ≈ 1 and rate R ≈ 1/N.
Better bounds than E[L(X_1^N)] < H(X_1^N) + 1 exist for Huffman codes; e.g., Gallager [1978] showed that
E[L(X_1^N)] − H(X_1^N) ≤ max_{x_1^N} P(x_1^N) + 0.086.
Adaptive Huffman codes (Gallager [1978]).
Variable-to-Fixed (VF) Length Codes
IDEA:
Parse the source output into variable-length segments of roughly the same probability. Code all these segments with codewords of fixed length.

Definition (VF-Length Code)
A VF-length code is defined by a set of variable-length source segments. Each segment x* in the set gets a unique binary codeword c(x*) of length L. The length of a segment x* is denoted N(x*). The rate of a VF code is
R ≜ L / E[N(X*)] (code-symbols/source-symbol).

GOAL:
We would like to find parsable VF-length codes that MINIMIZE this rate.
Proper-and-Complete Segment Sets
Definition (Proper-and-Complete Segment Sets)
A set of source segments is proper-and-complete if each semi-infinite source sequence has a unique prefix in this segment set.

We focus on proper-and-complete segment sets. Segments in a proper-and-complete set can be regarded as leaves in a rooted tree. Such sets guarantee instantaneous parsability.

Example

x*     N(x*)   c(x*)
1      1       11
01     2       10
001    3       01
000    3       00

[Segment tree: root ∅ with leaves 1, 01, 001, 000.]
Proper-and-Complete Segment Sets: Leaf-Node Lemma
Assume that the source is IID with parameter θ. Consider a set of segments and all their prefixes, and depict them in a tree: the segments are leaves, the prefixes are nodes. Note that all nodes and leaves have a probability, e.g. P(10) = θ(1 − θ). Let F(·) be a function on nodes and leaves.

[Tree: root F(∅) with sons F(0) and F(1); F(1) has sons F(10) and F(11); F(10) has sons F(100) and F(101).]

Lemma (Massey, 1983)
∑_{l ∈ leaves} P(l)[F(l) − F(∅)] = ∑_{n ∈ nodes} P(n) ∑_{s ∈ sons of n} (P(s)/P(n))[F(s) − F(n)].

Let F(x*) = the number of edges from x* to the root; then
E[N(X*)] = ∑_{x* ∈ nodes} P(x*).
Let F(x*) = −log_2 P(x*); then
H(X*) = E[N(X*)] h(θ).
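Both corollaries can be spot-checked on the segment set {1, 01, 001, 000} from the earlier example; a sketch (the choice θ = 0.3 and the helper names are mine):

```python
import math

theta = 0.3

def prob(seg: str) -> float:
    """IID probability of a binary segment, with P(1) = theta."""
    w = seg.count("1")
    return theta ** w * (1 - theta) ** (len(seg) - w)

segments = ["1", "01", "001", "000"]   # leaves of a proper-and-complete set
nodes = ["", "0", "00"]                # all strict prefixes of the leaves

# First corollary: E[N(X*)] equals the sum of the node probabilities.
direct = sum(prob(s) * len(s) for s in segments)
via_lemma = sum(prob(n) for n in nodes)

# Second corollary: H(X*) = E[N(X*)] h(theta).
H = -sum(prob(s) * math.log2(prob(s)) for s in segments)
h = -((1 - theta) * math.log2(1 - theta) + theta * math.log2(theta))

print(round(direct, 6), round(via_lemma, 6), round(H - direct * h, 9))
```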
Proper-and-Complete Segment Sets: Result
Theorem
For any proper-and-complete segment set with no more than 2^L segments,
L ≥ H(X*) = E[N(X*)] h(θ),
or R = L/E[N(X*)] ≥ h(θ).

More precisely, since
R = L / E[N(X*)] = (L / H(X*)) h(θ),
we should make H(X*) as close as possible to L; hence all segments should have roughly the same probability.
Tunstall’s Code
Consider 0 < θ ≤ 1/2.
Definition (Optimal VF-Length Code)
A code that maximizes the expected segment length E[N(X*)] is called optimal. Such a code minimizes the rate R.

Theorem (Tunstall, 1967)
The Tunstall construction leads to an optimal code.

CONSTRUCTION:
Start with the empty segment ∅, which has unit probability.
As long as the number of segments is smaller than 2^L, replace a segment s with largest probability P(s) by the two segments s0 and s1. The probabilities of the new segments (leaves) are P(s0) = P(s)(1 − θ) and P(s1) = P(s)θ.

The Tunstall construction results in H(X*) ≥ L − log_2(1/θ) and therefore R ≤ (L / (L + log_2 θ)) h(θ) (Jelinek and Schneider [1972]).
Tunstall’s Construction
Example
Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.

[Tunstall tree: starting from ∅ with probability 1.00, the most probable leaf is split repeatedly: 1.00 → .700/.300, .700 → .490/.210, .490 → .343/.147, .343 → .240/.103, .300 → .210/.090, .240 → .168/.072, .210 → .147/.063.]

Now E[N(X*)] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 and therefore R = 3/3.283 = 0.914.
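The split-the-most-probable-leaf rule is a few lines with a priority queue (probabilities are negated so heapq acts as a max-heap; the function name is mine). By the leaf-node lemma, E[N(X*)] is the sum of the probabilities of the split (internal) nodes:

```python
import heapq

def tunstall_expected_length(L, theta):
    """Build a Tunstall segment set with 2**L segments; return E[N(X*)]."""
    heap = [(-1.0, "")]          # start with the empty segment, probability 1
    expected_N = 0.0
    while len(heap) < 2 ** L:    # each split turns one leaf into two
        neg_p, seg = heapq.heappop(heap)       # most probable leaf
        p = -neg_p
        expected_N += p                        # it becomes an internal node
        heapq.heappush(heap, (-p * (1 - theta), seg + "0"))
        heapq.heappush(heap, (-p * theta, seg + "1"))
    return expected_N

E_N = tunstall_expected_length(3, 0.3)
print(round(E_N, 3), round(3 / E_N, 3))  # 3.283 0.914, matching the slide
```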
LOSSLESS SOURCECODING ALGORITHMS
Frans M.J. Willems
INTRODUCTION
HUFFMAN-TUNSTALL
Binary IID Sources
Huffman Code
Tunstall Code
ENUMERATIVE CODING
Lexicographical Ordering
FV: Pascal-∆ Method
VF: Petry Code
ARITHMETIC CODING
Intervals
Universal Coding,Individual Redundancy
CONTEXT-TREEWEIGHTING
IID, unknown θ
Tree Sources
Context Trees
Coding Prbs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION
Tunstall’s Construction
Example
Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.
1.00
.700
.300
0
1
.490
.210
0
1
.343
.147
0
1
.240
.103
0
1
.210
.090
0
1
.168
.072
0
1
.147
.063
0
1
Now E [N(X∗)] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 andtherefore R = 3/3.283 = 0.914.
LOSSLESS SOURCECODING ALGORITHMS
Frans M.J. Willems
INTRODUCTION
HUFFMAN-TUNSTALL
Binary IID Sources
Huffman Code
Tunstall Code
ENUMERATIVE CODING
Lexicographical Ordering
FV: Pascal-∆ Method
VF: Petry Code
ARITHMETIC CODING
Intervals
Universal Coding,Individual Redundancy
CONTEXT-TREEWEIGHTING
IID, unknown θ
Tree Sources
Context Trees
Coding Prbs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION
Tunstall’s Construction
Example
Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.
1.00
.700
.300
0
1
.490
.210
0
1
.343
.147
0
1
.240
.103
0
1
.210
.090
0
1
.168
.072
0
1
.147
.063
0
1
Now E [N(X∗)] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 andtherefore R = 3/3.283 = 0.914.
LOSSLESS SOURCECODING ALGORITHMS
Frans M.J. Willems
INTRODUCTION
HUFFMAN-TUNSTALL
Binary IID Sources
Huffman Code
Tunstall Code
ENUMERATIVE CODING
Lexicographical Ordering
FV: Pascal-∆ Method
VF: Petry Code
ARITHMETIC CODING
Intervals
Universal Coding,Individual Redundancy
CONTEXT-TREEWEIGHTING
IID, unknown θ
Tree Sources
Context Trees
Coding Prbs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION
Tunstall’s Construction
Example
Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.
1.00
.700
.300
0
1
.490
.210
0
1
.343
.147
0
1
.240
.103
0
1
.210
.090
0
1
.168
.072
0
1
.147
.063
0
1
Now E [N(X∗)] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 andtherefore R = 3/3.283 = 0.914.
Remarks: Tunstall Code
Note that R ↓ h(θ) when L → ∞.
For θ ≈ 0 a Tunstall code has expected segment length E[N(X∗)] ≈ 2^L − 1 and rate R ≈ L/(2^L − 1). Better than Huffman for L = N.
In each step of the Tunstall procedure, a leaf with the largest probability is changed into a node. This leads to:
the largest increase in expected segment length (Massey LN-lemma), and P(n) ≥ P(l) for all nodes n and leaves l. Therefore for any two leaves l and l′ we can say that
P(l) ≥ θP(n) ≥ θP(l′),
where n is the parent node of l. So leaves cannot differ too much in probability. This fact is used to lower-bound the entropy: H(X∗) ≥ L − log2(1/θ).
Optimal VF-length codes can also be found by fixing a number γ and defining a node to be internal if its probability is ≥ γ (Khodak, 1969). The size of the segment set is then not completely controllable.
Run-length Codes (Golomb [1966]).
Lexicographical Ordering
IDEA:
Sequences having the same weight (and probability) only need to be INDEXED. The binary representation of the index can be taken as the codeword.
Definition (Lexicographical Ordering, Index)
In a lexicographical ordering (0 < 1) we say that x^N_1 < y^N_1 if x_n < y_n for the smallest index n such that x_n ≠ y_n.
Consider a subset S of the set {0, 1}^N. Let i_S(x^N_1) be the lexicographical index of x^N_1 ∈ S, i.e., the number of sequences y^N_1 ∈ S with y^N_1 < x^N_1.

Example

Let N = 5 and S = {x^N_1 : w(x^N_1) = 2} where w(x^N_1) is the weight of x^N_1. Then |S| = (5 choose 2) = 10 and:

i_S(11000) = 9   i_S(01010) = 4
i_S(10100) = 8   i_S(01001) = 3
i_S(10010) = 7   i_S(00110) = 2
i_S(10001) = 6   i_S(00101) = 1
i_S(01100) = 5   i_S(00011) = 0
Sequential Enumeration
Theorem (Cover, 1973)
From the sequence x^N_1 ∈ S we can compute the index

i_S(x^N_1) = Σ_{n=1,…,N: x_n=1} #S(x_1, x_2, · · · , x_{n−1}, 0),

where #S(x_1, x_2, · · · , x_k) denotes the number of sequences in S having prefix x_1, · · · , x_k.
Moreover, from the index i_S(x^N_1) the sequence x^N_1 can be computed if the numbers #S(x_1, x_2, · · · , x_{n−1}, 0) for n = 1, …, N are available.
The index of a sequence can be represented by a codeword of fixed length ⌈log2 |S|⌉.
Example
Index i_S(10100) = #S(0) + #S(100) = (4 choose 2) + (2 choose 1) = 6 + 2 = 8; hence, since |S| = 10, the corresponding codeword is 1000.
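Cover's formula can be sketched directly in code for the fixed-weight case, where #S(x_1, …, x_{n−1}, 0) is a binomial coefficient (the function name is my choice, not from the slides):

```python
from math import comb

def index_fixed_weight(x):
    """Lexicographical index of a binary sequence within the set of all
    sequences of the same length and weight, via Cover's formula."""
    ones_left = sum(x)
    idx = 0
    for n, bit in enumerate(x):
        if bit == 1:
            # #S(x_1..x_{n-1}, 0): the remaining ones_left ones must be
            # placed among the len(x) - n - 1 positions after this 0
            idx += comb(len(x) - n - 1, ones_left)
            ones_left -= 1
    return idx
```

This reproduces the slide's example: i_S(10100) = 6 + 2 = 8.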
FV: Pascal-Triangle Method
IDEA:
Index sequences of fixed weight. Later use a Huffman code (or a fixed-length code) to describe the weights.
Example (Lynch (1966), Davisson (1966), Schalkwijk (1972))
Let N = 5 and S = {x^N_1 : Σ_n x_n = 2}. Then |S| = (5 choose 2) = 10.

[Pascal-triangle figure: a trellis of binomial coefficients (10 at the root, down to 1s and 0s at the leaves) through which the encoder and decoder step.]

Index from Sequence:
i(10100) = 6 + 2 = 8.

Sequence from Index: take i = 8, now
a) 8 ≥ 6 hence x_1 = 1,
b) i < 6 + 3 hence x_2 = 0,
c) i ≥ 6 + 2 hence x_3 = 1,
d) x_4 = x_5 = 0.
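The "sequence from index" steps a)–d) generalize to a small decoder that walks the Pascal triangle; a sketch under the same fixed-weight setup (function name mine):

```python
from math import comb

def sequence_from_index(idx, length, weight):
    """Recover the fixed-weight sequence from its lexicographical index by
    walking the Pascal triangle: emit a 1 whenever the index is at least the
    number of sequences that continue with a 0 at this position."""
    x = []
    for n in range(length):
        zero_branch = comb(length - n - 1, weight)  # sequences with a 0 next
        if weight > 0 and idx >= zero_branch:
            x.append(1)
            idx -= zero_branch
            weight -= 1
        else:
            x.append(0)
    return x
```

For i = 8, N = 5, w = 2 this returns 10100, inverting the encoding example above.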
FV: Pascal-Triangle Method (cont.)
First note that

H(X^N_1) = H(X^N_1, w(X^N_1)) = H(W) + H(X^N_1 | W).

If we use enumerative coding for X^N_1 given weight w, then since all sequences with a fixed weight have equal probability,

E[L(X^N_1 | W)] = Σ_{w=0,…,N} P(w) ⌈log2 (N choose w)⌉ < Σ_{w=0,…,N} P(w) log2 (N choose w) + 1 = H(X^N_1 | W) + 1.

If W is encoded using a Huffman code we obtain

E[L(X^N_1)] = E[L(W)] + E[L(X^N_1 | W)] ≤ H(W) + 1 + H(X^N_1 | W) + 1 = H(X^N_1) + 2.

Worse than Huffman, but no big code-table is needed.
Remarks: FV Pascal-Triangle Method
Enumeration for sequences generated by Markov sources (Cover [1973]).
Universal approach: Davisson [1966]. If W is encoded with a fixed-length codeword of ⌈log2(N + 1)⌉ bits, then entropy is achieved for every θ as N → ∞.
Lexicographical ordering is also possible for variable-length source segments.
VF: Petry Code
IDEA:
Modify the Tunstall segment sets such that the segments can be indexed.
Again let 0 < θ ≤ 1/2. It can be shown that a proper-and-complete segment set is a Tunstall set (maximal E[N(X∗)] given the number of segments) if and only if for all nodes n and all leaves l

P(n) ≥ P(l).

Consequence

If the segments x∗ in a proper-and-complete segment set satisfy

P(x∗−1) > γ ≥ P(x∗),

where x∗−1 is x∗ without its last symbol, this segment set is a Tunstall set. The constant γ determines the size of the set.

Since

P(x∗) = (1 − θ)^{n0(x∗)} θ^{n1(x∗)},

where n0(x∗) is the number of zeros in x∗ and n1(x∗) the number of ones in x∗, this is equivalent to

A·n0(x∗−1) + B·n1(x∗−1) < C ≤ A·n0(x∗) + B·n1(x∗)

for A = −log_b(1 − θ), B = −log_b θ, C = −log_b γ, and some log-base b.
VF: Petry Code (cont.)
Note that the log-base b has to satisfy

1 = (1 − θ) + θ = b^{−A} + b^{−B}.

For special values of θ, A and B are integers. E.g. for θ = (1 − θ)^2 we obtain A = 1 and B = 2 for b = (1 + √5)/2. Now C can also be assumed to be an integer. The corresponding codes are called Petry codes.

Definition (Petry (Schalkwijk), 1982)

Fix integers A and B. The segments x∗ in a proper-and-complete Petry segment set satisfy

A·n0(x∗−1) + B·n1(x∗−1) < C ≤ A·n0(x∗) + B·n1(x∗).

The integer C can be chosen to control the size of the set.
Linear Array
Petry codes can be implemented using a linear array.
VF: Petry Code (cont.)
Example
Consider A = 1, B = 2 (costs / step-sizes). For given C, let S(C) denote the resulting segment set and σ(C) its cardinality, with σ(−1) = σ(0) = 1. Then S(1) = {0, 1}, S(2) = {00, 01, 1}, etc., so σ(1) = 2, σ(2) = 3, etc. It is easy to see that

σ(C) = σ(C − 1) + σ(C − 2),

and therefore σ(3) = 5, σ(4) = 8, σ(5) = 13, σ(6) = 21, σ(7) = 34, and σ(8) = 55.
Now take C = 8. Note that 010010 ∈ S(8). We can now determine the index i(010010) using Cover's formula:

i(010010) = #(00) + #(01000) = σ(6) + σ(2) = 21 + 3 = 24.
[Linear-array figure: the array holds σ(8), σ(7), …, σ(0), σ(−1) = 55, 34, 21, 13, 8, 5, 3, 2, 1, 1; a 0 moves one position (cost A = 1) and a 1 moves two positions (cost B = 2).]
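Cover's formula combined with the σ(·) recursion gives a compact Petry indexer; this is a sketch of the idea, not the slides' linear-array implementation (the names and the memoization are my choices):

```python
from functools import lru_cache

def petry_index(x, C, A=1, B=2):
    """Index of segment x in the Petry set S(C) (cost A per 0, B per 1),
    via Cover's formula: every 1 skips the sigma(C - cost - A) segments
    in the subtree of its 0-sibling."""
    @lru_cache(maxsize=None)
    def sigma(c):
        # number of segments in S(c); a single (empty) segment for c <= 0
        return 1 if c <= 0 else sigma(c - A) + sigma(c - B)
    idx, cost = 0, 0
    for bit in x:
        if bit == 1:
            idx += sigma(C - cost - A)   # segments through the 0-sibling first
            cost += B
        else:
            cost += A
    return idx
```

For A = 1, B = 2 the recursion is the Fibonacci sequence of the example, and the call for 010010 with C = 8 reproduces i = σ(6) + σ(2) = 24.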
VF: Petry Code (cont.)
Theorem (Tjalkens & W. (1987))
A Petry code with parameters A < B and C is a Tunstall code for parameter q, where q = b^{−B} and b is the solution of b^{−A} + b^{−B} = 1. For arbitrary θ the rate satisfies

log2 σ(C) / E[N(X∗)] ≤ [(C + B − 1)/C] · (h(θ) + d(θ||q)).

Example

In the table, q for several values of A and B:

A \ B   2       3       4       5
1       0.382   0.318   0.276   0.245
2               0.430   0.382   0.346
3                       0.450   0.412
Remarks: VF Petry Code
Note that log2 σ(C)/E[N(X∗)] ↓ h(θ) + d(θ||q) when C → ∞; hence a Petry code achieves entropy for θ = q.
Tjalkens and W. investigated VF-length Petry codes for Markov sources, again with a linear array for each state.
VF-length universal enumerative solutions exist (Lawrence [1977], Tjalkens and W. [1992]).
The numbers in the linear array show exponential behaviour. Alternatively an array containing rounded versions of 2^{−i/M} for i = 1, …, M, through which we make steps, can be used (Tjalkens [PhD, 1987]). This reduces the storage complexity and is similar to Rissanen's [1976] multiplication-avoiding arithmetic coding (generalized Kraft inequality).
Idea Elias
Elias:
If source sequences are ORDERED LEXICOGRAPHICALLY then codewords can be COMPUTED SEQUENTIALLY from the source sequence using conditional PROBABILITIES of the next symbol given the previous ones, and vice versa.
Source Intervals
Definition
Order the source sequences x^N_1 ∈ {0, 1}^N lexicographically according to 0 < 1.
Now, to each source sequence x^N_1 ∈ {0, 1}^N there corresponds a source interval

I(x^N_1) = [Q(x^N_1), Q(x^N_1) + P(x^N_1))

with

Q(x^N_1) = Σ_{x̃^N_1 < x^N_1} P(x̃^N_1).

By construction the source intervals are all disjoint. Their union is [0, 1).

Example

Consider an I.I.D. source with θ = 0.2 and N = 2.

x^N_1   P(x^N_1)   Q(x^N_1)   I(x^N_1)
00      0.64       0          [0, 0.64)
01      0.16       0.64       [0.64, 0.80)
10      0.16       0.80       [0.80, 0.96)
11      0.04       0.96       [0.96, 1)
Code Intervals
A codeword c with length L can be regarded as a binary fraction .c. If we concatenate this codeword with others, the corresponding fraction can increase, but by no more than 2^{−L}.

Definition

To a codeword c(x^N_1) with length L(x^N_1) there corresponds a code interval

J(x^N_1) = [.c(x^N_1), .c(x^N_1) + 2^{−L(x^N_1)}).
Note that J(xN1 ) ⊂ [0, 1).
Arithmetic coding: Encoding and Decoding
Procedure
ENCODING: Choose c such that the code interval ⊆ source interval, i.e.

[.c, .c + 2^{−L}) ⊆ [Q(x^N_1), Q(x^N_1) + P(x^N_1)).

DECODING: Is possible since there is only one source interval that contains the code interval.

Theorem

For a sequence x^N_1 with source interval I(x^N_1) = [Q(x^N_1), Q(x^N_1) + P(x^N_1)), take c(x^N_1) as the codeword with

L(x^N_1) ≜ ⌈log2(1/P(x^N_1))⌉ + 1,
.c(x^N_1) ≜ ⌈Q(x^N_1) · 2^{L(x^N_1)}⌉ · 2^{−L(x^N_1)}.

Then

J(c(x^N_1)) ⊆ I(x^N_1)

and

L(x^N_1) < log2(1/P(x^N_1)) + 2,

i.e. less than two bits above the ideal codeword length.
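The theorem's two formulas can be checked with a few lines of code; a sketch (the function name is mine, and ordinary floating-point stands in for the exact binary arithmetic a real coder would need):

```python
from math import ceil, log2

def elias_codeword(Q, P):
    """Codeword for the source interval [Q, Q + P): take
    L = ceil(log2(1/P)) + 1 and .c = ceil(Q * 2^L) * 2^-L, which guarantees
    that the code interval [.c, .c + 2^-L) lies inside [Q, Q + P)."""
    L = ceil(log2(1.0 / P)) + 1
    c = ceil(Q * (1 << L))                  # numerator of the binary fraction .c
    return format(c, "0{}b".format(L))      # codeword as a bit string
```

For the θ = 0.2, N = 2 example this yields the codewords 00, 1011, 1101, 111110 of the next slide.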
Example

I.I.D. source with θ = 0.2 and N = 2.

[Figure: the source intervals [0, 0.64), [0.64, 0.80), [0.80, 0.96), [0.96, 1.00) of 00, 01, 10, 11 are mapped to the code intervals [0, 1/4), [11/16, 3/4), [13/16, 7/8), [31/32, 63/64) of the codewords 00, 1011, 1101, 111110.]

Source intervals are disjoint ⇒ code intervals are disjoint ⇒ the prefix condition holds.
Arithmetic Coding: Sequential Computation (Elias)
Example (Connection to Cover’s formula)
Let L = 3 and θ = 0.2.
[Figure: binary tree of depth 3 with nodes ∅; 0, 1; 00, 01, 10, 11; 000, 001, 010, 011, 100, 101, 110, 111; the source sequences of length 3 are its leaves.]
Q(101) = P(0) + P(100) = 0.8 + 0.2 · 0.8 · 0.8 = 0.928.
P(101) = P(1)P(0)P(1) = 0.2 · 0.8 · 0.2 = 0.032.
Arithmetic Coding: Sequential Computation (Elias)
In general

Q(x^N_1) = Σ_{n=1,…,N: x_n=1} P(x_1, x_2, · · · , x_{n−1}, 0),
P(x^N_1) = Π_{n=1,…,N} P(x_n | x_1, x_2, · · · , x_{n−1}).

Sequential Computation

If we have access to P(x_1, x_2, · · · , x_n, 0) and P(x_1, x_2, · · · , x_n, 1) after having processed P(x_1, x_2, · · · , x_n) for n = 1, 2, · · · , N, we can compute I(x^N_1) sequentially.
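For a binary IID source these two formulas collapse into a simple sequential update; a sketch (function name mine):

```python
def source_interval(x, theta):
    """Sequential Elias computation of the interval [Q, Q + P) for a binary
    IID source with P(1) = theta: whenever a 1 is processed, Q grows by the
    probability mass P(x_1 ... x_{n-1}, 0) of the skipped 0-branch."""
    Q, P = 0.0, 1.0
    for bit in x:
        if bit == 1:
            Q += P * (1 - theta)     # P(x_1 ... x_{n-1}, 0)
            P *= theta
        else:
            P *= 1 - theta
    return Q, P

Q, P = source_interval([1, 0, 1], 0.2)
```

This reproduces the earlier example: Q(101) = 0.928 and P(101) = 0.032 for θ = 0.2.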
Universal Coding
Coding Probabilities
If the actual probabilities P(x^N_1) are not known, arithmetic coding is still possible if instead of P(x^N_1) we use coding probabilities Pc(x^N_1) satisfying Pc(x^N_1) > 0 for all x^N_1, and Σ_{x^N_1} Pc(x^N_1) = 1. Then

L(x^N_1) < log2(1/Pc(x^N_1)) + 2.

[Figure: the encoder maps x^N_1 to c(x^N_1) and the decoder recovers x^N_1; both compute Pc(x_1 · · · x_{n−1}, 0) and Pc(x_1 · · · x_{n−1}, 1) for n = 1, …, N.]

PROBLEM: How do we choose the coding probabilities Pc(x^N_1)?
Individual Redundancy
Definition
The individual redundancy ρ(x^N_1) of a sequence x^N_1 is defined as

ρ(x^N_1) = L(x^N_1) − log2(1/P(x^N_1)),

i.e. codeword length minus ideal codeword length.

Bound on the Individual Redundancy

Arithmetic coding based on coding probabilities {Pc(x^N_1), x^N_1 ∈ {0, 1}^N} yields

ρ(x^N_1) < log2(1/Pc(x^N_1)) + 2 − log2(1/P(x^N_1)) = log2(P(x^N_1)/Pc(x^N_1)) + 2.

We say that the CODING redundancy is < 2 bits. The coding probabilities should be as large as possible (as close as possible to the actual probabilities). Next we focus on the remaining part of the individual redundancy, log2(P(x^N_1)/Pc(x^N_1)).
Remarks: Arithmetic Coding
Shannon [1948] already described the relation between codewords and intervals, however with ordered probabilities. This is called the Shannon-Fano code.
Shannon-Fano-Elias coding: arbitrary ordering, but not sequential.
The finite-precision issues of arithmetic coding were solved by Pasco [1976] and Rissanen [1976].
CTW: Universal Codes
IDEA:
Find good coding probabilities for sources with UNKNOWN PARAMETERS and STRUCTURE. Use WEIGHTING!
Coding for a Binary IID Source, Unknown θ
Definition (Krichevsky-Trofimov estimator (1981))
A good coding probability Pc(x^N_1) for a sequence x^N_1 that contains a zeroes and b = N − a ones is

Pe(a, b) = ∫_{θ=0}^{1} [1/(π √((1 − θ)θ))] · (1 − θ)^a θ^b dθ

(Dirichlet-(1/2, 1/2) prior, "weighting").

Theorem

Upper bound on the PARAMETER redundancy:

log2(P(x^N_1)/Pc(x^N_1)) = log2((1 − θ)^a θ^b / Pe(a, b)) ≤ (1/2) log2(a + b) + 1 = (1/2) log2(N) + 1

for all θ and all x^N_1 with a zeros and b ones.

Probability of a sequence with a zeroes and b ones followed by a zero:

Pe(a + 1, b) = [(a + 1/2)/(a + b + 1)] · Pe(a, b),

hence SEQUENTIAL COMPUTATION is possible!
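The sequential update makes the KT estimator a one-pass computation; a sketch (function name mine):

```python
def kt_probability(x):
    """Sequential Krichevsky-Trofimov coding probability Pe(a, b): after a
    zeros and b ones, the estimated probability of a next 0 is
    (a + 1/2)/(a + b + 1), and symmetrically for a next 1."""
    a = b = 0
    pe = 1.0
    for bit in x:
        if bit == 0:
            pe *= (a + 0.5) / (a + b + 1)
            a += 1
        else:
            pe *= (b + 0.5) / (a + b + 1)
            b += 1
    return pe
```

Note that the result depends only on the counts (a, b), not on the order of the symbols; e.g. Pe(1, 1) = 1/8 for both 01 and 10.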
Individual Redundancy Binary IID source
The total individual redundancy satisfies

ρ(x^N_1) < log2((1 − θ)^a θ^b / Pe(a, b)) + 2 ≤ ((1/2) log2(N) + 1) + 2

for all θ and all x^N_1 with a zeroes and b ones.

Shtarkov [1988]: the (1/2) log2 N behaviour is asymptotically optimal for the individual redundancy as N → ∞ (NML estimator)!
Rissanen [1984]: also for the expected redundancy the (1/2) log2 N behaviour is asymptotically optimal.
CTW: Binary Tree-Sources
Definition
[Figure: tree model M = {00, 10, 1} acting on the past · · · x_{n−2} x_{n−1} of x_n, with parameters θ1 = 0.1, θ10 = 0.3, θ00 = 0.5.]

P(Xn = 1 | · · · , Xn−1 = 1) = 0.1
P(Xn = 1 | · · · , Xn−2 = 1, Xn−1 = 0) = 0.3
P(Xn = 1 | · · · , Xn−2 = 0, Xn−1 = 0) = 0.5
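Reading each leaf label (oldest symbol first) as a suffix of the history, the probability of a sequence under such a tree source can be computed as follows; an illustrative sketch with names of my own choosing:

```python
def tree_source_prob(x, past, theta):
    """Probability of the string x under a binary tree source. The model is a
    dict from leaf labels (suffixes of the history, oldest symbol first) to
    P(next symbol = 1 | history ends in that suffix)."""
    history, p = past, 1.0
    for bit in x:
        # proper-and-complete model: exactly one leaf matches the history
        leaf = next(s for s in theta if history.endswith(s))
        p *= theta[leaf] if bit == "1" else 1.0 - theta[leaf]
        history += bit
    return p

# The slide's model M = {00, 10, 1} with its three parameters:
model = {"1": 0.1, "10": 0.3, "00": 0.5}
```

For instance, after a history ending in 00 the model assigns probability 0.5 to a next 1, matching the third conditional probability above.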
CTW: Problem, Concepts
PROBLEM: What are good coding probabilities for sequences x^N_1 produced by a tree-source with

an unknown tree-model,
and unknown parameters?

CONCEPTS:

CONTEXT TREE (Rissanen [1983])
WEIGHTING: If P1(x) and P2(x) are two alternative coding probabilities for sequence x, then the weighted probability

Pw(x) ≜ (P1(x) + P2(x))/2 ≥ (1/2) max(P1(x), P2(x)),

thus we lose at most a factor of 2, which is one bit in redundancy.
Context Trees
Definition (Context Tree)
[Figure: full binary context tree of depth 3, with root ∅ and nodes 0, 1, 00, 10, 01, 11, 000, 100, 010, 110, 001, 101, 011, 111]

Node s contains the sequence of source symbols that have occurred following context s. The depth is D.
Context-tree splits up sequences in subsequences
Example

[Figure: source sequence x_1^7 = 0110100 with past · · · 010. The root ∅ collects all symbol indices 1–7; each deeper node collects the indices of the symbols that followed its context, e.g. the indices split into {1, 2, 5, 7} (context 0) and {3, 4, 6} (context 1), and so on further down the tree.]
Coding Probabilities: Leaves of the Context-Tree
[Figure: context tree of depth 3; the leaves of the tree-model M = {00, 10, 1} are marked]

tree-model M = {00, 10, 1}

CTW: Leaves

The subsequence corresponding to a leaf s of the context tree is IID. A good coding probability for this subsequence is therefore

P_w(s) ≜ P_e(a_s, b_s),

where a_s and b_s are the numbers of zeros and ones in this subsequence.
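The estimator P_e(a, b) depends only on the counts; in CTW it is typically the Krichevsky–Trofimov estimator from the earlier "IID, unknown θ" part of the talk. A sketch of its sequential computation:

```python
def kt_estimate(a, b):
    """Krichevsky-Trofimov block probability Pe(a, b) for a zeros and
    b ones, built up symbol by symbol: after (na, nb) counts, the next
    zero has probability (na + 1/2)/(na + nb + 1)."""
    p = 1.0
    na, nb = 0, 0
    for _ in range(a):
        p *= (na + 0.5) / (na + nb + 1)
        na += 1
    for _ in range(b):
        p *= (nb + 0.5) / (na + nb + 1)
        nb += 1
    return p
```

Note that the result depends only on the counts, not on the order of the symbols, so e.g. Pe(1, 1) = 1/8 for both sequences 01 and 10.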
Coding Probabilities: Internal nodes of the Context-Tree
[Figure: context tree of depth 3; the internal nodes ∅ and 0 of the tree-model M = {00, 10, 1} are marked]

tree-model M = {00, 10, 1}

The subsequence corresponding to a node s of the context tree is

IID if the node s is not an internal node of the actual tree-model,
a combination of the subsequences that correspond to nodes 0s and 1s, if s is an internal node of the actual tree-model.
Weighting
CTW: Internal Nodes
Weighting the coding probabilities corresponding to both alternatives yields the coding probability

P_w(s) ≜ ( P_e(a_s, b_s) + P_w(0s) · P_w(1s) ) / 2

for the subsequence that corresponds to node s. Recursively we find in the root ∅ of the context-tree the coding probability P_w(∅) for the entire source sequence x_1^N.

IMPORTANT:

Coding probability P_w(∅) can be computed sequentially.
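The recursion can be sketched directly in code: collect the zero/one counts of every context-tree node, then evaluate P_w bottom-up. This batch version is my own illustration (with a Krichevsky–Trofimov estimator for P_e and an all-zero padded past); a production CTW coder would instead update P_w sequentially per symbol, as the slide notes.

```python
def ctw_probability(x, D=3):
    """Weighted coding probability Pw(root) of binary sequence x for a
    context tree of depth D.  Contexts are tuples, most recent symbol
    first; symbols near the start use an all-zero padded past."""
    counts = {}
    padded = [0] * D + list(x)
    for n, sym in enumerate(x):
        ctx = tuple(reversed(padded[n:n + D]))
        for d in range(D + 1):               # update every node on the path
            a, b = counts.get(ctx[:d], (0, 0))
            counts[ctx[:d]] = (a + 1, b) if sym == 0 else (a, b + 1)

    def kt(a, b):                            # Krichevsky-Trofimov Pe(a, b)
        p, na, nb = 1.0, 0, 0
        for _ in range(a):
            p *= (na + 0.5) / (na + nb + 1); na += 1
        for _ in range(b):
            p *= (nb + 0.5) / (na + nb + 1); nb += 1
        return p

    def pw(s):                               # the weighting recursion
        a, b = counts.get(s, (0, 0))
        pe = kt(a, b)
        if len(s) == D:                      # leaf: IID estimate only
            return pe
        return (pe + pw(s + (0,)) * pw(s + (1,))) / 2

    return pw(())

p = ctw_probability([0, 1, 1, 0, 1, 0, 0], D=2)
```

A sanity check: for fixed past, P_w is a valid coding distribution, so the values for all sequences of a given length sum to one.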
Redundancy (tree-model M = {00, 10, 1})
Actual probability:

P(x_1^N) = (1 − θ_00)^{a_00} θ_00^{b_00} · (1 − θ_10)^{a_10} θ_10^{b_10} · (1 − θ_1)^{a_1} θ_1^{b_1}.

Lower bound on the coding probability:

P_w(∅) ≥ (1/2) P_w(0) · P_w(1)
       ≥ (1/2) · (1/2) P_w(00) · P_w(10) · (1/2) P_e(a_1, b_1)
       ≥ (1/2) · (1/2) · (1/2) P_e(a_00, b_00) · (1/2) P_e(a_10, b_10) · (1/2) P_e(a_1, b_1).

Parameter redundancy bounds for the subsequences in the leaves of tree-model M = {00, 10, 1}:

log_2 [ (1 − θ_00)^{a_00} θ_00^{b_00} / P_e(a_00, b_00) ] ≤ (1/2) log_2(a_00 + b_00) + 1,
log_2 [ (1 − θ_10)^{a_10} θ_10^{b_10} / P_e(a_10, b_10) ] ≤ (1/2) log_2(a_10 + b_10) + 1,
log_2 [ (1 − θ_1)^{a_1} θ_1^{b_1} / P_e(a_1, b_1) ] ≤ (1/2) log_2(a_1 + b_1) + 1.
Redundancy (General)
ρ(x_1^N) < log_2 [ P(x_1^N) / P_w(∅) ] + 2
        ≤ log_2 32 + (1/2) log_2(a_00 + b_00) + 1 + (1/2) log_2(a_10 + b_10) + 1 + (1/2) log_2(a_1 + b_1) + 1 + 2
        ≤ 5 + ( (3/2) log_2(N/3) + 3 ) + 2,

for all x_1^N, and all θ_00, θ_10, and θ_1.

Theorem (W., Shtarkov, and Tjalkens (1995))

In general, for a tree source with |M| leaves (parameters):

ρ(x_1^N) < (2|M| − 1) + ( (|M|/2) log_2(N/|M|) + |M| ) + 2 bits

(model, parameter, and coding redundancies).
Simulation: Model plus Parameter Redundancies
Redundancies for the CTW method, but also for methods focussing on M = ∅, M = {0, 1}, the actual model M = {00, 10, 1}, M = {0, 01, 11}, and M = {00, 10, 01, 11}. The CTW method improves over the best model!

[Figure: redundancies(n) for n = 0, …, 500; vertical axis 0–40 bits]
Remarks: Context-Tree Weighting
CTW implements a "weighting" (Bayes mixture) over all tree-models with depth not exceeding D, i.e.

P_w(∅) = Σ_{M: depth ≤ D} P(M) P_e(x_1^N | M),

with P_e(x_1^N | M) = Π_{s ∈ M} P_e(a_s, b_s) and P(M) = 2^{−(2|M|−1)}.

There is one tree-model of depth 0 (i.e., the IID model). If there are #_d models of depth not exceeding d, then #_{d+1} = #_d^2 + 1. Therefore #_1 = 2, #_2 = 5, #_3 = 26, #_4 = 677, #_5 = 458330, #_6 = 210066388901, #_7 ≈ 4.4128 · 10^22, #_8 ≈ 1.9473 · 10^45, etc.

Straightforward analysis. No model-estimation that only gives asymptotic results as in e.g. Rissanen [1983, 1986] and Weinberger, Rissanen, and Feder [1995].

The number of computations needed to process the source sequence x_1^N is linear in N. The same holds for the storage complexity.
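The model-count recursion is easy to verify: a model of depth at most d + 1 is either the single root leaf or a pair of sub-models of depth at most d hanging off the root. A small sketch:

```python
def num_tree_models(d):
    """Number of binary tree-models of depth not exceeding d:
    #(0) = 1 and #(d+1) = #(d)^2 + 1 (leaf, or two sub-models)."""
    count = 1
    for _ in range(d):
        count = count * count + 1
    return count

counts = [num_tree_models(d) for d in range(7)]
```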
Remarks: Context-Tree Weighting (cont.)
Optimal parameter redundancy behavior in the Rissanen [1984] sense (i.e., (1/2) log_2 N bits per parameter).

A modified version achieves entropy not only for tree sources but for all stationary ergodic sources.

More general context algorithms (splitting rules) were proposed. The context of x_n need not be x_{n−d}, x_{n−d+1}, · · · , x_{n−1}.

A two-pass version (context-tree maximizing) exists that finds the best model (MDL) matching the source sequence. Now

P_m(s) ≜ max[ P_e(a_s, b_s), P_m(0s) · P_m(1s) ] / 2.

If a (minimal) tree source generates the sequence x_1^N, the maximizing method produces a model estimate which is correct with probability one as N → ∞.
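The maximizing recursion can be sketched like the weighting one, returning both P_m(s) and the selected leaves. This is my own illustration (KT estimator for P_e, node counts supplied as a dictionary keyed by context tuples), not code from the talk:

```python
def ctm_best_model(counts, D, s=()):
    """Context-tree maximizing: return (Pm(s), model), where model is the
    list of leaves of the best (MDL) tree below node s.  counts maps each
    node (tuple, most recent symbol first) to its (zeros, ones) pair."""
    def kt(a, b):                       # Krichevsky-Trofimov Pe(a, b)
        p, na, nb = 1.0, 0, 0
        for _ in range(a):
            p *= (na + 0.5) / (na + nb + 1); na += 1
        for _ in range(b):
            p *= (nb + 0.5) / (na + nb + 1); nb += 1
        return p

    a, b = counts.get(s, (0, 0))
    pe = kt(a, b)
    if len(s) == D:                     # maximum depth: must be a leaf
        return pe, [s]
    p0, m0 = ctm_best_model(counts, D, s + (0,))
    p1, m1 = ctm_best_model(counts, D, s + (1,))
    if pe >= p0 * p1:
        return pe / 2, [s]              # keep s as a leaf
    return (p0 * p1) / 2, m0 + m1       # split node s

# Balanced counts favour the IID model; strongly context-dependent
# counts favour splitting the root.
pm_iid, model_iid = ctm_best_model({(): (5, 5), (0,): (3, 2), (1,): (2, 3)}, D=1)
pm_sw, model_sw = ctm_best_model({(): (5, 5), (0,): (5, 0), (1,): (0, 5)}, D=1)
```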
Outline
1 INTRODUCTION
2 HUFFMAN and TUNSTALLBinary IID SourcesHuffman CodeTunstall Code
3 ENUMERATIVE CODINGLexicographical OrderingFV: Pascal-∆ MethodVF: Petry Code
4 ARITHMETIC CODINGIntervalsUniversal Coding, Individual Redundancy
5 CONTEXT-TREE WEIGHTINGIID, unknown θBinary Tree-SourcesContext TreesCoding Probabilities
6 REPETITION TIMESLZ77Repetition Times, KacRepetition-Time AlgorithmAchieving Entropy
7 CONCLUSION
Lempel-Ziv 1977 Compression
IDEA:
Let the data speak for itself.
LZ77 Compression is achieved by replacing repeated segments in the data with pointers and lengths. To avoid deadlock, an uncoded symbol is added to each pointer and length.

Example (LZ77)

[Figure: encoding of "abracadabra" with a search buffer and a look-ahead buffer. Successive outputs: (0,–,a), (0,–,b), (0,–,r), (3,1,c), (2,1,d), (7,4, ); each triple is (pointer, length, uncoded symbol).]

QUESTION:

Why does this method work? Note that the statistics of the data are unknown!
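A minimal LZ77 sketch along these lines. It is my own construction: this variant always appends the uncoded symbol, so its last triple differs slightly from the slide's (7,4, ), and the window size is an arbitrary choice.

```python
def lz77_encode(data, window=4096):
    """Emit (offset, length, symbol) triples.  Offset 0 means no match;
    the literal symbol after each match avoids deadlock."""
    out, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        for off in range(1, i - start + 1):
            l = 0
            # matches may extend into the look-ahead buffer (off <= l is fine)
            while i + l < len(data) - 1 and data[i + l - off] == data[i + l]:
                l += 1
            if l > best_len:
                best_off, best_len = off, l
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    data = []
    for off, length, sym in triples:
        for _ in range(length):
            data.append(data[-off])     # copy, possibly overlapping
        data.append(sym)
    return data

enc = lz77_encode(list("abracadabra"))
```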
Repetition Times
Consider the discrete stationary and ergodic process

· · · , X_{−3}, X_{−2}, X_{−1}, X_0, X_1, X_2, · · · .

Suppose that X_1 = x for symbol-value x ∈ X with Pr{X_1 = x} > 0. We say that the repetition time of the x that occurred at time t = 1 is m if X_{1−m} = x and X_t ≠ x for t = 2−m, · · · , 0.

[Figure: example with m = 4: X_{−3} = x, X_{−2} ≠ x, X_{−1} ≠ x, X_0 ≠ x, and X_1 = x]

Definition (Average Repetition Time)

Let Q_x(m) be the conditional probability that the repetition time of the x occurring at t = 1 is m, hence

Q_x(m) ≜ Pr{X_{1−m} = x, X_{2−m} ≠ x, · · · , X_0 ≠ x | X_1 = x}.

The average repetition time for symbol-value x with Pr{X_1 = x} > 0 is now defined as

T(x) ≜ Σ_{m=1,2,···} m Q_x(m).
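A repetition time is found by direct search over past positions; a small sketch (names are mine), written for N-blocks so that it also covers the sliding-block case, where the earlier copy may overlap the block itself:

```python
def repetition_time(history, block):
    """Smallest m >= 1 such that the copy of `block` starting m positions
    earlier (possibly overlapping `block` itself) matches; None if the
    finite history contains no match."""
    seq = list(history) + list(block)
    n, start = len(block), len(history)
    for m in range(1, start + 1):
        if seq[start - m:start - m + n] == list(block):
            return m
    return None
```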
Kac’s Result
Example

Consider an IID (binary) process and assume that Pr{X_1 = 1} = θ > 0. Then

Q_1(m) = θ(1 − θ)^{m−1} and T(1) = Σ_{m=1,2,···} m θ(1 − θ)^{m−1} = 1/θ.

Theorem (Kac, 1947)

For stationary and ergodic processes, for any x with Pr{X_1 = x} > 0,

T(x) = 1 / Pr{X_1 = x}.

Note that Kac's result also holds for "sliding" N-blocks, hence

T((x_1, x_2, · · · , x_N)) = 1 / Pr{(X_1, X_2, · · · , X_N) = (x_1, x_2, · · · , x_N)},

if Pr{(X_1, X_2, · · · , X_N) = (x_1, x_2, · · · , x_N)} > 0. Now the repetition time equals m when m is the smallest positive integer such that (x_{1−m}, x_{2−m}, · · · , x_{N−m}) = (x_1, x_2, · · · , x_N).
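Kac's theorem is easy to see empirically: in a long realization, the average gap between successive occurrences of a symbol approaches 1/Pr{X_1 = x}. A quick check for the IID example (seed, θ, and sample size are arbitrary choices):

```python
import random

def avg_repetition_time(xs, x):
    """Empirical average gap between successive occurrences of symbol x."""
    positions = [i for i, s in enumerate(xs) if s == x]
    gaps = [j - i for i, j in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)

rng = random.Random(1)
theta = 0.4
xs = [1 if rng.random() < theta else 0 for _ in range(200000)]
t1 = avg_repetition_time(xs, 1)   # Kac: should be close to 1/theta = 2.5
```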
Universal Source Coding Based on Repetition Times
Suppose that our source is binary, i.e., X_t ∈ {0, 1} for all integer t.

[Figure: the block x_1 x_2 x_3 repeats m = 4 positions back in the buffer · · · x_{−7} · · · x_0]

A. The encoder wants to convey a source block x_1^N ≜ (x_1, x_2, · · · , x_N) to the decoder. Both encoder and decoder have access to buffers containing all previous source symbols · · · , x_{−2}, x_{−1}, x_0.

B. Using these previous source symbols the encoder can determine the repetition time m of x_1^N. It is the smallest integer m that satisfies

x_{1−m}^{N−m} = x_1^N,

where x_{1−m}^{N−m} ≜ (x_{1−m}, x_{2−m}, · · · , x_{N−m}).
Universal Source Coding Based on Repetition Times (cont.)
C. Repetition time m is now encoded and sent to the decoder. The code for m consists of a preamble p(m) and an index i(m), and has length l(m).

Example

Code table for the repetition time m for N = 3:

m     p(m)  i(m)               l(m)
1     00    -                  2+0=2
2     01    0                  2+1=3
3     01    1                  2+1=3
4     10    00                 2+2=4
5     10    01                 2+2=4
6     10    10                 2+2=4
7     10    11                 2+2=4
≥ 8   11    copy of x1 x2 x3   2+3=5

In general there are N + 1 groups. There are index groups with 1, 2, up to 2^{N−1} elements, hence the index lengths are 0, 1, up to N − 1. The last group is the "copy" group. A "copy" has length N. We use a preamble p(m) of ⌈log_2(N + 1)⌉ bits to specify one of these N + 1 alternatives.
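The table generalizes to a codeword-length function l(m); a sketch that reproduces the N = 3 table above:

```python
from math import ceil, floor, log2

def code_length(m, N):
    """Length l(m) of the repetition-time codeword for block length N:
    a preamble of ceil(log2(N+1)) bits selects one of N+1 groups; for
    m < 2**N the index takes floor(log2 m) bits, otherwise the N-block
    itself is copied verbatim (N bits)."""
    preamble = ceil(log2(N + 1))
    if m < 2 ** N:
        return preamble + floor(log2(m))
    return preamble + N
```

The upper bound l(m) ≤ ⌈log_2(N + 1)⌉ + log_2 m used in the analysis holds for every m.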
Universal Source Coding Based on Repetition Times (cont.)
For arbitrary N we get for the code-block length l(m)

l(m) = ⌈log_2(N + 1)⌉ + ⌊log_2 m⌋   if m < 2^N,
l(m) = ⌈log_2(N + 1)⌉ + N            if m ≥ 2^N.

This results in the upper bound

l(m) ≤ ⌈log_2(N + 1)⌉ + log_2 m.

D. After decoding m the decoder can reconstruct x_1^N using the previous source symbols in the buffer. With this block x_1^N both the encoder and decoder can update their buffers.

E. Then the next block x_{N+1}^{2N} ≜ (x_{N+1}, x_{N+2}, · · · , x_{2N}) is processed in a similar way, etc.

Note:

Buffers need only contain the previous 2^N − 1 source symbols!
Analysis of the Repetition-Time Algorithm
Assume that a certain x_1^N occurred as the first block. What is then the average codeword length L(x_1^N) for x_1^N?

L(x_1^N) = Σ_{m=1,2,···} Q_{x_1^N}(m) l(m)
  (a) ≤ Σ_{m=1,2,···} Q_{x_1^N}(m) ⌈log_2(N + 1)⌉ + Σ_{m=1,2,···} Q_{x_1^N}(m) log_2 m
  (b) ≤ ⌈log_2(N + 1)⌉ + log_2 Σ_{m=1,2,···} m Q_{x_1^N}(m)
  (c) = ⌈log_2(N + 1)⌉ + log_2 [ 1 / Pr{X_1^N = x_1^N} ].

Here (a) follows from the upper bound for l(m), (b) from Jensen's inequality, and (c) from Kac's theorem.
This is the ideal codeword length plus ⌈log_2(N + 1)⌉.
Analysis of the Repetition-Time Algorithm (cont.)
The probability that x_1^N occurs as a block is Pr{X_1^N = x_1^N}. For the average codeword length L(X_1^N) we therefore get

L(X_1^N) = Σ_{x_1^N} Pr{X_1^N = x_1^N} L(x_1^N)
         ≤ Σ_{x_1^N} Pr{X_1^N = x_1^N} ( ⌈log_2(N + 1)⌉ + log_2 [ 1 / Pr{X_1^N = x_1^N} ] )
         = ⌈log_2(N + 1)⌉ + H(X_1^N).

For the rate R_N we now obtain

R_N = L(X_1^N) / N ≤ H(X_1^N)/N + ⌈log_2(N + 1)⌉/N.

Theorem (W., 1986, 1989)

The repetition-time algorithm achieves entropy since

lim_{N→∞} R_N = lim_{N→∞} ( H(X_1^N)/N + ⌈log_2(N + 1)⌉/N ) = H_∞(X).
Remarks: Repetition-Time Algorithm
Universal algorithm.

Assume that · · · , X_{−1}, X_0, X_1, X_2, · · · is stationary and ergodic with entropy H_∞(X). Let the random variable M be the repetition time of the source block X_1^N.

Theorem (Wyner & Ziv, 1989)

Fix an ε > 0. Then

lim_{N→∞} Pr{ M ≥ 2^{N(H_∞(X)+ε)} } = 0.

This result implies that the buffer can be much smaller than 2^N − 1 if the entropy is known to be smaller than 1. This result was crucial in proving that the LZ77 algorithm achieves entropy (Wyner & Ziv [1994]).

Elias [1987]: interval and recency-rank coding methods (symbols).

Hershkovitz and Ziv [1998] studied conditional repetition times.

When is this better than CTW? The CTW incremental redundancy for an N-block is ≈ NK/(2B ln 2) bits for K parameters. This redundancy is larger than log_2(N + 1) for (K/2N) > 2 ln(N + 1)/N. For N = 24 we get K/2N > 0.2682.
Outline
1 INTRODUCTION
2 HUFFMAN and TUNSTALLBinary IID SourcesHuffman CodeTunstall Code
3 ENUMERATIVE CODINGLexicographical OrderingFV: Pascal-∆ MethodVF: Petry Code
4 ARITHMETIC CODINGIntervalsUniversal Coding, Individual Redundancy
5 CONTEXT-TREE WEIGHTINGIID, unknown θBinary Tree-SourcesContext TreesCoding Probabilities
6 REPETITION TIMESLZ77Repetition Times, KacRepetition-Time AlgorithmAchieving Entropy
7 CONCLUSION
CONCLUSION
Recent developments:

DUDE (Weissman, Ordentlich, Seroussi, Verdú, Weinberger [2005]) resulted in the study of bi-directional contexts and splitting rules (Ordentlich, Weinberger, and Weissman [2005]).

Directed Mutual Information (Marko [1973], Massey [1990]): I(X_1^N → Y_1^N) = Σ_n [ H(Y_n | Y_1^{n−1}) − H(Y_n | Y_1^{n−1}, X_1^n) ] is a generalisation of Granger causality [1969]. CTW methods were used to estimate these quantities (Jiao, Permuter, Zhao, Kim, and Weissman [2012]).

Questions:

LZ learns from seeing a pattern once. CTW is optimal for tree sources but seems to take more time. What are the algorithms between CTW and LZ?

Suppose that the data have left-right symmetry, hence P(a, b) = P(b, a), P(a, b, c) = P(c, b, a), P(a, b, c, d) = P(d, c, b, a), etc. This reduces the number of parameters. Algorithm? Relevant for image compression.

CTW can handle side-information by considering it as context (e.g. Cai, Kulkarni and Verdú [2005]). But what if the side-information is not properly aligned? Relevant for reference-based genome compression (Chern et al. [2012]).
Source Coding is FUN!
(Ulm, 1997)