Data compression
&
Information theory
Gjerrit Meinsma
Mathematisch cafe 2-10-’03
Claude Elwood Shannon (1916-2001)
Outline
1. Data compression
2. Universal compression algorithm
3. Mathematical model for ‘disorder’ (information & entropy)
4. Connection entropy and data compression
5. ‘Written languages have high redundancy’
(70% can be thrown away)
Data compression
For this LaTeX file:
(# bytes after compression (gzip)) / (# bytes before compression) = 0.295
Some thoughts:
1. Can this be beaten?
2. Is there a ‘fundamental compression ratio’?
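The ratio above can be reproduced with a few lines of Python; a minimal sketch, using a repetitive stand-in text rather than the talk's actual LaTeX source:

```python
import gzip

# A stand-in for the LaTeX file: highly repetitive text compresses well.
text = ("Data compression and information theory. " * 200).encode("utf-8")
compressed = gzip.compress(text)
ratio = len(compressed) / len(text)
print(f"{len(text)} bytes -> {len(compressed)} bytes, ratio = {ratio:.3f}")
```

The exact ratio depends on the input; real LaTeX is less repetitive than this sample, hence the 0.295 reported above.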
Two extremes
Theorem 1. The optimal compression ratio for this file is < 0.001.
....because my own compression algorithm gjerritzip assigns a number to each file, and this LaTeX file happens to get number 1.
gjerritzip probably doesn’t do too well on other files.
We want a universal compression algorithm
one such that for every file
(# bytes after compression) / (# bytes before compression) ≤ α
with α > 0 as small as possible (trivially, α = 1 always works).
Unfortunately:
Theorem 2.
There is no lossless compression that strictly reduces every file.
Proof.
Consider the two one-bit files ‘0’ and ‘1’.....
More convincing:
There are 2^N files of N bits.
There are
2^(N−1) + 2^(N−2) + · · · + 2^0 = 2^N − 1
files of less than N bits.
Funny: patents have been granted on universal compression algorithms, which we know don’t exist.
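The pigeonhole count above can be checked numerically; a small sketch:

```python
# There are 2^N files of N bits, but only 2^N - 1 nonempty files of
# fewer than N bits, so no lossless compressor can map every N-bit
# file to a strictly shorter one (pigeonhole).
N = 10
files_of_n_bits = 2 ** N
strictly_shorter = sum(2 ** k for k in range(N))  # 2^0 + 2^1 + ... + 2^(N-1)
print(files_of_n_bits, strictly_shorter)  # 1024 1023
```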
Exploiting structure
Why do gzip, winzip work in practice?
Because computer files have a structure.
gzip works for many, many files, but not for every file.
...it’s time to define ‘structure’ mathematically
...the notion of information & entropy
Towards a definition of information & entropy
Compare
‘I will eat something today’
with
‘Balkenende has a lover’
The second one is more surprising, supplies more information.
• Information is a measure of surprise
• (Entropy is a measure of disorder)
What do we want ‘information’ to be?
Example 3. I draw a card. Then I tell you:
• ‘It is a spade’ [I supply information]
• ‘It is an ace’ [I supply information once again]
It seems reasonable to demand that
I(ace of spades) = I(spade) + I(ace)
That is to say:
I(A ∩ B) = I(A) + I(B) if A and B are independent
Axiom 4 (Version 1).
1. I(A ∩ B) = I(A) + I(B) if A and B independent
2. I(A) ≥ 0
3. I(A) = I(B) if P (A) = P (B)
Because of the last axiom, information is a function of probability:
Axiom 5 (Version 2). For all p, q ∈ (0, 1):
1. I(pq) = I(p) + I(q)
2. I(p) ≥ 0
3. and we add another: I(p) is continuous in p.
Theorem 6.
Then I is unique: I(p) = −k log(p), modulo the scaling factor k > 0.
Other scaling factor k means other unit (irrelevant).
From now on:
I(p) = − log2(p)
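This definition computes directly; a minimal sketch of I(p) = −log2(p) on the special cases from the next slide:

```python
import math

def information(p: float) -> float:
    """Information content I(p) = -log2(p), in bits.

    I(0) = +infinity is the limiting case and is not computed here."""
    return -math.log2(p)

print(information(1.0))    # I(1) = 0: a certain event carries no surprise
print(information(0.5))    # I(2^-1) = 1 bit
print(information(1 / 8))  # I(2^-3) = 3 bits
```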
Example 7 (Special cases).
• I(0) = +∞, makes sense
• I(1) = 0, makes sense
• I(2^−k) = k, well...
Entropy = expectation of information
Definition 8. Entropy H := E(I)
Example 9 (Connection entropy and structure).
Consider the file
aaaaabaabbaaaaababaaaaabaaaaaabaabaaab....
and assume
• P (a) = p
• P (b) = 1 − p
then entropy per symbol is
H = E(I) = p I(p) + (1 − p) I(1 − p)
[Plot: H as a function of p, zero at p = 0 and p = 1, maximum 1 at p = 1/2]
Makes sense:
p ≈ 1 : aaaaabaaaaaabaaaa little surprise
p ≈ 0 : bbbbbbbabbbbbbab little surprise
p = 1/2 : abaabbabaabbaaba maximal disorder (symmetry)
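The binary entropy function is easy to evaluate; a sketch matching the three cases above:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = p*I(p) + (1-p)*I(1-p), with the convention 0*log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # maximal disorder: 1 bit per symbol
print(binary_entropy(0.99))  # nearly all a's: little surprise
print(binary_entropy(1.0))   # no surprise at all: 0 bits
```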
Example 10 (The abcd-file).
Consider a file with symbols ‘a’, ‘b’, ‘c’ and ‘d’:
bdaccdaadaabbcaaabbaaabaaacbdaab.....
and suppose that
P (a) = 1/2
P (b) = 1/4
P (c) = 1/8
P (d) = 1/8
Its entropy (per symbol) is
H = Σ_i p_i I(p_i)
  = (1/2) I(2^−1) + (1/4) I(2^−2) + 2 · (1/8) I(2^−3)
  = (1/2)(1) + (1/4)(2) + 2 · (1/8)(3)
  = 1/2 + 1/2 + 3/4
  = 1.75
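In code, a minimal sketch of the entropy formula applied to this distribution:

```python
import math

def entropy(probs):
    """H = sum_i p_i * I(p_i), in bits per symbol."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# The abcd-file: P(a) = 1/2, P(b) = 1/4, P(c) = P(d) = 1/8
H = entropy([0.5, 0.25, 0.125, 0.125])
print(H)  # 1.75
```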
Back to compression
Example 11 (The abcd-file). Consider again
bdaccdaadaabbcaaabbaaabaaacbdaab.....
with again
P (a) = 1/2
P (b) = 1/4
P (c) = 1/8
P (d) = 1/8
This we want to code (binary)
This is an obvious coding:
symbol ↔ code word
a ↔ 00
b ↔ 01
c ↔ 10
d ↔ 11
For example:
abba ↔ 00010100
This coding is decodable
As all code words ∈ {00, 01, 10, 11} consist of two bits:
(average) # bits per symbol = 2
Can this be done more efficiently?
Yes:
a ↔ 0 with probability 1/2
b ↔ 10 with probability 1/4
c ↔ 110 with probability 1/8
d ↔ 111 with probability 1/8
Symbols that appear more often have shorter code words
average # bits per symbol = (1/2)(1) + (1/4)(2) + 2 · (1/8)(3) = 1.75
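A sketch of this prefix code in action: encode, decode, and check that the average length matches the 1.75 bits per symbol computed above.

```python
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(s: str) -> str:
    return "".join(CODE[ch] for ch in s)

def decode(bits: str) -> str:
    inverse = {v: k for k, v in CODE.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:            # no code word is a prefix of another,
            out.append(inverse[buf])  # so greedy decoding is unambiguous
            buf = ""
    return "".join(out)

msg = "abacabadab"
assert decode(encode(msg)) == msg

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
avg_len = sum(p * len(CODE[s]) for s, p in probs.items())
print(avg_len)  # 1.75
```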
Can this be done still more efficiently?
Shannon (1948): ‘No, because the entropy of the source is 1.75’
Theorem 12.
H ≤ inf_codings E(L) ≤ H + 1
with L the # bits per symbol.
Example 13 (terrible coding). Suppose our file is
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
with P (a) = 1. This file has entropy (per symbol)
H = 0.
and for the ‘optimal’ coding
a ↔ 0
we have
E(L) = 1 = H + 1
Message: Coding per symbol is not a good idea.
Coding sequence of symbols
What was not allowed is a coding of whole runs, like
aaaaaaaaaaaaaaaaaaaaaaaaa ↔ ‘25 a’s’ ↔ ....
Pity, but not too bad:
We may consider
absdfuhquwyhgvsdfvsefqwsddfhwdqqwsd
as a sequence of ‘letters’ (symbols),
but also as a sequence of N-letter blocks (symbols):
absdfuhquwyhgvsdfvsefqwsddfhwdqqwsd
This requires coding many more ‘symbols’:
absdf ↔ 0110101000... ↔ ...
Entropy per N-letter symbol is
H_N = N H
So Shannon says:
N H ≤ inf E(L) ≤ N H + 1
with L now the # bits per N-letter symbol, that is
H ≤ inf E(L/N) ≤ H + 1/N
Beautiful:
Theorem 14.
inf_codings E(# bits per letter) = H
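The effect of coding ever-larger blocks can be demonstrated numerically. The sketch below uses Huffman coding (an optimal per-symbol code, not covered in the talk) on blocks of a binary source with P(a) = 0.9; the bits per letter fall from 1 toward the entropy H ≈ 0.469 as the block length N grows:

```python
import heapq
import math
from itertools import product

def huffman_lengths(probs):
    """Code-word lengths of a Huffman code for the given probabilities."""
    # heap entries: (probability, unique tiebreak, symbol indices in subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:           # merging deepens every leaf below
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

p = 0.9  # P(a) = 0.9, P(b) = 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for N in (1, 2, 4):
    block_probs = [math.prod(pr) for pr in product([p, 1 - p], repeat=N)]
    lengths = huffman_lengths(block_probs)
    bits_per_letter = sum(q * l for q, l in zip(block_probs, lengths)) / N
    print(N, bits_per_letter)  # decreases toward H as N grows
```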
Entropy of languages—redundancy
Written languages have high redundancy:
Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn’t
mttaer in waht oredr the ltteers in a wrod are, the olny
iprmoetnt tihng is taht the frist and lsat ltteer is at the rghit
pclae.
Languages are highly structured.
So language does not seem to have high entropy.
How to determine the entropy of ‘English’
Silly approach:
P(a) = P(b) = · · · = P(z) = P(space) = 1/27  =⇒  H ≈ 4.8
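For a uniform distribution the entropy is just the log of the alphabet size; a one-line check:

```python
import math

# 26 letters plus the space, all equally likely:
H_uniform = math.log2(27)
print(round(H_uniform, 2))  # 4.75, which the slides round to 4.8
```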
First-order approximation:
P(a) = 0.0575
P(b) = 0.0128
...
P(e) = 0.0913
...
P(z) = 0.007
then
H ≈ 4.1
Higher-order approximation:
P(be | to be or not to) = ??
...
Then
H = ?
It is believed that
0.6 < H_English < 1.3
that is: about 1 bit per letter is enough!
Computers represent letters in ASCII (8 bits), so this compression ratio should be possible (in theory):
H_English / 8 < 1.3 / 8 ≈ 1/6
In other words:
Perfect zippers compress ‘English’ by a factor of 6.
Proofs
Lemma 15. For every decodable coding
Σ_i 2^(−l_i) ≤ 1
with l_i the length of code word number i.
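Kraft's inequality is easy to check for concrete length sets; a small sketch comparing the prefix code {0, 10, 110, 111} with a set of lengths no decodable code can have:

```python
def kraft_sum(lengths):
    """Sum of 2^(-l_i) over the code-word lengths."""
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0: a decodable code with these lengths exists
print(kraft_sum([1, 1, 2]))     # 1.25 > 1: no decodable code can have these lengths
```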
Proof.
Code sequences of N symbols: a sequence i_1 i_2 . . . i_N maps to a concatenation of code words of total length l_{i_1} + · · · + l_{i_N}, e.g.
aaaaaaaaa (N symbols) ↔ 1010001011 (Σ_{k=1}^N l_{i_k} bits)
baaaaaaaa ↔ 0110110001111
...
zzzzzzzzzz ↔ 011011010001111
Decodability implies
# code words of length l ≤ 2^l
Let
S^N := (Σ_i 2^(−l_i))^N = Σ_{i_1,i_2,...,i_N} 2^(−(l_{i_1}+···+l_{i_N}))
then
S^N = Σ_l 2^(−l) A_l,   A_l := # of sequences of total length l
Now A_l ≤ 2^l because the coding is uniquely decodable. So
S^N = Σ_l 2^(−l) A_l ≤ Σ_{l=1}^{N l_max} 2^(−l) 2^l = N l_max
Therefore
(Σ_i 2^(−l_i))^N ≤ N l_max
The left-hand side is exponential in N, the right-hand side linear.
Hence the base of the exponential is ≤ 1.
Lemma 16. If the l_i are such that
Σ_i 2^(−l_i) ≤ 1
then there is a (directly) decodable coding with these lengths.
Proof. Construct a binary tree with those lengths
(suppose the lengths are {1, 2, 3, 3})
[Figure: binary tree of depth 3, with leaves at depths 1, 2, 3, 3]
The leaves of the tree give the code words
0   (probability 1/2)
10  (probability 1/4)
110 (probability 1/8)
111 (probability 1/8)
This one is decodable.
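The tree construction can be sketched in code: sort the lengths and assign leaves left to right (the so-called canonical prefix-code construction, a standard way to realize Lemma 16):

```python
def code_from_lengths(lengths):
    """Build prefix-free code words for lengths satisfying Kraft's inequality."""
    lengths = sorted(lengths)
    assert sum(2 ** -l for l in lengths) <= 1, "Kraft inequality must hold"
    words, value = [], 0
    prev = lengths[0]
    for l in lengths:
        value <<= (l - prev)                 # descend to depth l in the tree
        words.append(format(value, f"0{l}b"))
        value += 1                           # move to the next unused node
        prev = l
    return words

print(code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```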
Proof that H ≤ E(L).
E(L) − H = Σ_i p_i l_i + Σ_i p_i log2(p_i)
         = Σ_i p_i log2(2^{l_i} p_i)
         = Σ_i p_i ln(2^{l_i} p_i) / ln(2)
         = (1/ln(2)) Σ_i p_i ln(2^{l_i} p_i)
         ≥ (1/ln(2)) Σ_i p_i (1 − 2^{−l_i}/p_i)      [using ln(x) ≥ 1 − 1/x]
         = (1/ln(2)) Σ_i (p_i − 2^{−l_i}) ≥ 0,
where the last step uses Σ_i p_i = 1 and Lemma 15: Σ_i 2^{−l_i} ≤ 1.
Proof of
inf E(L) ≤ H + 1
is constructive: take
l_i = ⌈− log2(p_i)⌉
then
Σ_i 2^{−l_i} ≤ Σ_i p_i = 1
so a decodable coding exists.
We have
E(L) = Σ_i p_i l_i < Σ_i p_i (− log2(p_i) + 1) = H + 1.