Source Coding Compression

Upload: gyubeom-choi

Post on 03-Jun-2018


TRANSCRIPT

  • 8/12/2019 Source Coding Compression

    1/34

    Source Coding-Compression

    Most topics from Digital Communications by Simon Haykin

    Chapter 9

    9.1~9.4

  • 2/34

    Fundamental Limits on Performance

    Given an information source and a noisy channel:

    1) Limit on the minimum number of bits per symbol

    2) Limit on the maximum rate for reliable communication

    Shannon's theorems establish both limits.

  • 3/34

    Information Theory

    Let the source alphabet be S = {s_0, s_1, ..., s_{K-1}},

    with probabilities of occurrence P(s_k) = p_k for k = 0, 1, ..., K-1, where p_0 + p_1 + ... + p_{K-1} = 1.

    Assume a discrete memoryless source (DMS).

    What is the measure of information?

  • 4/34

    Uncertainty, Information, and Entropy

    (cont)

    Interrelations between information and uncertainty or surprise:

    No surprise, no information.

    If A is one surprise and B is another surprise, what is the total information of A and B occurring simultaneously?

    The amount of information may be related to the inverse of the probability of occurrence:

    Info ∝ 1/Prob.

    Info(A and B) = Info(A) + Info(B)

    I(s_k) = log(1/p_k)
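As a small numerical sketch of this measure (using base-2 logarithms, the customary choice in these slides; the function names are ours):

```python
import math

def information(p):
    """Amount of information I(s_k) = log2(1/p_k) of a symbol with probability p_k."""
    return math.log2(1 / p)

def entropy(probs):
    """Entropy H(S): the average information per symbol, in bits."""
    return sum(p * information(p) for p in probs)

print(information(0.5))                    # a fair coin flip carries 1.0 bit
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 4 equally likely symbols: 2.0 bits
```

Note how a rarer symbol carries more information, matching the inverse-probability intuition above.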

  • 5/34

    Property of Information

    1) I(s_k) = 0 for p_k = 1

    2) I(s_k) ≥ 0 for 0 ≤ p_k ≤ 1

    3) I(s_k) > I(s_i) for p_k < p_i

    4) I(s_k s_i) = I(s_k) + I(s_i) if s_k and s_i are statistically independent

    * Custom is to use logarithm of base 2

  • 6/34

  • 7/34

  • 8/34

    Average Length

    For a code C with associated probabilities p(c), the average length is defined as

    l_a(C) = Σ_{c ∈ C} p(c) l(c)

    We say that a prefix code C is optimal if, for all prefix codes C', l_a(C) ≤ l_a(C').

  • 9/34

    Relationship to Entropy

    Theorem (lower bound): For any probability distribution p(S) with associated uniquely decodable code C,

    H(S) ≤ l_a(C)

    Theorem (upper bound): For any probability distribution p(S) with associated optimal prefix code C,

    l_a(C) ≤ H(S) + 1

  • 10/34

    Coding Efficiency

    Coding efficiency:

    η = L_min / L_a

    where L_a is the average code-word length.

    From Shannon's theorem, L_a ≥ H(S), so L_min = H(S).

    Thus η = H(S) / L_a.
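A small worked check of these definitions, using an illustrative four-symbol source (the same probabilities and code-word lengths as the Huffman example later in these slides):

```python
import math

p = [0.1, 0.2, 0.2, 0.5]        # symbol probabilities
lengths = [3, 3, 2, 1]          # code-word lengths for a = 000, b = 001, c = 01, d = 1

H = sum(pi * math.log2(1 / pi) for pi in p)       # entropy H(S)
La = sum(pi * li for pi, li in zip(p, lengths))   # average code-word length L_a
eta = H / La                                      # coding efficiency

print(La)               # 1.8 bits/symbol
print(round(eta, 3))    # 0.978: close to, but below, 1
```

Note that L_a lands between H(S) and H(S) + 1, as the bounds on the previous slide require.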

  • 11/34

    Kraft McMillan Inequality

    Theorem (Kraft-McMillan): For any uniquely decodable code C,

    Σ_{c ∈ C} 2^{-l(c)} ≤ 1

    Also, for any set of lengths L such that

    Σ_{l ∈ L} 2^{-l} ≤ 1

    there is a prefix code C such that l(c_i) = l_i for i = 1, ..., |L|.

    NOTE: the Kraft-McMillan inequality does not tell us whether a given code is prefix-free or not.
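The inequality is easy to check numerically; a minimal sketch (the function name is ours):

```python
def kraft_sum(lengths):
    """Sum of 2^(-l) over the code-word lengths of a code."""
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([3, 3, 2, 1]))   # 1.0: these lengths exactly fill the code space
print(kraft_sum([1, 1, 2]))      # 1.25 > 1: no uniquely decodable code has these lengths
```

The first set of lengths is exactly the one produced by the Huffman example later in these slides.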

  • 12/34

  • 13/34

    Prefix Codes

    A prefix code is a variable-length code in which no codeword is a prefix of another codeword.

    e.g. a = 0, b = 110, c = 111, d = 10

    Can be viewed as a binary tree with message values at the leaves and 0s or 1s on the edges.

    [Figure: binary code tree with leaves a, b, c, d and 0/1 labels on the edges]

  • 14/34

    Some Prefix Codes for Integers

    n   Binary   Unary    Split
    1   ..001    0        1|
    2   ..010    10       10|0
    3   ..011    110      10|1
    4   ..100    1110     110|00
    5   ..101    11110    110|01
    6   ..110    111110   110|10

    Many other fixed prefix codes:

    Golomb, phased-binary, subexponential, ...
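The unary column above, for instance, can be generated by a one-line encoder (a sketch; the function name is ours):

```python
def unary(n):
    """Unary code for an integer n >= 1: (n - 1) ones followed by a terminating zero."""
    return "1" * (n - 1) + "0"

print([unary(n) for n in range(1, 7)])   # ['0', '10', '110', '1110', '11110', '111110']
```

Each codeword ends in its only 0, so no codeword can be a prefix of a longer one: unary is a prefix code.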

  • 15/34

    Data compression implies sending or storing a

    smaller number of bits. Although many methods are

    used for this purpose, in general these methods can

    be divided into two broad categories: lossless and

    lossy methods.

    Data compression methods

  • 16/34

    Run Length Coding

  • 17/34

    Introduction: What is RLE?

    A compression technique that represents data as (value, run length) pairs,

    where a run length is defined as the number of consecutive equal values.

    e.g. 1110011111 → values 1, 0, 1 with run lengths 3, 2, 5

  • 18/34

    Introduction

    Compression effectiveness depends on the input.

    Must have consecutive runs of values in order to maximize compression.

    Best case: all values are the same; a run of any length can be represented using two values.

    Worst case: no repeating values; the compressed data is twice the length of the original!

    Should only be used in situations where we know for sure that we have repeating values.
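A minimal run-length encoder sketch (the function name is ours), illustrating the best and worst cases above:

```python
def rle_encode(values):
    """Encode a sequence as (value, run_length) pairs."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][0] == v:
            pairs[-1][1] += 1          # extend the current run
        else:
            pairs.append([v, 1])       # start a new run
    return [tuple(p) for p in pairs]

# Best case: 16 equal values compress to a single pair.
print(rle_encode([0] * 16))            # [(0, 16)]
# Worst case: no repeats, so the output holds twice as many numbers as the input.
print(rle_encode([0, 1, 2, 3]))        # [(0, 1), (1, 1), (2, 1), (3, 1)]
```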

  • 19/34

    Run-length encoding example

  • 20/34

    Run-length encoding for two symbols

  • 21/34

    Encoder Results

    Input: 4,5,5,2,7,3,6,9,9,10,10,10,10,10,10,0,0

    Output: 4,1,5,2,2,1,7,1,3,1,6,1,9,2,10,6,0,2 (trailing -1 values mark where the valid output ends)

    Best Case:

    Input: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

    Output: 0,16 (followed by -1 padding)

    Worst Case:

    Input: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

    Output: 0,1,1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,12,1,13,1,14,1,15,1

  • 22/34

    Huffman Coding

  • 23/34

  • 24/34

    Huffman Codes

    Huffman Algorithm:

    Start with a forest of trees, each consisting of a single vertex corresponding to a message s and with weight p(s).

    Repeat:

    Select the two trees whose roots have minimum weights p1 and p2.

    Join them into a single tree by adding a root with weight p1 + p2.
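The loop above can be sketched with a binary heap holding the forest (a sketch, not the textbook's code; names are ours):

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build a Huffman code table from a {symbol: probability} mapping."""
    tick = count()   # tie-breaker so the heap never has to compare trees directly
    heap = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # the two minimum-weight roots
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tick), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: branch on 0/1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"      # single-symbol edge case
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"a": 0.1, "b": 0.2, "c": 0.2, "d": 0.5})
```

For these probabilities the resulting code-word lengths are 3, 3, 2, 1 for a, b, c, d, matching the example on the next slide (the exact bit patterns may differ with tie-breaking).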

  • 25/34

    Example

    p(a) = .1, p(b) = .2, p(c) = .2, p(d) = .5

    Step 1: merge a(.1) and b(.2) into a subtree of weight (.3)

    Step 2: merge (.3) and c(.2) into a subtree of weight (.5)

    Step 3: merge (.5) and d(.5) into the root (1.0)

    Resulting code: a = 000, b = 001, c = 01, d = 1

  • 26/34

    Encoding and Decoding

    Encoding: start at the leaf of the Huffman tree and follow the path to the root. Reverse the order of the bits and send.

    Decoding: start at the root of the Huffman tree and take a branch for each bit received. On reaching a leaf, output the message and return to the root.

    [Figure: the Huffman tree from the previous slide, with leaves a(.1), b(.2), c(.2), d(.5) and 0/1 edge labels]

    There are even faster methods that can process 8 or 32 bits at a time.
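The decoding walk can be sketched directly from the code table of the earlier example (a = 000, b = 001, c = 01, d = 1); because the code is prefix-free, matching the buffered bits against the table is equivalent to reaching a leaf:

```python
def huffman_decode(bits, codes):
    """Decode a bit string by accumulating bits until they match a codeword."""
    inverse = {v: k for k, v in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:        # reached a leaf: emit the symbol, return to the root
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

codes = {"a": "000", "b": "001", "c": "01", "d": "1"}
print(huffman_decode("000011", codes))   # "acd"
```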

  • 27/34

    Huffman codes Pros & Cons

    Pros:

    The Huffman algorithm generates an optimal prefix code.

    Cons:

    If the ensemble changes, the frequencies and probabilities change, and the optimal coding changes.

    e.g. in text compression, symbol frequencies vary with context.

    Re-computing the Huffman code by running through the entire file in advance?!

    Saving/transmitting the code too?!

  • 28/34

  • 29/34

    Lempel-Ziv Algorithms

    LZ77 (Sliding Window)

    Variants: LZSS (Lempel-Ziv-Storer-Szymanski)

    Applications: gzip, Squeeze, LHA, PKZIP, ZOO

    LZ78 (Dictionary Based)

    Variants: LZW (Lempel-Ziv-Welch), LZC (Lempel-Ziv-Compress)

    Applications: compress, GIF, CCITT (modems), ARC, PAK

    Traditionally LZ77 compressed better but ran slower; the gzip version is almost as fast as any LZ78.

  • 30/34

    Lempel Ziv encoding

    Lempel Ziv (LZ) encoding is an example of a category of algorithms called dictionary-based encoding. The idea is to create a dictionary (a table) of strings used during the communication session. If both the sender and the receiver have a copy of the dictionary, then previously encountered strings can be substituted by their index in the dictionary to reduce the amount of information transmitted.

  • 31/34

    Compression

    In this phase there are two concurrent events: building an indexed dictionary and compressing a string of symbols. The algorithm extracts the smallest substring that cannot be found in the dictionary from the remaining uncompressed string. It then stores a copy of this substring in the dictionary as a new entry and assigns it an index value. Compression occurs when the substring, except for the last character, is replaced with the index found in the dictionary. The process then inserts the index and the last character of the substring into the compressed string.
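The compression phase described above can be sketched as an LZ78-style encoder that emits (dictionary index, next character) pairs (a sketch; names are ours):

```python
def lz_compress(text):
    """LZ78-style compression: emit (dictionary index, next char) pairs."""
    dictionary = {"": 0}               # index 0 stands for the empty string
    out, phrase = [], ""
    for ch in text:
        if phrase + ch in dictionary:  # keep growing the current substring
            phrase += ch
        else:                          # smallest substring not yet in the dictionary
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                         # input ended inside an already-known phrase
        out.append((dictionary[phrase], ""))
    return out

print(lz_compress("ABAABA"))   # [(0, 'A'), (0, 'B'), (1, 'A'), (2, 'A')]
```

Here "ABAABA" splits into the new substrings A, B, AA, BA, each encoded as the index of its known prefix plus its last character.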

  • 32/34

    An example of Lempel Ziv encoding


  • 33/34

    Decompression

    Decompression is the inverse of the compression process. The process extracts the substrings from the compressed string and replaces each index with the corresponding entry in the dictionary, which is empty at first and built up gradually. The idea is that when an index is received, there is already an entry in the dictionary corresponding to that index.
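Decompression can be sketched as the exact inverse, rebuilding the same dictionary as it consumes the (index, character) pairs (a sketch matching the encoder form assumed above; names are ours):

```python
def lz_decompress(pairs):
    """Rebuild the dictionary while decoding LZ78-style (index, char) pairs."""
    dictionary = {0: ""}                 # same empty-string entry as the encoder
    out = []
    for index, ch in pairs:
        phrase = dictionary[index] + ch  # the entry already exists when its index arrives
        out.append(phrase)
        dictionary[len(dictionary)] = phrase
    return "".join(out)

print(lz_decompress([(0, "A"), (0, "B"), (1, "A"), (2, "A")]))   # "ABAABA"
```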

  • 34/34

    An example of Lempel Ziv decoding