mat did 033576

Upload: mohit-kumar

Post on 01-Jun-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/9/2019 Mat Did 033576

    1/24

     

    Data Compression

  • 8/9/2019 Mat Did 033576

    2/24

  • 8/9/2019 Mat Did 033576

    3/24

     

    !xamples of Codes

    Defnition " source code C for a random variable X is a mappingfrom X , the range of X , to D#, the set of nite$length strings ofsymbols from a D$ary alphabet. %et C(x) denote the codewordcorresponding to x and let l(x) denote the length of C(x).

    &or example, C( red ) ' ((, C( blue ) ' )) is a source code for X ' *red,blue+ with alphabet D ' *(, )+.

    Defnition he expected length L(C) of a source code C(x) for arandom

     variable X with probability mass function p(x) is given by-

    where l(x) is the length of the codeword associated with x. Withoutloss of generality, we can assume that the D$ary alphabet is D ' *(,

    ), . . . , D )+.

    ∑∈

    = χ  x

     xl  x pC  L   )()()(

  • 8/9/2019 Mat Did 033576

    4/24

     

    !xample )

    %et X be a random variable with the following distributionand codeword assignment-

    /r(X ' ) ) ' )01, codeword C( ) ) ' (

    /r(X ' 1 ) ' )02, codeword C( 1 ) ' )(

    /r(X ' 3 ) ' )04, codeword C( 3 ) ' ))(/r(X ' 2 ) ' )04, codeword C( 2 ) ' ))).

    he entropy H(X) of X is ).56 bits, and the expected length L(C) ' El(X) of this code is also ).56 bits. 7ere we have acode that has the same average length as the entropy. Wenote that any sequence of bits can be uniquely decoded into asequence of symbols of X . &or example, the bit string())())))(())( is decoded as )321)3.

  • 8/9/2019 Mat Did 033576

    5/24

  • 8/9/2019 Mat Did 033576

    6/24

  • 8/9/2019 Mat Did 033576

    7/24 

    onsingularity

    We now dene increasingly more stringent conditions oncodes. %et xn denote (x), x1, . . . , xn ).

    Defnition  " code is said to be nonsingular if every element

    of the range of X maps into a di:erent string in D#E that is, x1 ≠ x 2 F C(x1 ) ≠ C(x 2 ). onsingularity su:ices for an

    unambiguous description of a single value of X . Gut weusually wish to send a sequence of values of  X . An such caseswe can ensure decodability by adding a special symbol

  • 8/9/2019 Mat Did 033576

    8/24

  • 8/9/2019 Mat Did 033576

    9/24 

    Inique Decodability

    Defnition  " code is called uniuel! decoda"le if its extension isnonsingular. An other words, any encoded string in a uniquelydecodable code has only one possible source string producing it.7owever, one may have to loo at the entire string to determine eventhe rst symbol in the corresponding source string.

     Defnition " code is called a pre#x code or an instantaneous code ifno codeword is a prex of any other codeword.

     "n instantaneous code can be decoded without reference to futurecodewords since the end of a codeword is immediately recogniBable.7ence, for an instantaneous code, the symbol xi can be decoded assoon as we come to the end of the codeword corresponding to it. Weneed not wait to see the codewords that come later. "n instantaneouscode is a sel$punctuating codeE we can loo down the sequence ofcode symbols and add the commas to separate the codewords withoutlooing at later symbols. &or example, the binary string ()()))))()(produced by the code of !xample ) is parsed as (,)(,))),))(,)(.

  • 8/9/2019 Mat Did 033576

    10/24

     

    he nesting of these denitions is shown in &igure. oillustrate the di:erences between the various inds of codes,consider the examples of codeword assignments C(x) to x J Xin able.

    &or the nonsingular code, the codestring ()( has three possible sourcesequences- 1 or )2 or 3), and hencethe code is not uniquely decodable.

  • 8/9/2019 Mat Did 033576

    11/24

     

    Comments

    o see that it is uniquely decodable, tae any code string andstart from the beginning. Af the rst two bits are (( or )(, theycan be decoded immediately. Af the rst two bits are )), wemust loo at the following bits. Af the next bit is a ), the rstsource symbol is a 3. Af the length of the string of (s

    immediately following the )) is odd, the rst codeword mustbe ))( and the rst source symbol must be 2E if the length ofthe string of (s is even, the rst source symbol is a 3. Gyrepeating this argument, we can see that this code is uniquelydecodable. ;ardinas and /atterson have devised a nite test

    for unique decodability, which involves forming sets of possiblesu:ixes to the codewords and eliminating them systematically.he fact that the last code in the able is instantaneous isobvious since no codeword is a prex of any other.

  • 8/9/2019 Mat Did 033576

    12/24

     

    Kraft Anequality

    We wish to construct instantaneous codes of minimum expectedlength to describe a given source. At is clear that we cannotassign short codewords to all source symbols and still be prex$free. he set of codeword lengths possible for instantaneouscodes is limited by the following inequality.

    Theorem

  • 8/9/2019 Mat Did 033576

    13/24

     

    /roof of Kraft Anequality

    Proo: Consider a D$ary tree in which each node has D children. %et thebranches of the tree represent the symbols of the codeword. &orexample, the D branches arising from the root node represent the Dpossible values of the rst symbol of the codeword. hen each codewordis represented by a leaf on the tree. he path from the root traces outthe symbols of the codeword. " binary example of such a tree is shownin &igure. he prex condition on the codewords implies that nocodeword is an ancestor of any other codeword on the tree. 7ence, eachcodeword eliminates its descendants as possible codewords.

  • 8/9/2019 Mat Did 033576

    14/24

     

    /roof of Kraft Anequality

    %et lmax be the length of the longest codeword of the set of

    codewords. Consider all nodes of the tree at level lmax. ;ome

    of them are codewords, some are descendants of codewords,and some are neither. " codeword at level li has Dlmaxli 

    descendants at levellmax. !ach of these descendant sets mustbe disLoint. "lso, the total number of nodes in these sets must

    be less than or equal to Dlmax . 7ence, summing over all the

    codewords, we have-

    Which is the Kraft inequality.

    maxmax   l 

    i

    l l  D D   i ≤∑

      −

  • 8/9/2019 Mat Did 033576

    15/24

     

    !xtended Kraft Anequality

    Conversely, given any set of codeword lengths l), l1, . . . , l& 

    that satisfy the Kraft inequality, we can always construct atree lie the one in &igure. %abel the rst node

  • 8/9/2019 Mat Did 033576

    16/24

     

    !xtended Kraft Anequality

    Theorem

  • 8/9/2019 Mat Did 033576

    17/24

     

    /roof of !xtended KraftAnequality

    his codeword corresponds to the interval-

    the set of all real numbers whose D$ary expansion begins with (.! ) ! 1 H H H

     ! li . his is a subinterval of the unit interval M(, )N. Gy the prex condition,

    these intervals are disLoint. 7ence, the sum of their lengths has to be lessthan or equal to ). his proves that

     Oust as in the nite case, we can reverse the proof to construct the codefor a given l), l1, . . . that satises the Kraft inequality. &irst, reorder the

    indexing so that l) P l1 P . . . . hen simply assign the intervals in order

    from the low end of the unit interval. &or example, if we wish to constructa binary code with l) ' ), l1 ' 1, . . . , we assign the intervals M(, )01 ), M)01,

    )02 ), . . . to the symbols, with corresponding codewords (, )(,. . . .

    )1

    ....0,....0[ 2121iii   l l l   D

     y y y y y y   +

    11

    ≤∑∞

    =

    i

    l i D

  • 8/9/2019 Mat Did 033576

    18/24

     

    Qptimal Codes

    We proved that any codeword set that satises the prexcondition has to satisfy the Kraft inequality and that the Kraftinequality is a su:icient condition for the existence of acodeword set with the specied set of codeword lengths. Wenow consider the problem of nding the prex code with the

    minimum expected length.

    his is equivalent to nding the set of lengths l), l1, . . . , l& 

    satisfying the Kraft inequality and whose expected length %'is less than the expected length of any other prex code. his

    is a standard optimiBation problem.9inimiBe-

    over all integers l), l1, . . . , l& satisfying

    ∑   ii pl 

    ∑=   ii pl  L

    ∑   ≤− 1il  D

  • 8/9/2019 Mat Did 033576

    19/24

     

    Qptimal Codes

    We neglect the integer constraint on li and assume equality in

    the constraint. 7ence, we can write the constrainedminimiBation using %agrange multipliers as the minimiBation of 

    Di:erentiating with respect to li , we obtain

    ;etting the derivative to (, we obtain

    ;ubstituting this in the constraint to nd , we nd  ' )0

  • 8/9/2019 Mat Did 033576

    20/24

     

    Qptimal Code %ength

    his noninteger choice of codeword lengths yields expectedcodeword length

    Gut since the li must be integers, we will not always be able to set

    the codeword lengths in this way. Anstead, we should choose a setof codeword lengths li ?close@ to the optimal set. We verify that l#i' log D pi is a global minimum by the proof of the following

    theorem.

    Theorem he expected length % of any instantaneous D$ary code

    for a random variable R is greater than or equal to the entropy7D

  • 8/9/2019 Mat Did 033576

    21/24

     

    !xpected Code %ength

    Proo: We can write the di:erence between the expectedlength and the

    entropy as

    %etting ri' and we obtain-

    by the nonnegativity of relative entropy and the fact

  • 8/9/2019 Mat Did 033576

    22/24

     

    !xpected Code %ength

    Defnition  " probability distribution is called Dadic if each of theprobabilities is equal to Dn for some n. hus, we have equality inthe theorem if and only if the distribution of X is D$adic.

    he preceding proof also indicates a procedure for nding anoptimal code- &ind the D$adic distribution that is closest

  • 8/9/2019 Mat Did 033576

    23/24

  • 8/9/2019 Mat Did 033576

    24/24

    Gounds on the Qptimal Code%ength

    where the quantity x is the smallest integer S x. heselengths satisfy the Kraft

    inequality since

    &rom the choice of li, we can see that these codeword lengths

    satisfy

    9ultiplying by pi and summing over i, we obtain

    1

    1log

    1log

    ==≤   ∑∑∑  −

    i

     p p p D D   ii

    11

    log1

    log   +≤≤i

     Di

    i

     D p

    l  p

    1)()(   +