lecture 2-source coding
TRANSCRIPT
-
7/27/2019 Lecture 2-Source Coding
1/53
Source Coding
-
What is source coding?
Communication systems are designed to transmit the information generated by a source to some destination.
A digital communication system is designed to transmit information in digital form.
Source coding: the conversion of the source output to a digital form, performed by a source encoder, which produces a sequence of binary digits.
Design goal: represent a source with the fewest bits such that the best recovery of the source from the compressed data is possible.
ELEC5360 2
-
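[Figure: block diagram of a digital communication system. Information source → source encoder → channel encoder → modulator → channel (with noise) → demodulator → channel decoder → source decoder → received information. A binary interface separates the source encoder/decoder from the channel encoder/decoder.]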
-
Different information sources
Discrete sources
  A sequence of symbols from a known discrete alphabet
  The alphabet could be, e.g., binary digits or English letters
Analog sequence sources
  A sequence of real/complex numbers
Analog waveform sources
  An analog waveform, e.g., voice signal, video waveform
-
Different information sources
Source coding for a discrete source
  The information source can be uniquely retrieved from the encoded string of bits
  Uniquely decodable: lossless coding
Source coding for analog sources
  Some form of quantization is necessary
  It introduces distortion
  Lossy compression: a tradeoff between the bit rate and the distortion
-
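A General Diagram for Source Coding
[Figure: input waveform → sampler → quantizer → discrete encoder → binary channel → discrete decoder → table lookup → analog filter → output waveform. The sampler output is an analog sequence, the quantizer output is a symbol sequence, and the discrete encoder and decoder meet the channel at a binary interface. Two parts of the chain are labelled: lossless coding and lossy compression.]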
-
A Practical Example: PCM
Pulse code modulation (PCM)
  Digital representation of an analog signal
  The standard form for digital audio and various Blu-ray, DVD and CD formats
Operation
  Sample the magnitude of the analog signal regularly at uniform intervals
  Each sample is then quantized to the nearest value within a range of digital steps
The PCM process is commonly implemented on a single integrated circuit generally referred to as an analog-to-digital converter (ADC).
-
PCM
An example:
  In telephony, a standard audio signal for a single phone call is encoded as 8,000 analog samples per second, of 8 bits each, giving a 64 kbit/s digital signal
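The 64 kbit/s figure can be checked with a small sketch. It assumes a uniform quantizer over [-1, 1] and a 1 kHz test tone, both chosen only for illustration (telephone PCM in practice uses companded rather than uniform quantization). The names pcm_encode and pcm_decode are hypothetical.

```python
import math

def pcm_encode(samples, bits=8, v_max=1.0):
    """Map samples in [-v_max, v_max] to integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    step = 2 * v_max / levels
    codes = []
    for s in samples:
        s = max(-v_max, min(v_max, s))       # clip to the quantizer range
        k = int((s + v_max) / step)          # index of the quantization cell
        codes.append(min(k, levels - 1))     # s == v_max falls in the top cell
    return codes

def pcm_decode(codes, bits=8, v_max=1.0):
    """Map each code back to the midpoint of its quantization cell."""
    step = 2 * v_max / 2 ** bits
    return [-v_max + (k + 0.5) * step for k in codes]

SAMPLE_RATE = 8000   # samples per second, as on the slide
BITS = 8             # bits per sample, as on the slide

# One second of an assumed 1 kHz test tone.
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
codes = pcm_encode(tone, BITS)
recon = pcm_decode(codes, BITS)

bit_rate = SAMPLE_RATE * BITS                                # bits per second
max_error = max(abs(a - b) for a, b in zip(tone, recon))     # at most half a step
print(bit_rate, max_error)
```

With 8 bits the step size is 2/256, so the reconstruction error of this uniform quantizer never exceeds 1/256.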
-
[Figure: the general source coding diagram again (input waveform → sampler → quantizer → discrete encoder → binary channel → discrete decoder → table lookup → analog filter → output waveform), with the label: lossless coding.]
-
Discrete Memoryless Source (DMS)
Consider a discrete information source with an alphabet of M possible letters, $\{x_1, x_2, \dots, x_M\}$
  Not necessarily numerical values, e.g., {sunny, cloudy, rainy}
Probabilistic model
  $p_j = P[X = x_j]$, $1 \le j \le M$, where $\sum_{j=1}^{M} p_j = 1$
Discrete memoryless source (DMS)
  The output sequence is statistically independent, i.e., the current output letter is independent of all past and future outputs
  Essentially, it is a sequence of iid random variables
-
Information Measure and Codeword Length
Entropy
$$H(X) = -\sum_{j=1}^{M} p_j \log_2 p_j$$
  A measure of uncertainty or ambiguity in X
  A measure of information that is acquired by knowledge of X
The average codeword length
$$\bar{L} = \sum_{j=1}^{M} p_j l_j$$
  where $l_j$ is the length of the jth codeword
Symbol-by-symbol encoding: each letter $x_j$ is mapped to a codeword of length $l_j$
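As a quick numerical illustration of the two quantities, the sketch below evaluates the entropy and the average codeword length for an assumed four-letter pmf (1/2, 1/4, 1/8, 1/8) with two assumed length assignments. The function names are illustrative only.

```python
import math

def entropy(pmf):
    """Entropy in bits/symbol: -sum p_j log2 p_j (zero-probability letters skipped)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def average_length(pmf, lengths):
    """Average codeword length: sum p_j l_j."""
    return sum(p * l for p, l in zip(pmf, lengths))

pmf = [1/2, 1/4, 1/8, 1/8]                           # assumed example pmf
H = entropy(pmf)
L_fixed = average_length(pmf, [2, 2, 2, 2])          # fixed-length code
L_variable = average_length(pmf, [1, 2, 3, 3])       # shorter words for likelier letters
print(H, L_fixed, L_variable)
```

For this pmf every probability is a power of 2, so the variable-length assignment meets the entropy exactly while the fixed-length code spends 2 bits per symbol.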
-
Variable-length Source Coding
Uniquely decodable
Instantaneous codes: without any decoding delay
Prefix-free codes: no codeword is a prefix of any other codeword
  Both uniquely and instantaneously decodable

Letter  P    Code I  Code II  Code III  Code IV
x1      1/2  00      1        0         0
x2      1/4  01      00       10        01
x3      1/8  10      01       110       011
x4      1/8  11      10       111       111

Code II: not uniquely decodable (e.g., 1001 parses as x1 x2 x1 or as x4 x3)
Code III: prefix-free code
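The prefix condition and instantaneous decoding can be sketched in a few lines. The codebooks below are Codes II, III and IV from the table; the helper names is_prefix_free and decode_prefix_free are hypothetical.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

def decode_prefix_free(bits, codebook):
    """Decode a bit string with a prefix-free codebook {codeword: letter}.

    A letter is emitted as soon as the last bit of its codeword arrives,
    which is what 'instantaneous' means.
    """
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in codebook:
            letters.append(codebook[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return letters

code_ii = ["1", "00", "01", "10"]
code_iii = {"0": "x1", "10": "x2", "110": "x3", "111": "x4"}
code_iv = ["0", "01", "011", "111"]

print(is_prefix_free(code_ii), is_prefix_free(list(code_iii)), is_prefix_free(code_iv))
print(decode_prefix_free("0101100111", code_iii))
```

Code IV fails the prefix test even though it is uniquely decodable: the decoder has to look ahead before it can tell 0 from 01 or 011.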
-
Prefix-free Codes
If a uniquely decodable code exists with a certain set of codeword lengths, then a prefix-free code can easily be constructed with the same set of lengths
The decoder can decode each codeword of a prefix-free code immediately on the arrival of the last bit in that codeword, i.e., instantaneously decodable
Given a probability distribution on the source symbols, it is easy to construct a prefix-free code of minimum expected length
-
Kraft inequality for prefix-free codes
Theorem: Every prefix-free code for an alphabet $\{x_1, \dots, x_M\}$ with codeword lengths $l_1, \dots, l_M$ satisfies the following
$$\sum_{j=1}^{M} 2^{-l_j} \le 1$$
Conversely, if this condition is satisfied, then a prefix-free code with lengths $l_1, \dots, l_M$ exists.
Note: just because a code has lengths that satisfy this condition, it does not follow that the code is prefix-free, or even uniquely decodable
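Both directions of the theorem can be tried out numerically. The sketch below computes the Kraft sum exactly and, when the sum is at most 1, builds one prefix-free code with the requested lengths by handing out consecutive binary numbers in order of increasing length. This is one standard construction, not necessarily the one used in the lecture, and the function names are hypothetical.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of sum_j 2^(-l_j) for positive integer lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def prefix_free_code_from_lengths(lengths):
    """Return codewords (same order as `lengths`) forming a prefix-free code."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    code = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        l = lengths[i]
        value <<= (l - prev_len)                  # extend the counter to l bits
        code[i] = format(value, "0{}b".format(l))
        value += 1                                # next unused number at this length
        prev_len = l
    return code

print(kraft_sum([1, 2, 3, 3]), prefix_free_code_from_lengths([1, 2, 3, 3]))
```

Each codeword of length l claims an interval of size 2^(-l) in [0, 1), and the counter simply moves to the start of the next free interval.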
-
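Proof of Kraft Inequality
The proof is based on base 2 expansions
  The base 2 expansion $.y_1 y_2 \cdots y_l$ represents $\sum_{m=1}^{l} y_m 2^{-m}$, e.g., .011 represents 1/4 + 1/8
  The base 2 expansion $.y_1 y_2 \cdots y_l$ covers the interval $\left[\sum_{m=1}^{l} y_m 2^{-m},\ \sum_{m=1}^{l} y_m 2^{-m} + 2^{-l}\right)$. This interval has size $2^{-l}$ and includes all numbers whose base 2 expansions start with $.y_1 y_2 \cdots y_l$
Any codeword of length $l$ is represented by a rational number in the interval [0,1) and covers an interval of size $2^{-l}$
  For a prefix-free code these intervals are disjoint and all lie in [0,1), so their sizes sum to at most 1
The proof follows from this representation.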
-
Minimum $\bar{L}$ for Prefix-free Codes
We want to minimize the expected length subject to the Kraft inequality
  The expected codeword length is related to the output rate of the encoder
Entropy bounds for prefix-free codes
  Let $\bar{L}_{\min}$ be the minimum expected codeword length over all the prefix-free codes for a given DMS. Then
$$H(X) \le \bar{L}_{\min} < H(X) + 1 \quad \text{bit/symbol}$$
  with $\bar{L}_{\min} = H(X)$ if and only if each probability $p_j$ is an integer power of 2.
-
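Lagrange Multiplier Solution for $\bar{L}_{\min}$
The problem of finding $\bar{L}_{\min}$ can be formulated as
$$\bar{L}_{\min} = \min_{l_1, \dots, l_M:\ \sum_j 2^{-l_j} \le 1}\ \sum_{j=1}^{M} p_j l_j$$
We can get a non-integer solution of $l_j$ by using a Lagrange multiplier: minimize $\sum_j p_j l_j + \lambda \sum_j 2^{-l_j}$
Setting the derivative with respect to each $l_j$ equal to 0 gives $p_j = \lambda (\ln 2)\, 2^{-l_j}$; meeting the constraint with equality then gives $l_j = -\log_2 p_j$
  The resulting average length $\sum_j p_j (-\log_2 p_j) = H(X)$ is a lower bound of $\bar{L}_{\min}$
  In general $-\log_2 p_j$ is not an integer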
-
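Upper bound of $\bar{L}_{\min}$
Choose the codeword lengths to be $l_j = \lceil -\log_2 p_j \rceil$. Then
$$-\log_2 p_j \le l_j < -\log_2 p_j + 1$$
From the left-hand side, $2^{-l_j} \le p_j$, so $\sum_j 2^{-l_j} \le \sum_j p_j = 1$ and the Kraft inequality is satisfied
From the right-hand side, $\bar{L} = \sum_j p_j l_j < \sum_j p_j (-\log_2 p_j + 1) = H(X) + 1$
So $\bar{L}_{\min} < H(X) + 1$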
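The choice of lengths equal to the ceiling of -log2 p_j is easy to check numerically. The sketch below uses an assumed pmf (0.4, 0.3, 0.2, 0.1) and verifies the Kraft inequality and the bound H(X) <= L < H(X) + 1. The function names are illustrative.

```python
import math

def entropy(pmf):
    """Entropy in bits/symbol."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def ceiling_lengths(pmf):
    """Codeword lengths l_j = ceil(-log2 p_j)."""
    return [math.ceil(-math.log2(p)) for p in pmf]

pmf = [0.4, 0.3, 0.2, 0.1]                        # assumed example pmf
lengths = ceiling_lengths(pmf)
kraft = sum(2.0 ** -l for l in lengths)           # must not exceed 1
avg_len = sum(p * l for p, l in zip(pmf, lengths))
H = entropy(pmf)
print(lengths, kraft, H, avg_len)
```

These lengths always admit a prefix-free code, but they are generally not optimal: for this pmf the average length is 2.4 bits, whereas the lengths 1, 2, 3, 3 give 1.9 bits.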
-
Huffman's Algorithm
The expected codeword length minimization problem is an integer optimization problem, which in general is quite difficult
Surprisingly, David Huffman proposed a very simple and straightforward algorithm for constructing optimal prefix-free codes
  Huffman developed the algorithm in 1950 as a term paper in Robert Fano's information theory class at MIT
  It is optimal in the sense that the codewords satisfy the prefix condition and the average codeword length is a minimum.
Applications: JPEG, MP3, etc.
-
Huffman's Algorithm
Properties of optimal codes
1. Optimal codes have the property that if $p_i > p_j$, then $l_i \le l_j$
2. Optimal prefix-free codes have the property that the associated code tree is full
3. Optimal prefix-free codes have the property that, for each of the longest codewords in the code, the sibling of that codeword is another longest codeword
4. Let X be a random symbol with a pmf satisfying $p_1 \ge p_2 \ge \cdots \ge p_M > 0$. There is an optimal prefix-free code for X in which the codewords for M-1 and M are siblings and have maximal length within the code
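Property 4 suggests the usual recursive construction: merge the two least likely letters into one, solve the smaller problem, then split the merged letter into two siblings. A minimal heap-based sketch of that idea follows. It is one common way to implement Huffman's algorithm, the function name huffman_code is hypothetical, and the pmf is an assumed example.

```python
import heapq
import itertools

def huffman_code(pmf):
    """Build a Huffman code for pmf = {letter: probability}; returns {letter: codeword}."""
    if len(pmf) == 1:
        return {next(iter(pmf)): "0"}
    tie = itertools.count()          # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {letter: ""}) for letter, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)     # least likely subtree
        p1, _, group1 = heapq.heappop(heap)     # second least likely subtree
        # The two subtrees become siblings: prepend one more bit to every codeword.
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

pmf = {"x1": 0.5, "x2": 0.25, "x3": 0.125, "x4": 0.125}   # assumed example pmf
code = huffman_code(pmf)
avg_len = sum(pmf[s] * len(w) for s, w in code.items())
print(code, avg_len)
```

The exact bit patterns depend on how ties are broken, but the set of codeword lengths, and therefore the average length, is optimal either way.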
-
Example 2
[Figure: worked example, not preserved in the transcript.]
-
Lempel-Ziv Universal Data Compression
Huffman coding
  We need to know the source statistics
  It is not optimal when the pmfs are unknown, not identically distributed, or not independent
Lempel-Ziv algorithm
  Belongs to the class of universal source coding algorithms
  Does not need to know the source statistics
  LZ77 uses string matching on a sliding window
  LZ78 uses an adaptive dictionary
Applications: UNIX compress, GIF, TIFF, PDF, etc.
-
LZ78: Dictionary-based compression
Codeword output
  An index referring to the longest matching dictionary entry
  + the first non-matching symbol
Also add the new string to the dictionary
For a new symbol not in the dictionary, the codeword is 0 + this symbol
Example: [worked figure not preserved in the transcript]
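The encoding rule can be sketched directly. The code below emits (index, symbol) pairs rather than packed bits, uses index 0 for "no match", and emits an empty symbol if the input ends inside a known phrase. These representation choices and the function names are assumptions made for illustration.

```python
def lz78_encode(text):
    """Return a list of (dictionary index, next symbol) pairs."""
    dictionary = {}          # phrase -> index, indices start at 1
    output = []
    phrase = ""
    for symbol in text:
        if phrase + symbol in dictionary:
            phrase += symbol                             # keep extending the match
        else:
            output.append((dictionary.get(phrase, 0), symbol))
            dictionary[phrase + symbol] = len(dictionary) + 1   # add the new string
            phrase = ""
    if phrase:               # input ended inside a phrase already in the dictionary
        output.append((dictionary[phrase], ""))
    return output

def lz78_decode(pairs):
    """Rebuild the text by replaying the dictionary growth."""
    phrases = [""]           # index 0 is the empty phrase
    pieces = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        pieces.append(phrase)
        phrases.append(phrase)
    return "".join(pieces)

pairs = lz78_encode("abababa")
print(pairs)                 # parses the input as a | b | ab | aba
print(lz78_decode(pairs))
```

The decoder needs no side information: it rebuilds the same dictionary from the pairs as they arrive.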
-
Fixed-length to fixed-length Source Coding
One main disadvantage of fixed-to-variable-length codes is that bits leave the encoder at a variable rate
  If the incoming symbols have a fixed rate, the encoded bits must be queued and there is a positive probability for the queue to overflow
An alternative point of view is to consider fixed-length to fixed-length codes
-
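Asymptotic Equipartition Property (AEP)
AEP: if $X_1, X_2, \dots, X_n$ are iid $\sim p(x)$, then
$$-\frac{1}{n} \log_2 p(X_1, X_2, \dots, X_n) \to H(X) \quad \text{in probability}$$
The typical set $A_\epsilon^{(n)}$ for any $\epsilon > 0$ is defined as
$$A_\epsilon^{(n)} = \left\{ (x_1, x_2, \dots, x_n) : \left| -\frac{1}{n} \log_2 p(x_1, x_2, \dots, x_n) - H(X) \right| \le \epsilon \right\}$$
All typical sequences have roughly the same probability $2^{-nH(X)}$
For n sufficiently large, $P\left[(X_1, X_2, \dots, X_n) \in A_\epsilon^{(n)}\right] > 1 - \epsilon$ (prove in HW)
It can be proved that $(1 - \epsilon)\, 2^{n(H(X) - \epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X) + \epsilon)}$ (prove in HW)
So roughly the number of typical sequences is $2^{nH(X)}$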
-
Asymptotic Equipartition Property (AEP)
Roughly speaking, the AEP says
  Given a very long string of n iid discrete random symbols $X_1, \dots, X_n$, there exists a typical set of sample strings $(x_1, \dots, x_n)$ whose aggregate probability is almost 1.
  There are roughly $2^{nH(X)}$ typical strings of length n, and each has a probability roughly equal to $2^{-nH(X)}$
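For a binary iid source the typical set can be counted exactly, because every string with the same number of ones has the same probability. The sketch below does this for assumed values n = 200, P[X = 1] = 0.2 and epsilon = 0.15; the function names are illustrative.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_set_stats(n, p, eps):
    """Return (probability, size) of the typical set for n iid Bernoulli(p) symbols."""
    H = binary_entropy(p)
    prob, size = 0.0, 0
    for k in range(n + 1):                        # k = number of ones in the string
        # Every string with k ones has probability p^k (1-p)^(n-k).
        log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log_prob / n - H) <= eps:         # typicality test
            count = math.comb(n, k)               # number of strings with k ones
            size += count
            prob += count * 2.0 ** log_prob
    return prob, size

n, p, eps = 200, 0.2, 0.15                        # assumed example values
H = binary_entropy(p)
prob, size = typical_set_stats(n, p, eps)
print(prob, math.log2(size) / n, H)
```

The typical set carries almost all of the probability while log2 of its size stays below n(H(X) + epsilon), far fewer than the 2^n strings of length n.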
-
Source Coding Theorem
We can provide a codeword only for each typical sequence
  If the sequence is not a typical sequence, then a source coding failure is declared
  This probability can be made arbitrarily small by choosing n large enough
Lossless Source Coding Theorem
  Let X denote a DMS with entropy H(X). There exists a lossless source code for this source at any rate R if R > H(X). There exists no lossless code for this source at rates less than H(X).
-
[Figure: the general source coding diagram again (input waveform → sampler → quantizer → discrete encoder → binary channel → discrete decoder → table lookup → analog filter → output waveform), with the label: lossy quantization.]
-
Lossy Data Compression
For continuous-amplitude analog sequences
  Lossless compression is impossible
  Lossy compression through scalar or vector quantization
Quantization
  Uses a finite number of levels to represent each continuous-amplitude symbol
  Introduces distortion, a loss of signal fidelity
  Then we may use Huffman coding to improve the efficiency
Tradeoff between bit rate and distortion!
  Minimize bit rate for a given distortion
  Minimize distortion for a given rate
-
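Distortion Measure
Define distortion as some distance metric between the actual signal samples $x_k$ and the quantized values $\hat{x}_k$, denoted by $d(x_k, \hat{x}_k)$
A commonly used distortion measure is the squared-error function
$$d(x_k, \hat{x}_k) = (x_k - \hat{x}_k)^2$$
The distortion between a sequence of n samples $\mathbf{x}_n$ and the corresponding n quantized values $\hat{\mathbf{x}}_n$ is the average over the n source output samples
$$d(\mathbf{x}_n, \hat{\mathbf{x}}_n) = \frac{1}{n} \sum_{k=1}^{n} d(x_k, \hat{x}_k)$$
Its expected value is
$$D = E\left[d(\mathbf{X}_n, \hat{\mathbf{X}}_n)\right] = \frac{1}{n} \sum_{k=1}^{n} E\left[d(X_k, \hat{X}_k)\right]$$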
-
Rate Distortion Function
Rate distortion function R(D)
$$R(D) = \min_{p(\hat{x}|x):\ E[d(X, \hat{X})] \le D} I(X; \hat{X})$$
  $I(X; \hat{X})$ is the mutual information between $X$ and $\hat{X}$
  R(D) is the minimum rate that is required to represent the output X with a distortion less than or equal to D
  Rate R(D) decreases as D increases
Source coding with a fidelity criterion
  A memoryless source X can be encoded at rate R for a distortion not exceeding D if R > R(D). Conversely, for any code with rate R < R(D), the distortion exceeds D.
-
Gaussian Source with Squared-Error Distortion
For a continuous-amplitude Gaussian source (with variance $\sigma^2$), the rate distortion function is known
$$R_g(D) = \begin{cases} \frac{1}{2} \log_2 \frac{\sigma^2}{D}, & 0 \le D \le \sigma^2 \\ 0, & D > \sigma^2 \end{cases}$$
  No information need be transmitted when $D \ge \sigma^2$
  The Gaussian source is an upper bound for all other sources with the same variance, that is, it requires the maximum rate among all such sources
The distortion rate function
$$D_g(R) = 2^{-2R} \sigma^2$$
  The mean square error distortion decreases at the rate of 6 dB/bit
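The 6 dB/bit rule follows from the distortion rate function: each extra bit divides the distortion by 4, and 10 log10(4) is about 6.02 dB. A minimal sketch, with illustrative function names and an assumed unit variance:

```python
import math

def gaussian_rate(D, variance):
    """R_g(D) in bits/sample for a Gaussian source under squared-error distortion."""
    if D <= 0:
        raise ValueError("D must be positive")
    return 0.5 * math.log2(variance / D) if D < variance else 0.0

def gaussian_distortion(R, variance):
    """D_g(R) = 2^(-2R) * variance, the inverse of gaussian_rate."""
    return variance * 2.0 ** (-2 * R)

variance = 1.0                                             # assumed unit variance
rate_for_quarter = gaussian_rate(0.25, variance)           # distortion of variance / 4
drop_db = 10 * math.log10(gaussian_distortion(3, variance) /
                          gaussian_distortion(4, variance))  # gain from the 4th bit
print(rate_for_quarter, drop_db)
```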
-
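Binary Source with Hamming Distortion
Consider a binary source with $P[X = 1] = 1 - P[X = 0] = p$, with entropy denoted as $H_b(p)$
From the lossless source coding theorem, it can be compressed at any rate R that satisfies $R > H_b(p)$, and can be recovered perfectly
  If $R < H_b(p)$, errors will occur
Hamming distortion
$$d(x, \hat{x}) = \begin{cases} 1, & x \ne \hat{x} \\ 0, & x = \hat{x} \end{cases}$$
  The average distortion is $E[d(X, \hat{X})] = P[X \ne \hat{X}]$, i.e., the average of Hamming distortion is the error probability in reconstruction of the source
The rate distortion function
$$R(D) = \begin{cases} H_b(p) - H_b(D), & 0 \le D \le \min\{p, 1 - p\} \\ 0, & \text{otherwise} \end{cases}$$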
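The binary rate distortion function is simple to evaluate. The sketch below (function names are illustrative) shows, for an assumed equiprobable source, that tolerating an error probability of 0.11 roughly halves the required rate.

```python
import math

def binary_entropy(p):
    """H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def binary_rate_distortion(p, D):
    """R(D) for a Bernoulli(p) source under Hamming distortion."""
    if 0 <= D <= min(p, 1 - p):
        return binary_entropy(p) - binary_entropy(D)
    return 0.0

lossless_rate = binary_rate_distortion(0.5, 0.0)   # D = 0 recovers H_b(p)
lossy_rate = binary_rate_distortion(0.5, 0.11)     # about half a bit per symbol
print(lossless_rate, lossy_rate)
```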
-
[Figure: input sequence of analog RVs with a given pdf → quantizer → discrete encoder → binary channel → discrete decoder → table lookup → output sequence. The quantizer output is an encoded sequence with an alphabet of size M, which is followed by lossless discrete source encoding; the discrete encoder and decoder meet the channel at a binary interface.]
Quantization
  Scalar quantization: each analog RV is quantized independently
  Vector quantization: the analog sequence is segmented into blocks of n RVs each, then each n-tuple is quantized as a unit
-
Example 1
[Figure: worked example, not preserved in the transcript.]
-
Example 2
[Figure: worked example, not preserved in the transcript.]
-
Scalar Quantization
A scalar quantizer partitions the set of real numbers into M subsets $R_1, \dots, R_M$, called quantization regions
Each region $R_j$ is represented by a representation point $a_j$
When the source produces a number $u \in R_j$, that number is quantized into the point $a_j$
Q: given M, choose regions and representation points to minimize the mean squared error
-
Scalar Quantizer
Choice of intervals for given representation points
  Given any u, the squared error to $a_j$ is $(u - a_j)^2$
  It is minimized by representing u by the closest $a_j$
  Thus the boundary $b_j$ lies halfway between $a_j$ and $a_{j+1}$
Choice of representation points for given intervals
  Choose $a_j$ to minimize $\int_{R_j} f_U(u) (u - a_j)^2 \, du$
  $a_j$ must be the conditional mean of U conditional on $U \in R_j$
-
The Lloyd-Max Algorithm
[The numbered steps of the algorithm are not preserved in the transcript.]
Remark
  The MSE decreases (or stays the same) for each execution of steps (2) and (3)
  Since the MSE is nonnegative, it approaches some limit
  However, the algorithm might reach a local minimum of the MSE
  The algorithm also works for vector quantization
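Since the steps themselves are missing here, the sketch below assumes the standard form of the iteration: (1) pick initial representation points, (2) put each boundary halfway between adjacent points, (3) move each point to the conditional mean of its region, and repeat (2) and (3). It works on empirical samples instead of a pdf, so conditional means become sample averages. The initialization, the sample source and the function names are all assumptions.

```python
import bisect
import random

def mse(samples, points):
    """Mean squared error when each sample is mapped to its nearest point."""
    return sum(min((u - a) ** 2 for a in points) for u in samples) / len(samples)

def lloyd_max(samples, M, iterations=30):
    """Return (representation points, MSE before and after each iteration)."""
    lo, hi = min(samples), max(samples)
    # Step 1: initial representation points, evenly spaced over the data range.
    points = [lo + (hi - lo) * (j + 0.5) / M for j in range(M)]
    history = [mse(samples, points)]
    for _ in range(iterations):
        # Step 2: boundaries halfway between adjacent representation points.
        bounds = [(points[j] + points[j + 1]) / 2 for j in range(M - 1)]
        # Step 3: each point becomes the mean of the samples in its region.
        sums, counts = [0.0] * M, [0] * M
        for u in samples:
            j = bisect.bisect_left(bounds, u)     # region containing u
            sums[j] += u
            counts[j] += 1
        points = [sums[j] / counts[j] if counts[j] else points[j] for j in range(M)]
        history.append(mse(samples, points))
    return points, history

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]   # assumed Gaussian source
points, history = lloyd_max(samples, M=4)
print([round(a, 3) for a in points], round(history[-1], 4))
```

The recorded MSE values never increase from one iteration to the next, which is the behaviour the remark describes. A different starting point can end at a different local minimum.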
-
[Figure: the general source coding diagram again (input waveform → sampler → quantizer → discrete encoder → binary channel → discrete decoder → table lookup → analog filter → output waveform), with the label: sampling.]
-
Analog Data Compression
The general approach
  Expand the waveform into an orthogonal expansion
  Quantize the coefficients in that expansion
  Use discrete source coding on the quantizer output
Example: PCM
So quantization and discrete source coding serve as outer layers
Example
  In standard telephony, the voice is filtered to 4 kHz and then sampled at 8000 samples per second. Each sample is then quantized to one of 256 possible levels, represented by 8 bits. Thus the voice signal is represented as a 64 kilobit/second (kb/s) sequence.
-
Finite Energy Signals
We assume finite-energy signals, i.e., $L_2$ functions: $\int_{-\infty}^{\infty} |u(t)|^2 \, dt < \infty$
Finite-energy signals are appropriate for modeling both source waveforms and channel waveforms
They enjoy simplicity and generality of mathematical properties
  E.g., every $L_2$ time-limited waveform has a Fourier series, every $L_2$ waveform has a Fourier integral
  They can be treated essentially as vectors
-
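Orthogonal Expansion
Orthogonal expansion for a finite-energy signal
$$u(t) = \sum_{k} u_k \phi_k(t)$$
where
$$u_k = \int_{-\infty}^{\infty} u(t) \phi_k^*(t) \, dt$$
and $\{\phi_1(t), \phi_2(t), \dots\}$ forms an orthonormal basis of $L_2$, that is,
$$\int_{-\infty}^{\infty} \phi_k(t) \phi_j^*(t) \, dt = \begin{cases} 1, & k = j \\ 0, & k \ne j \end{cases}$$
e.g., Fourier series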
-
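Fourier Series
Consider a time-limited complex function u(t) in [-T/2, T/2]
Its Fourier series is given by
$$u(t) = \sum_{k=-\infty}^{\infty} \hat{u}_k e^{2\pi i k t / T}, \quad -T/2 \le t \le T/2$$
where
$$\hat{u}_k = \frac{1}{T} \int_{-T/2}^{T/2} u(t) e^{-2\pi i k t / T} \, dt$$
Thus, a waveform can be regarded as a vector
It can be used for sampling in the frequency domain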
-
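Sampling Theorem
Consider a frequency-limited complex function u(t) in [-W, W]
Sampling theorem
$$u(t) = \sum_{k=-\infty}^{\infty} u(kT) \, \mathrm{sinc}\left(\frac{t - kT}{T}\right)$$
where $T = \frac{1}{2W}$ and $\mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t}$
Thus, a waveform can be regarded as a vector
It can be used for sampling in the time domain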
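The interpolation formula can be tried on a signal whose bandwidth is known. The sketch below assumes the test signal u(t) = sinc(t)^2, which is band-limited to W = 1, so T = 1/(2W) = 0.5. The infinite sum is truncated to 401 samples, so the match is approximate rather than exact. The function names are illustrative.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, T, t, k_start):
    """Evaluate sum_k u(kT) sinc((t - kT)/T) over the stored samples.

    samples[i] holds u((k_start + i) * T).
    """
    return sum(s * sinc((t - (k_start + i) * T) / T) for i, s in enumerate(samples))

def u(t):
    """Assumed test signal: sinc(t)^2, band-limited to [-1, 1]."""
    return sinc(t) ** 2

W = 1.0
T = 1 / (2 * W)                                  # sampling interval from the theorem
k_start, k_stop = -200, 200
samples = [u(k * T) for k in range(k_start, k_stop + 1)]

t = 0.3                                          # a point between two sampling instants
estimate = reconstruct(samples, T, t, k_start)
print(estimate, u(t))
```

Because the samples of this signal decay quickly, the truncated sum already agrees with the true value to several decimal places.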
-
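Subspace Approximation
We may take a limited number of samples
  Will this be a good approximation?
Given an orthonormal basis $\{\phi_1(t), \phi_2(t), \dots\}$
For an arbitrary $L_2$ function f(t), the expression
$$\int \left| f(t) - \sum_{k=1}^{N} a_k \phi_k(t) \right|^2 dt$$
has a minimum, for $a_k = f_k$, $1 \le k \le N$, and the minimum equals
$$\int |f(t)|^2 \, dt - \sum_{k=1}^{N} |f_k|^2$$
Moreover
$$\int |f(t)|^2 \, dt = \sum_{k=1}^{\infty} |f_k|^2$$
where $f_k = \int f(t) \phi_k^*(t) \, dt$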
-
Finite Energy Signals and MSE
From orthogonal expansion
$$u(t) = \sum_{k} u_k \phi_k(t)$$
Suppose these coefficients are quantized as $\{v_k\}$, and these quantized coefficients are used to recover the source signal, denoted as $v(t) = \sum_k v_k \phi_k(t)$, then we have
$$\int |u(t) - v(t)|^2 \, dt = \sum_{k} |u_k - v_k|^2$$
The distortion, measured as the energy in the difference between the source waveform and the reconstructed waveform, is proportional to the squared quantization error in the quantized coefficients
This explains the widespread use of the MSE
-
Segmenting in the time domain
Frequency domain sampling
  Segment the waveform into segments of duration T, and then expand each segment in a Fourier series
Consider a segment centered around some arbitrary time $\Delta$, i.e., expand from $\Delta - T/2$ to $\Delta + T/2$; the Fourier series is
$$u(t) = \sum_{k} \hat{u}_k e^{2\pi i k t / T}, \quad \Delta - T/2 \le t \le \Delta + T/2$$
where
$$\hat{u}_k = \frac{1}{T} \int_{\Delta - T/2}^{\Delta + T/2} u(t) e^{-2\pi i k t / T} \, dt$$
If the bandwidth is limited to be B, then we roughly need 2BT real numbers to represent this segment
Most voice compression algorithms use such an approach, usually breaking the voice waveform into 20 ms segments.
-
Segmenting in the frequency domain
Time domain sampling
  Segment the waveform in frequency into bands of bandwidth B and sample each frequency band
Consider a segment centered around some arbitrary frequency, i.e., expand over the band around that frequency, with the sampling interval chosen following the sampling theorem
If the time interval is limited to be T, then we roughly need 2BT real numbers to represent this segment
Both the $e^{2\pi i k t / T} \mathrm{rect}(t/T)$ functions (Fourier series) and the sinc functions (sampling theorem) give orthogonal expansions
7/27/2019 Lecture 2-Source Coding
52/53
DegreesofFreedom
Frompreviousdiscussion,weseethatthosedifferent
expansionsrequireroughly2BTrealnumbersforthe
approximatespecificationofawaveformessentiallylimitedto
timeTand
frequency
band
B
Thisnumber,2BT,iscalleddegreesoffreedom
Thisisanimportantruleofthumbusedbycommunication
engineers Forsourcecoding,thisrepresentsthenumberofcoefficientsweneed
tospecifyinordertorecoverthesourcewaveform
Forchannelcodingthatwewilldiscusslater,itmeansthiswaveform
can
carry
this
amount
of
symbols Note:ourdiscussionisbasedonsomevagueidea.Amore
precisewayisthroughtheprolate spheroidalwaveforms
expansion
ELEC5360 52
S
-
Summary
Discrete source coding
Lossless compression
Lossy compression
Quantization
Sampling
Reading assignment
  Ch. 2-4 of Gallager
  Sections 6.1-6.4 of Proakis