
Page 1:

UCSD, 27 January 2006

Quantization, Compression, and Classification: Extracting discrete information from a continuous world

Robert M. Gray
Information Systems Laboratory, Dept. of Electrical Engineering
Stanford, CA 94305
[email protected]

Partially supported by the National Science Foundation, Norsk Electro-Optikk, and Hewlett Packard Laboratories.

A pdf of these slides may be found at http://ee.stanford.edu/~gray/ucsd06.pdf

Page 2:

Introduction

[Diagram: the continuous world is mapped to discrete representations. An encoder α maps an input to a discrete representation such as the binary string 011100100 · · · or the index 13; a decoder β maps the representation back to a reproduction ("Picture of a privateer").]

How well can you do it?

What do you mean by "well"?

How do you do it?

Page 3:

The dictionary (Random House) definition of quantization: the division of a quantity into a discrete number of small parts, often assumed to be integral multiples of a common quantity.

In general: the mapping of a continuous quantity into a discrete quantity.

Converting a continuous quantity into a discrete quantity causes loss of accuracy and information. Usually one needs to minimize the loss.

Page 4:

Quantization

Old example: round off real numbers to the nearest integer. Estimate densities by histograms [Sheppard (1898)].

More generally, a quantizer of a space $A$ (e.g., $\mathbb{R}^k$ or $L_2([0,1]^2)$) consists of

• An encoder $\alpha : A \to \mathcal{I}$, $\mathcal{I}$ an index set, e.g., the nonnegative integers $\Leftrightarrow$ a partition $S = \{S_i;\ i \in \mathcal{I}\}$ with $S_i = \{x : \alpha(x) = i\}$.

• A decoder $\beta : \mathcal{I} \to \mathcal{C}$ $\Leftrightarrow$ a codebook $\mathcal{C} = \{\beta(i);\ i \in \mathcal{I}\}$.

Scalar quantizer:

[Diagram: the real line divided into consecutive cells $S_0, \ldots, S_4$, with a reproduction point $\beta(i)$ marked inside each cell $S_i$.]
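A minimal Python sketch of this definition, with a nearest-neighbor encoder and a table-lookup decoder (the codebook values are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical 5-level codebook: the reproduction values beta(0), ..., beta(4).
codebook = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

def alpha(x):
    """Encoder: index of the nearest codeword (implicitly defines the cells S_i)."""
    return int(np.argmin((codebook - x) ** 2))

def beta(i):
    """Decoder: look up the reproduction value for index i."""
    return codebook[i]

x = 0.7
print(alpha(x), beta(alpha(x)))  # -> 3 1.0: x falls in cell S_3
```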

Page 5:

Two-dimensional example: a centroidal Voronoi diagram: nearest-neighbor partition, Euclidean centroids.

E.g., complex numbers, nearest mail boxes, sensors, repeaters, fish fortresses . . .

[Figure: territories of the male Tilapia mossambica. G. W. Barlow, "Hexagonal Territories," Animal Behavior, Volume 22, 1974.]

Page 6:

Three dimensions: Voronoi partition around spheres.

[Figure from http://www-math.mit.edu/dryfluids/gallery/]

Page 7:

Quality/Distortion/Cost

The quality of a quantizer is measured by the goodness of the resulting reproduction in comparison to the original.

Assume a distortion measure $d(x, y) \ge 0$ which measures the penalty if an input $x$ results in an output $y = \beta(\alpha(x))$.

Small (large) distortion $\Leftrightarrow$ good (bad) quality.

System quality is measured by average distortion.

Assume $X$ is a random object (variable, vector, process, field) with probability distribution $P_X$: pixel intensities, samples, features, fields, DCT or wavelet transform coefficients, moments, etc.

If $A = \mathbb{R}^k$, then $P_X \Leftrightarrow$ a pdf $f$, a pmf $p$, or the empirical distribution $P_\mathcal{L}$ from a training/learning set $\mathcal{L} = \{x_l;\ l = 1, 2, \ldots, |\mathcal{L}|\}$.

Page 8:

Average distortion wrt $P_X$: $D(\alpha, \beta) = E[d(X, \beta(\alpha(X)))]$

Examples:

• Mean squared error (MSE): $d(x, y) = \|x - y\|^2 = \sum_{i=1}^{k} |x_i - y_i|^2$. Useful if $y$ is intended to reproduce or approximate $x$.

More generally, an input- or output-weighted quadratic, with $B_x$ a positive definite matrix: $(x-y)^t B_x (x-y)$ or $(x-y)^t B_y (x-y)$.

• $X$ is the observation, but the unseen $Y$ is the desired information. The "sensor" is described by $P_{X|Y}$, e.g., $X = Y + W$. Quantize $x$, reconstruct $\hat{y} = \kappa(i)$. Bayes risk $C(y, \hat{y})$:

$$d(x, i) = E[C(Y, \kappa(i)) \mid X = x]$$

$$E[d(X, \kappa(\alpha(X)))] = E[C(Y, \kappa(\alpha(X)))]$$
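A small sketch (assumed setup, not from the slides) of estimating the average distortion empirically: for MSE and a nearest-neighbor encoder, average $d(x, \beta(\alpha(x)))$ over a training set standing in for $P_X$:

```python
import numpy as np

# Empirically estimate D = E[d(X, beta(alpha(X)))] for MSE and a hypothetical
# scalar codebook, using samples standing in for the distribution P_X.
rng = np.random.default_rng(0)
train = rng.normal(size=10_000)                   # stand-in training set L
codebook = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

idx = np.argmin((train[:, None] - codebook[None, :]) ** 2, axis=1)  # alpha(x)
recon = codebook[idx]                                               # beta(alpha(x))
print("empirical MSE distortion:", np.mean((train - recon) ** 2))
```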

Page 9:

• $Y$ discrete $\Rightarrow$ classification, detection

• $Y$ continuous $\Rightarrow$ estimation, regression

[Diagram: $Y \to$ SENSOR $P_{X|Y} \to X \to$ ENCODER $\alpha \to \alpha(X) = i \to$ DECODER, where $\kappa$ yields $\hat{Y} = \kappa(i)$ and $\beta$ yields $\hat{X} = \beta(i)$.]

Or a weighted combination of MSE for $X$ and Bayes risk for $Y$.

Problem: need to know or estimate $P_{Y|X}$, which is harder to estimate than $P_X$.

Page 10:

We want the average distortion to be small, which we can achieve with big codebooks and small cells, but . . .

. . . there is usually a cost $r(i)$ of choosing index $i$ in terms of storage, transmission capacity, or computational capacity: more cells imply more memory, more time, more bits.

For example,

Page 11:

• $r(i) = \ln |\mathcal{C}|$: all codewords have equal cost. Fixed-rate, classic quantization.

• $r(i) = \ell(i)$, a length function satisfying the Kraft inequality $\sum_i e^{-\ell(i)} \le 1$ $\Leftrightarrow$ a uniquely decodable lossless code exists with lengths $\ell(i)$: the number of bits or nats required to communicate/store the index.

• $r(i) = \ell(i) = -\ln \Pr(\alpha(X) = i)$: Shannon codelengths, $E[\ell(\alpha(X))] = H(\alpha(X)) = -\sum_i p_i \ln p_i$, the Shannon entropy. The optimal choice of $\ell$ satisfying Kraft; optimal lossless compression of the index $\alpha(X)$. Minimizes the required channel capacity for communication.

• $r(i) = (1 - \eta)\ell(i) + \eta \ln |\mathcal{C}|$, $\eta \in [0, 1]$: combined transmission-length and memory constraints (current research).
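A sketch of these rate measures with hypothetical cell probabilities: Shannon codelengths $\ell(i) = -\ln p_i$ meet the Kraft inequality with equality, and their mean is the entropy $H(\alpha(X))$:

```python
import numpy as np

# Hypothetical cell probabilities p_i = Pr(alpha(X) = i).
p = np.array([0.5, 0.25, 0.125, 0.125])

lengths = -np.log(p)                          # Shannon codelengths (nats)
print("Kraft sum:", np.exp(-lengths).sum())   # sum_i e^{-l(i)} = 1 <= 1
print("entropy H =", (p * lengths).sum())     # E[l(alpha(X))] = H(alpha(X))
```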

Page 12:

Issues

If we know the distributions:

What is the optimal tradeoff of distortion vs. rate? (We want both to be small!)

How do we design good codes?

If we do not know the distributions, how do we use training/learning data to estimate the tradeoffs and design good codes? (statistical learning, machine learning)

Page 13:

Applications

Communications & signal processing: A/D conversion, data compression.

Compression is required for efficient transmission:
– send more data in the available bandwidth
– send the same data in less bandwidth
– more users on the same bandwidth

and storage:
– can store more data
– can compress for local storage, put details on cheaper media

Graphic courtesy of Jim Storer.

Page 14:

• Statistical clustering: grouping bird songs, designing Mao suits, grouping gene features, taxonomy.

• Placing points in space: mailboxes, wireless repeaters, wireless sensors.

• How best to approximate continuous probability distributions by discrete ones? Quantize the space of distributions.

• Numerical integration and optimal quadrature rules: estimate the integral $\int g(x)\,dx$ by the sum $\sum_{i \in \mathcal{I}} V(S_i)\, g(y_i)$, where $V(S_i)$ is the volume of cell $S_i$ in the partition $S$. How best to choose the points $y_i$ and cells $S_i$?

• Pick the best Gauss mixture model from a finite set based on training data. Plug into a Bayes estimator.

Page 15:

Distortion-rate optimization: $q = (\alpha, \beta, d, r)$

$$D(q) = E[d(X, \beta(\alpha(X)))] \quad \text{vs.} \quad R(q) = E[r(\alpha(X))]$$

Minimize $D(q)$ for $R(q) \le R$: $\delta(R) = \inf_{q: R(q) \le R} D(q)$

Minimize $R(q)$ for $D(q) \le D$: $r(D) = \inf_{q: D(q) \le D} R(q)$

Lagrangian approach: $\rho_\lambda(x, i) = d(x, \beta(i)) + \lambda r(i)$, $\lambda \ge 0$.

Minimize
$$\rho(f, \lambda, \eta, q) \triangleq E\rho_\lambda(X, \alpha(X)) = D(q) + \lambda R(q) = E d(X, \beta(\alpha(X))) + \lambda \left[ (1-\eta)\, E\ell(\alpha(X)) + \eta \ln |\mathcal{C}| \right]$$

$$\rho(f, \lambda, \eta) = \inf_q \rho(f, \lambda, \eta, q)$$

Traditional fixed-rate: $\eta = 1$; variable-rate: $\eta = 0$.
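A toy sketch of how the Lagrangian trades off the two costs: for hypothetical (distortion, rate) pairs, a small $\lambda$ favors low distortion and a large $\lambda$ favors low rate:

```python
# Hypothetical codes described by (distortion, rate) pairs; the Lagrangian
# cost D + lambda * R selects one operating point per lambda.
codes = [(0.50, 1.0), (0.10, 2.0), (0.02, 3.0)]

for lam in (0.01, 0.1, 1.0):
    D, R = min(codes, key=lambda dr: dr[0] + lam * dr[1])
    print(f"lambda={lam}: pick D={D}, R={R}")
```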

Page 16:

Three Theories of Quantization

• Rate-distortion theory: the Shannon distortion-rate function $D(R) \le \delta(R)$, achievable in the asymptopia of large dimension $k$ and fixed rate $R$. [Shannon (1949, 1959), Gallager (1978)]

• Nonasymptotic (exact) results: necessary conditions for optimal codes $\Rightarrow$ iterative design algorithms $\Leftrightarrow$ statistical clustering. [Steinhaus (1956), Lloyd (1957)]

• High-rate theory: optimal performance in the asymptopia of fixed dimension $k$ and large rate $R$. [Bennett (1948), Lloyd (1957), Zador (1963), Gersho (1979), Bucklew, Wise (1982)]

Page 17:

Lloyd Optimality Properties: $q = (\alpha, \beta, d, r)$

Any component can be optimized for the others.

Encoder: $\alpha(x) = \arg\min_i \left( d(x, \beta(i)) + \lambda r(i) \right)$ (minimum distortion)

Decoder: $\beta(i) = \arg\min_y E(d(X, y) \mid \alpha(X) = i)$ (Lloyd centroid)

Length function: $\ell(i) = -\ln P_f(\alpha(X) = i)$ (Shannon codelength)

Pruning: remove any index $i$ if the pruned code $\alpha', \beta', r'$ satisfies $D(\alpha', \beta') + \lambda R(\alpha', r') \le D(\alpha, \beta) + \lambda R(\alpha, \ell)$.

$\Rightarrow$ the Lloyd clustering algorithm (rediscovered as k-means, grouped coordinate descent, alternating optimization, principal points), sketched below.

For the estimation application, it is often nearly optimal to quantize the optimal nonlinear estimate (Ephraim).
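A minimal sketch of the resulting iteration for squared error and fixed rate (so the $\lambda r(i)$ term drops out and the encoder is nearest-neighbor); this is the algorithm rediscovered as k-means:

```python
import numpy as np

def lloyd(train, N=4, iters=50, seed=0):
    """Alternate the two Lloyd conditions on a scalar training set."""
    rng = np.random.default_rng(seed)
    code = rng.choice(train, size=N, replace=False)   # initial codebook
    for _ in range(iters):
        # Encoder condition: minimum-distortion (nearest-neighbor) partition.
        idx = np.argmin((train[:, None] - code[None, :]) ** 2, axis=1)
        # Decoder condition: replace each codeword by its cell's centroid.
        for i in range(N):
            if np.any(idx == i):
                code[i] = train[idx == i].mean()
    return np.sort(code)

train = np.random.default_rng(1).normal(size=5_000)
print(lloyd(train))   # approximates the optimal 4-level Gaussian quantizer
```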

Page 18:

Classic case of fixed length, squared error $\Longrightarrow$ Voronoi partition (Dirichlet tessellation, Thiessen polygons, Wigner-Seitz zones, medial axis transforms) and centroidal codebook: a centroidal Voronoi partition.

Voronoi diagrams are common in many fields including biology, cartography, crystallography, chemistry, physics, astronomy, meteorology, geography, anthropology, computational geometry . . .

Page 19:

Scott Snibbe's Boundary Functions: http://www.snibbe.com/scott/bf/

The Voronoi game (competitive facility location): http://www.voronoigame.com

Page 20:

High Rate (Resolution) Theory

Traditional form (Zador, Gersho, Bucklew, Wise): pdf $f$, MSE:

$$\lim_{R \to \infty} e^{\frac{2}{k}R}\, \delta(R) = \begin{cases} a_k \|f\|_{k/(k+2)} & \text{fixed-rate } (R = \ln N) \\ b_k\, e^{\frac{2}{k} h(f)} & \text{variable-rate } (R = H) \end{cases}$$

where

$$h(f) = -\int f(x) \ln f(x)\, dx, \qquad \|f\|_{k/(k+2)} = \left( \int f(x)^{\frac{k}{k+2}}\, dx \right)^{\frac{k+2}{k}}$$

$a_k \ge b_k$ are the Zador constants (they depend on $k$ and $d$, not on $f$!)

$$a_1 = b_1 = \frac{1}{12}, \quad a_2 = \frac{5}{18\sqrt{3}}, \quad b_2 = ?, \quad a_k, b_k = ? \text{ for } k \ge 3$$

$$\lim_{k \to \infty} a_k = \lim_{k \to \infty} b_k = \frac{1}{2\pi e}$$
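A quick numerical check of the one-dimensional constant $a_1 = b_1 = 1/12$: for the uniform density on $[0,1)$ (where $h(f) = 0$ and $\|f\|_{1/3} = 1$), an $N$-level uniform quantizer has MSE $\approx \Delta^2/12$, so $e^{2R}\delta(R) = N^2 \cdot \text{MSE} \to 1/12$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=200_000)        # uniform source on [0, 1)

for N in (8, 32, 128):
    delta = 1.0 / N                                  # cell width
    xhat = (np.floor(x / delta) + 0.5) * delta       # round to cell midpoint
    print(N, np.mean((x - xhat) ** 2) * N**2)        # -> 1/12 ~ 0.0833
```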

Page 21:

Lagrangian form (Gray, Linder, Li, Gill). Variable-rate:

$$\lim_{\lambda \to 0} \left[ \inf_q \left( \frac{\rho(f, \lambda, 0, q)}{\lambda} \right) + \frac{k}{2} \ln \lambda \right] = \theta_k + h(f)$$

$$\theta_k \triangleq \inf_{\lambda > 0} \left[ \inf_q \left( \frac{\rho(u, \lambda, 0, q)}{\lambda} \right) + \frac{k}{2} \ln \lambda \right] = \frac{k}{2} \ln \frac{2 e b_k}{k}$$

Fixed-rate:

$$\lim_{\lambda \to 0} \left[ \inf_q \left( \frac{\rho(f, \lambda, 1, q)}{\lambda} \right) + \frac{k}{2} \ln \lambda \right] = \psi_k + \ln \|f\|^{k/2}_{k/(k+2)}$$

$$\psi_k \triangleq \inf_{\lambda > 0} \left[ \inf_q \left( \frac{\rho(u, \lambda, 1, q)}{\lambda} \right) + \frac{k}{2} \ln \lambda \right] = \frac{k}{2} \ln \frac{2 e a_k}{k}$$

Both have the form: optimum for $u$ + a function of $f$.

Page 22:

Ongoing: conjecture similar results for combined constraints:

$$\lim_{\lambda \to 0} \Biggl( \underbrace{\inf_q \left( \frac{E_f[d(X, \beta(\alpha(X)))]}{\lambda} + (1-\eta) H_f(q) + \eta \ln N_q \right)}_{\rho(f,\lambda,\eta)/\lambda} + \frac{k}{2} \ln \lambda \Biggr) = \theta_k(\eta) + h(f, \eta)$$

$$\theta_k(\eta) \triangleq \inf_{\lambda > 0} \left( \frac{\rho(u, \lambda, \eta)}{\lambda} + \frac{k}{2} \ln \lambda \right)$$

where $h(f, \eta)$ is given by a convex minimization:

$$h(f, \eta) = h(f) + \inf_\nu \left[ \frac{k}{2} \int dx\, f(x) \left[ e^{\nu(x)} - \nu(x) - 1 \right] + \eta H(f \| \Lambda) \right]$$

where

$$\Lambda(x) = \frac{e^{-k\nu(x)/2}}{\int e^{-k\nu(y)/2}\, dy}, \qquad H(f \| \Lambda) = \int f(x) \ln \frac{f(x)}{\Lambda(x)}\, dx \tag{1}$$

Page 23:

Proofs

Rigorous proofs of the traditional cases are painfully tedious, but follow Zador's original approach:

Step 1: Prove the result for a uniform density on a cube.

Step 2: Prove the result for a pdf that is piecewise constant on disjoint cubes of equal volume.

Step 3: Prove the result for a general pdf on a cube by approximating it by a piecewise constant pdf on small cubes.

Step 4: Extend the result for a general pdf on the cube to general pdfs on $\mathbb{R}^k$ by limiting arguments.

Page 24:

Heuristic proofs developed by Gersho: assume the existence of a quantizer point density function $\Lambda(x)$ for a sequence of codebooks of size $N$:

$$\lim_{N \to \infty} \frac{1}{N} \times (\#\text{ reproduction vectors in a set } S) = \int_S \Lambda(x)\, dx \quad \text{for all } S,$$

where $\int_{\mathbb{R}^k} \Lambda(x)\, dx = 1$.

Assume $f_X(x)$ is smooth, the rate is large, and the minimum-distortion quantizer has cells $S_i \approx$ scaled, rotated, and translated copies of an $S^*$ achieving

$$c_k = \min_{\text{tessellating convex polytopes } S} M(S).$$

Page 25:

$$M(S) = \frac{1}{k\, V(S)^{2/k}} \int_S \frac{\|x - \text{centroid}(S)\|^2}{V(S)}\, dx$$

Then hand-waving approximations of integrals relate $N(q)$, $D_f(q)$, $H_f(q)$ $\Rightarrow$

$$D_f(q) \approx c_k\, E_f\!\left( \left( \frac{1}{N(q)\Lambda(X)} \right)^{2/k} \right)$$

$$H_f(q(X)) \approx h(X) - E\!\left( \ln\!\left( \frac{1}{N(q)\Lambda(X)} \right) \right) = \ln N(q) - H(f \| \Lambda),$$

Optimizing using Hölder's inequality or Jensen's inequality yields the classic fixed-rate and variable-rate results.

Page 26:

Can also apply this to the combined-constraint case to get

$$\theta(f, \lambda, \eta, \Lambda) = \frac{k}{2} \ln\!\left( \frac{k e c_k}{2} \right) + (1-\eta)\, h(f) + \frac{k}{2} \ln\!\left( E_f\!\left( (\Lambda(X))^{-2/k} \right) \right) + (1-\eta)\, E_f(\ln \Lambda(X)), \tag{2}$$

The goal is to minimize over $\Lambda$.

If we substitute $\theta_k(\eta)$ for the $c_k$ term, this is equivalent to the conjecture proposed here, with $\Lambda(x)$ defined by (1)!

This suggests connections between the rigorous and heuristic proofs, which may aid insight and simplify proofs.

Current state: a rigorous proof for $u$ and upper bounds for the other steps. Converses are lacking.

Page 27:

Variable-rate: Asymptotically Optimal Quantizers

Theorem $\Rightarrow$ for $\lambda_n \downarrow 0$ there is a $\lambda_n$-asymptotically optimal sequence of quantizers $q_n = (\alpha_n, \beta_n, \ell_n)$:

$$\lim_{n \to \infty} \left( \frac{E_f[d(X, \beta_n(\alpha_n(X)))]}{\lambda_n} + E_f \ell_n(\alpha_n(X)) + \frac{k}{2} \ln \lambda_n \right) = \theta_k + h(f)$$

$\Rightarrow$ $\lambda$ controls distortion and entropy separately [DCC 05]:

$$\lim_{n \to \infty} \frac{2 D_f(q_n)}{k \lambda_n} = 1, \qquad \lim_{n \to \infty} \left( H(q_n) + \frac{k}{2} \ln \lambda_n \right) = h(f) + \theta_k.$$

(Similar results hold for fixed-rate and combined constraints.)

Page 28:

Variable-rate codes: Mismatch

What if we design an a.o. sequence of quantizers $q_n$ for a pdf $g$, but apply it to $f$? $\Rightarrow$ mismatch.

The answer is in terms of relative entropy:

Theorem. [Gray, Linder (2003)] If $q_n$ is $\lambda_n$-a.o. for $g$, then

$$\lim_{n \to \infty} \left( \frac{D_f(q_n)}{\lambda_n} + E_f \ell_n(\alpha_n(X)) + \frac{k}{2} \ln \lambda_n \right) = h(f) + \theta_k + H(f \| g).$$
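The mismatch term has a closed form when both pdfs are Gaussian; a small sketch (the standard formula, in nats) of $H(f\|g)$ for $f = \mathcal{N}(\mu_0, K_0)$ and $g = \mathcal{N}(\mu_1, K_1)$, with hypothetical example values:

```python
import numpy as np

def gauss_kl(mu0, K0, mu1, K1):
    """Relative entropy H(f||g) in nats for f = N(mu0, K0), g = N(mu1, K1)."""
    k = len(mu0)
    K1inv = np.linalg.inv(K1)
    d = mu1 - mu0
    return 0.5 * (np.trace(K1inv @ K0) + d @ K1inv @ d - k
                  + np.log(np.linalg.det(K1) / np.linalg.det(K0)))

print(gauss_kl(np.zeros(2), np.eye(2),
               np.array([-2.0, 2.0]), np.array([[1.0, 1.0], [1.0, 1.5]])))
```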

Page 29:

Worst case (variable-rate): the asymptotic performance depends on $f$ only through $h(f)$ $\Rightarrow$ the worst-case source given the constraints is the maximum-entropy source; e.g., given the mean $m$ and covariance $K$, the worst case is Gaussian.

If we use the worst-case Gaussian for $g$ to design codes and apply them to $f$, the codes are also robust in the sense of Sakrison: Gaussian codes yield the same performance on other sources with the same covariance. The loss from the optimum is $H(f \| g)$.

$\Longrightarrow$ suggests using relative entropy (mismatch) as a "distance" or "distortion measure" on pdfs in order to "quantize" the space of pdfs and thereby fit models to observed data.

Page 30:

Classified Codes and Mixture Models

Problem with a single worst case: too conservative.

Alternative idea: fit a different Gauss source to distinct groups of inputs instead of to the entire collection of inputs. Robust codes for local behavior.

Suppose we have a partition $S = \{S_m;\ m \in \mathcal{I}\}$ of $\mathbb{R}^k$.

If the input falls in $S_m$, use codes designed for a Gaussian model $g_m = \mathcal{N}(\mu_m, K_m) \in \mathcal{M}$, the space of all nonsingular Gaussian pdfs on $\mathbb{R}^k$.

Design asymptotically optimal codes $q_{m,n};\ n = 1, 2, \ldots$ for each $m$ and a common $\lambda_n \to 0$.

Page 31:

Construct an overall composite quantizer sequence $q_n$ by using $q_{m,n}$ if the input falls in $S_m$. Defining

$$f_m(x) = \begin{cases} f(x)/p_m & x \in S_m \\ 0 & \text{otherwise} \end{cases}, \qquad p_m = \int_{S_m} f(x)\, dx \quad \Rightarrow$$

$$\lim_{n \to \infty} \frac{D_f(q_n)}{\lambda_n} + E_f \ell_n(\alpha_n(X)) + \frac{k}{2} \ln \lambda_n = \theta_k + h(f) + \underbrace{\sum_m p_m H(f_m \| g_m)}_{\text{mismatch distortion}}$$

What is the best (smallest) we can make this over all partitions $S = \{S_m;\ m \in \mathcal{I}\}$ and collections $G = \{g_m, p_m;\ m \in \mathcal{I}\}$?

$\Leftrightarrow$ Quantize the space of Gauss models using the mismatch distortion.

The outcome of the optimization is a Gauss mixture. An alternative to the Baum-Welch/EM algorithm.

Page 32:

Can use the Lloyd algorithm to optimize. The Lloyd conditions for the mismatch distortion imply:

Decoder (centroid): $g_m = \arg\min_{g \in \mathcal{M}} H(f_m \| g) = \mathcal{N}(\mu_m, K_m)$.

Encoder (partition):

$$\alpha(x) = \arg\min_m d(x, m) \triangleq \arg\min_m \left( -\ln p_m + \frac{1}{2} \ln(|K_m|) + \frac{1}{2}(x - \mu_m)^t K_m^{-1} (x - \mu_m) \right)$$

Length function: $L(m) = -\ln p_m$.

Can adjust the Lagrange weighting of $\ln p_m$.

Gauss mixture vector quantization (GMVQ).
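A minimal sketch of this encoder condition (the model parameters below are the two-component mixture used in the toy example that follows):

```python
import numpy as np

def gmvq_encode(x, models):
    """GMVQ encoder: models is a list of (p_m, mu_m, K_m); returns the m
    minimizing -ln p_m + (1/2) ln|K_m| + (1/2)(x-mu_m)^t K_m^{-1} (x-mu_m)."""
    costs = [(-np.log(p) + 0.5 * np.log(np.linalg.det(K))
              + 0.5 * (x - mu) @ np.linalg.inv(K) @ (x - mu))
             for p, mu, K in models]
    return int(np.argmin(costs))

models = [(0.8, np.zeros(2), np.eye(2)),
          (0.2, np.array([-2.0, 2.0]), np.array([[1.0, 1.0], [1.0, 1.5]]))]
print(gmvq_encode(np.array([-1.5, 1.8]), models))  # -> 1 (second component)
```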

Page 33:

Toy example: GMVQ design of a GM

Source: 2-component GM:

$$m_1 = (0, 0)^t, \quad K_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad p_1 = 0.8$$

$$m_2 = (-2, 2)^t, \quad K_2 = \begin{bmatrix} 1 & 1 \\ 1 & 1.5 \end{bmatrix}, \quad p_2 = 0.2$$

[Plots of the two-component Gauss mixture source.]

Page 34:

Data and Initialization:

[Plots: the training data and the initialization of the GMVQ design.]

Page 35:

[Figure: GMVQ Lloyd design, iteration 1.]

Page 36:

[Figure: GMVQ Lloyd design, iteration 2.]

Page 37:

[Figure: GMVQ Lloyd design, iteration 3.]

Page 38:

[Figure: GMVQ Lloyd design, iteration 4.]

Page 39:

[Figure: GMVQ Lloyd design, iteration 5.]

Page 40:

[Figure: GMVQ Lloyd design, iteration 6.]

Page 41:

[Figure: GMVQ Lloyd design, iteration 7.]

Page 42:

[Figure: GMVQ Lloyd design, iteration 8.]

Page 43:

[Figure: GMVQ Lloyd design, iteration 9.]

Page 44:

[Figure: GMVQ Lloyd design, iteration 10.]

Page 45:

[Figure: GMVQ Lloyd design, iteration 11.]

Page 46:

[Figure: GMVQ Lloyd design, iteration 12.]

Page 47:

[Figure: GMVQ Lloyd design, final iteration.] Converged!

Page 48:

The algorithm for fitting a Gauss mixture to a dataset provides a supervised learning method for classifier design: given a collection of classes of interest, design a GMVQ for each.

Given a new vector (e.g., an image):

Traditional approach: plug a density estimate into the optimal Bayes classifier.

Best-codebook approach: code the image using each of the Gauss mixture VQs and select the class corresponding to the quantizer with the smallest average mismatch distortion.

Codebook matching: classification by compression. A "nearest-neighbor" method; one need not (explicitly) estimate $P_{Y|X}$.
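A minimal sketch of the best-codebook rule under these assumptions: encode each feature vector with every class's GMVQ (each class here is a hypothetical list of $(p_m, \mu_m, K_m)$ components) and pick the class with the smallest average encoding cost:

```python
import numpy as np

def gmvq_cost(x, models):
    """Minimum GMVQ encoding cost of x over the class's (p, mu, K) components."""
    return min(-np.log(p) + 0.5 * np.log(np.linalg.det(K))
               + 0.5 * (x - mu) @ np.linalg.inv(K) @ (x - mu)
               for p, mu, K in models)

def classify(blocks, class_models):
    """blocks: feature vectors from one image; class_models: label -> components.
    Returns the label whose GMVQ gives the smallest average mismatch cost."""
    return min(class_models,
               key=lambda c: np.mean([gmvq_cost(x, class_models[c])
                                      for x in blocks]))
```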

Page 49:

Examples

Aerial images: man-made vs. natural. White: man-made; gray: natural.

[Figure: original 8 bpp gray-scale images and hand-labeled classified images. GMVQ with probability of error 12.23%.]

Page 50:

Content-Addressable Databases / Image Retrieval

8 × 8 blocks, 9 training images for each of 50 classes, 500-image database (cross-validated).

Precision (fraction of retrieved images that are relevant) ≈ recall (fraction of relevant images that are retrieved) ≈ 0.94.

[Query examples.]

Page 51:

North Sea Gas Pipeline Data

[Images: normal pipeline image, field joint, longitudinal weld.]

Which is which???

Page 52:

• GMVQ: best codebook wins
• MAP-GMVQ: plug the ECVQ-produced GM into MAP
• 1-NN
• MART (boosted classification tree)
• Regularized QDA (Gaussian class models)
• MAP-ECVQ (GM class models)
• MAP-EM: plug the EM-produced GM into MAP

Page 53:

Method    | Recall (S, V, W)       | Precision (S, V, W)    | Accuracy
MART      | 0.9608, 0.9000, 0.8718 | 0.9545, 0.9000, 0.8947 | 0.9387
Reg. QDA  | 0.9869, 1.0000, 0.9487 | 0.9869, 0.9091, 1.0000 | 0.9811
1-NN      | 0.9281, 0.7000, 0.8462 | 0.9221, 0.8750, 1.0000 | 0.8915
MAP-ECVQ  | 0.9737, 0.9000, 0.9437 | 0.9739, 0.9000, 0.9487 | 0.9623
MAP-EM    | 0.9739, 0.9000, 0.9487 | 0.9739, 0.9000, 0.9487 | 0.9623
MAP-GMVQ  | 0.9935, 0.8500, 0.9487 | 0.9682, 1.0000, 0.9737 | 0.9717
GMVQ      | 0.9673, 0.8000, 0.9487 | 0.9737, 0.7619, 0.9487 | 0.9481

Class subclasses:
S: Normal, Osmosis Blisters, Black Lines, Small Black Corrosion Dots, Grinder Marks, MFL Marks, Corrosion Blisters, Single Dots
V: Longitudinal Welds
W: Weld Cavity, Field Joint

Recall = Pr(declare as class i | class i)
Precision = Pr(class i | declare as class i)

Implementation is simple, convergence is fast, classification performance is good.

Page 54:

Parting Thoughts

• Quantization ideas are useful for A/D conversion, compression, classification, detection, and modeling. An alternative to EM for Gauss mixture design.

• Connections with Gersho's conjecture on asymptotically optimal cell shapes (and its implications).

Gersho's approach provides intuitive, but nonrigorous, proofs of high-rate results based on a conjectured geometry of optimal cell shapes. It has many interesting implications that have never been proved:

– Existence of quantizer point density functions at high rate (known for fixed-rate [Bucklew])

Page 55:

– The optimal high-rate variable-rate quantizer point density is uniform.

– At high rate, optimal fixed-rate quantizers on $u$ are maximum entropy.

– Equality of fixed-rate and entropy-constrained optimal performance ($a_k = b_k$).

• The combined-constraint approach

– gives a unified development of the traditional cases, provides a formulation consistent with Gersho's approach, and allows rigorous proofs of several of Gersho's steps and parts of his conjecture.

– describes the separate behavior of distortion, entropy, and codebook size for asymptotically optimal quantizers.

Page 56:

– provides equivalent conditions for the implications of Gersho's conjecture and a possible means of eventually demonstrating or refuting some of them. E.g., if Gersho's conjecture is true, then $\theta_k(\eta)$ is constant with zero derivative, $a_k = b_k$, and optimal fixed-rate codes have maximum entropy.

– provides a new Lloyd optimality condition: pruning.

– the high-rate results complement the Shannon results: the mixed-constraint theory does not have a Shannon counterpart; in high dimensions $D(R)$ describes both asymptotic codebook size and quantizer entropy (asymptotic equipartition property).

– adding components to an optimization problem may help find better local optima.