
Introduction to Compression

Norm Zeck

Norm Zeck Vita

11/10/2016 Norm Zeck

• BSEE University of Buffalo (Microcoded Computer Architecture)
• MSEE University of Rochester (Thesis: CMOS VLSI Design)

• Retired from Palo Alto Research Center (PARC), a Xerox company (Rochester NY site). 37 years.

• Research projects: Selenium distillation (digital process controls), Mechatronics, VLSI for contour font rasterization, Halftone hardware/rendering, Document imaging systems, Multi-mode compression, Page Parallel Raster image processing (aka Hadoop, 10+ years earlier), Data Analytics

• 18 Patents

• Research Management:
  • Competency Area: 15 researchers; Lab: 85 researchers
  • Special Projects/Operations: ~225 researchers
  • Program: 5-10 projects across centers. Transportation, Healthcare, Call centers, Government payment, Child Support

Goals

Intro to lossless and lossy compression

Focus on models, one example of encoding

Little or no Math

Use of examples to highlight main modeling concepts

Launch point for individual study on topics of interest


Why Compress?


Message Source | Channel | Message Destination (application) | Value Proposition
Cable TV Video (MPEG) | Cable Distribution System, DVR | Home TV | More channels to customer for same infrastructure investment
Camera Sensor (JPEG, MPEG) | Solid State Storage, Transmission (sharing) | View/Print/Share Photo/Video | More pictures/video per storage unit, faster transmission on communication channel, miniaturization of phones/cameras
Digital Audio (MP3) | Solid State Storage | Listen to Audio | More audio per storage, enabled miniaturization of player
Document (CCITT G4/G3) | Telephone/Fax | Sent document | Ability to use low-bandwidth communications channels for document transmission; universal availability to share documents
File storage (ZIP, PDF, Word…) | Disk/net | File Exchange, Storage | Storage/transmission efficiency

General Compression Design Elements


Compression:   Application → Model → Encoder
Decompression: Decoder → Model → Application

• Application examples: text documents, wave files (MP3), images (JPEG)
• The model understands the information needed in the message to support the application and enable compression:
  • Text has repetitive words/sets of words that are reused. Do not need to store each word, just a reference.
  • The probabilities of occurrence of symbols in the message are known.
  • Human visual system model (JPEG, MPEG)
  • Human auditory system model (MP3)
• The encoder is designed to take advantage of the model and the application. Often includes a formatter as well (central to the application).

How to measure compression?

Lossless compression typically has four measures:
 Size: how much the message is compressed.
 Complexity: affects implementation in software/hardware.
 Speed: time to compress or decompress, closely tied to complexity.
 Resources: memory, ability to multi-thread, silicon real estate for hardware.

Lossy compression adds quality: how much and what information to "lose" in the message.

Some applications may want the compressor and decompressor to differ in complexity and performance:
 DVD compression: may want to spend a lot of time/complexity/analysis in the compressor, getting the best quality/bit rate for the DVD master.
 DVD decompression: often implemented in commodity hardware or "real-time" software; desired to be simpler, more constrained.


How to measure compression size?

Compression size measures
 Technical community (bigger is better): ratio of original/compressed, written N:1
  100,000 bytes compressed to 50,000 bytes would be 100,000/50,000 = 2:1
  Some apps use percent of original, in this case 50%
 Streaming applications: MPEG (video), MP3 (audio) often use "bit rate"
  Application: fixed/limited channel bandwidth: cable, telephone, cell, disk
  MP3: 128-256 Kbits/second; MPEG-1: 1.5 Mbits/second; MPEG-2: ~10 Mbits/second
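These size measures are just arithmetic; a minimal sketch (the function names are my own, not from the slides):

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Return the 'N:1' style ratio used by the technical community."""
    return original_bytes / compressed_bytes

def percent_of_original(original_bytes, compressed_bytes):
    """Return the 'percent of original' figure some applications quote."""
    return 100.0 * compressed_bytes / original_bytes

# The slide's example: 100,000 bytes compressed to 50,000 bytes.
print(compression_ratio(100_000, 50_000))    # 2.0, i.e. 2:1
print(percent_of_original(100_000, 50_000))  # 50.0, i.e. 50%
```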


Two Lossless Compression Models


LZ77 – sliding window dictionary


Model: “I’ve seen this before”
• Many data sets have repeated information
• Word “_the_” in text
• LZ77 – Abraham Lempel, Jacob Ziv

Past Data (search buffer) ← Current Pointer → Future Data (look ahead)

LZ77 – sliding window dictionary


As the input is read, a sliding window of data is kept. As more is read, the new input is added to the window and the oldest data is discarded. Compression happens by finding repeated data at the input that is also in the window. The compressor emits a pointer to the data in the window, a copy length, and the next symbol. The longer the string of symbols found in the window, the longer the copy length and the better the input is compressed. The decompressor just needs to maintain the window, index back into it, and copy out `length` symbols.
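The window-search-and-emit scheme just described can be sketched in a few lines. This is a toy illustration (small window, naive search, one (offset, length, next) triple per step), not the tuned LZ77/LZSS code any real compressor uses:

```python
def lz77_compress(data, window=32):
    """Toy LZ77: emit (offset, length, next_symbol) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        # Search the window for the longest match with the look-ahead.
        for j in range(start, i):
            k = 0
            # Reserve one symbol at the end so a 'next symbol' always exists.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        nxt = data[i + best_len]
        out.append((best_off, best_len, nxt))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    """Rebuild the message: copy `length` symbols back from `offset`, then append `next`."""
    out = []
    for off, length, nxt in triples:
        for _ in range(length):
            out.append(out[-off])  # one at a time, so overlapping copies work
        out.append(nxt)
    return ''.join(out)
```

Round-tripping any string through `lz77_decompress(lz77_compress(s))` returns the original, which is the basic correctness property of the scheme.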

7-ZIP Compression Methods (from Wikipedia)

DEFLATE – Standard algorithm based on 32 kB LZ77 (LZSS Lempel–Ziv–Storer–Szymanski actually) and Huffman coding. Deflate is found in several file formats including ZIP, gzip, PNG and PDF. 7-Zip contains a from-scratch DEFLATE encoder that frequently beats the de facto standard zlib version in compression size, but at the expense of CPU usage.

DEFLATE64 – LZ77 with a 64 kB window and Huffman coding.

LZMA – A variation of the LZ77 algorithm, using a sliding dictionary up to 4 GB in length for duplicate string elimination. The LZ stage is followed by entropy coding using a Markov chain-based range coder and binary trees. Window: 64 kB to 1 GB.

LZMA2 – modified version of LZMA providing better multithreading support and less expansion of incompressible data. (LZ77)

Bzip2 – The standard Burrows–Wheeler transform (BWT) algorithm. Bzip2 uses two reversible transformations: BWT, then move-to-front, with Huffman coding for symbol reduction (the actual compression element). BWT reorders text in such a way as to increase the sequences of repeated characters.

PPMd (Prediction by Partial Matching) – PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream. Can use Huffman, dictionary (LZ77) coding after the prediction.
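The Bzip2 pipeline above (BWT, then move-to-front) can be illustrated with a toy version. This is a quadratic-time demonstration sketch, nothing like the real bzip2 implementation:

```python
def bwt(s):
    """Burrows-Wheeler transform via sorted rotations ('\0' used as sentinel)."""
    s = s + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def inverse_bwt(last):
    """Invert by repeatedly prepending the last column and re-sorting (demo only)."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith("\0"))
    return row[:-1]

def move_to_front(s, alphabet):
    """MTF: recently seen symbols get small indices, turning runs into runs of 0s."""
    order = list(alphabet)
    out = []
    for ch in s:
        idx = order.index(ch)
        out.append(idx)
        order.insert(0, order.pop(idx))
    return out
```

After BWT groups repeated characters together, move-to-front turns those runs into mostly-zero indices, which a Huffman coder then compresses well.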


Observations on images
 Images consist of a series of scan lines – horizontal rows of pixels

Model: Scan lines tend to have runs of pixels as well as small differences between scan lines.


Scan Lines are horizontal sequence of pixels

Example runs of black pixels

Notice small differences between scan lines

PNG File Compression


Source File → 2-line Predictor → DEFLATE → PNG File

• Two-line predictor: neighboring bytes A, B, C, D can be used to predict X
• Emits X (raw) or one of {table below} – X, selected based on which one best predicts X
• Filter types are chosen adaptively, scan line by scan line

The source file is analyzed to see which of the filters in the table works best for the current scan line. A filter-type byte selects which filter to use. The predictor emits X, or the difference between X and the predicted byte (pixel), based on the analysis. If the predictor is accurate, many common values (typically 0) will result. This yields better compression via the DEFLATE (LZ77) compressor. The decompressor implements the inverse.
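As a sketch of the per-scan-line filtering idea, here is the Paeth predictor (one of PNG's standard filter types) applied to one row. A real encoder would try each filter type on the line and keep whichever one works best; the sample pixel values are made up for illustration:

```python
def paeth(a, b, c):
    """PNG Paeth predictor: a = left, b = above, c = upper-left neighbor."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def filter_row(row, prev_row):
    """Paeth-filter one scan line: emit (pixel - prediction) mod 256."""
    out = []
    for x, px in enumerate(row):
        a = row[x - 1] if x > 0 else 0
        b = prev_row[x]
        c = prev_row[x - 1] if x > 0 else 0
        out.append((px - paeth(a, b, c)) % 256)
    return out

# Two similar scan lines: the residuals are small values near 0,
# which DEFLATE then compresses well.
prev_row = [10, 10, 11, 12, 12]
row      = [10, 11, 11, 12, 13]
print(filter_row(row, prev_row))  # → [0, 1, 0, 0, 1]
```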

Lossy Image Compression

Standards

JPEG – Joint Photographic Experts Group
 Pink book is the standard

MPEG – Motion Picture Experts Group
 MPEG-1, MPEG-2 (DVD), MPEG layer 3 (MP3) (DVD audio)

Many other standard, semi-standard and proprietary formats


Joint Photographic Experts Group (JPEG)
Lossy compression modeled after the human visual system (HVS)


~120 Million Rods, ~5-6 Million Cones in the human eye

Human Eye Spatial Frequency Response


Some spatial frequencies are less visible to the eye. Color vision is less sensitive at higher frequencies. JPEG takes advantage of this in choosing what information to lose.

Color and Luminance are coded separately


RGB color space has redundant information. In YCrCb, color is mapped to two components that can be further reduced in information content with a smaller impact on perceptual appearance; luminance is separate and can be treated differently. Low complexity: the color space transform is a simple linear conversion. Y = luminance (think monochrome); Cr and Cb are the two color channels.
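A sketch of that linear conversion, using the BT.601 full-range coefficients commonly used with JPEG/JFIF (the slide does not give the matrix, so the specific coefficients are an assumption on my part):

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601-style transform, as commonly used by JFIF/JPEG.
    Y is luminance; Cb and Cr are the two color-difference channels,
    offset by 128 so they center on mid-gray."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# A neutral gray maps to Y = gray level, Cb = Cr = 128 (no color content),
# which is why the two chroma channels compress so well.
print(rgb_to_ycbcr(128, 128, 128))
```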

Subsampling example


JPEG Block Diagram


• Key is the use of the Discrete Cosine Transform (DCT) to map the spatial image to a frequency domain.

• Frequencies that are not as visible are removed (quantized), then the remainder is losslessly coded via Huffman coding.

• Color transform is included, then the color channels are down sampled to take advantage of the lower spatial frequency response

Divide image into 8x8 pixel blocks


Discrete Cosine Transform (DCT)
Basis functions and sample reconstruction



Discrete Cosine Transform (DCT)
Basis functions and sample reconstruction


* The zero-frequency coefficient is the DC term; the rest are called AC terms
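For illustration, here is a one-dimensional orthonormal DCT-II (JPEG applies the two-dimensional version to 8x8 blocks). This is a direct O(N²) evaluation, not the fast transform real codecs use:

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a length-N signal.
    Coefficient k=0 is the DC term; k>0 are the AC terms."""
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

# A constant (flat) block has all its energy in the DC term;
# every AC coefficient is (numerically) zero.
print(dct_1d([1.0] * 8))
```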

JPEG Quantization Tables


Based on psychovisual threshold experiments
 Luminance is not subsampled, lighter quantization
 Chrominance, subsampled 2:1, heavier quantization

Luminance Quantization Table Chrominance Quantization Table

Larger numbers more heavily quantize the DCT coefficient
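Quantization itself is just an element-wise divide-and-round; reconstruction multiplies back, which is where the information is lost. The coefficient and table values below are made-up illustrative numbers, not entries from the slide's tables:

```python
def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its table entry and round to an integer.
    Larger table entries quantize more heavily (more coefficients become 0)."""
    return [round(c / q) for c, q in zip(coeffs, qtable)]

def dequantize(levels, qtable):
    """Decoder side: multiply back. The rounding error is the lost information."""
    return [l * q for l, q in zip(levels, qtable)]

coeffs = [-415, -30, -61]   # illustrative DCT coefficients
qtable = [16, 11, 10]       # illustrative quantization table entries
print(quantize(coeffs, qtable))  # → [-26, -3, -6]
```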

JPEG Block Diagram


• Quantization adds redundancy to the coefficients.
• The encoder uses Huffman coding to efficiently represent the quantized coefficients.

Prefix Codes


Model → Encoder

Model output: more redundant, but not necessarily encoded (represented) efficiently
Encoder output: more efficient representation (same information)

Fixed-length codes (ASCII, 8 bits): easy to use, not efficient ("a" takes the same number of bits as "z")

Variable-length codes: more frequent symbols get fewer bits

Prefix codes: unique variable-length codes that can easily be parsed

Example bit string, parsed from left to right: 0101101110
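Because no codeword is a prefix of another, a bit string can be parsed left to right with no separators. A minimal sketch; the four codewords and symbol names here are illustrative:

```python
def decode_prefix(bits, code_to_symbol):
    """Walk the bit string left to right; a prefix code needs no separators,
    because the first codeword match is always the right one."""
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in code_to_symbol:
            out.append(code_to_symbol[cur])
            cur = ""
    assert cur == "", "bit string ended in the middle of a codeword"
    return out

# Illustrative prefix code: 0, 10, 110, 1110 — none is a prefix of another.
codes = {"0": "A", "10": "B", "110": "C", "1110": "D"}
print(decode_prefix("0101101110", codes))  # → ['A', 'B', 'C', 'D']
```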

Huffman Encoding History

In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code.

Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient.

In doing so, Huffman outdid Fano, who had worked with information theory inventor Claude Shannon to develop a similar code. By building the tree from the bottom up instead of the top down, Huffman avoided the major flaw of the suboptimal Shannon-Fano coding.

(Wikipedia)


Example Message


44444444 8888888888888 9999999999 2222 333333333333 2222 555555 44444444 5555552222 8888888888888 2222 555555 2222 555555333333333333 555555 9999999999

Message contains 132 Symbols = 528 bits (132*4 bits/symbol)

Sequence        Probability  Code
2222            0.4          0
555555          0.3          10
333333333333    0.1          110
44444444        0.1          1110
8888888888888   0.05         11110
9999999999      0.05         11111

Example Message – Huffman Coding


• Know the probabilities of a symbol occurring in the message
• Organize as a binary tree
• Apply a prefix code to uniquely identify sequences, optimized by the ordering of the binary tree

Tree construction (merge the two lowest probabilities at each step):
 0.05 (8888888888888) + 0.05 (9999999999) = 0.1
 0.1 + 0.1 (44444444) = 0.2
 0.2 + 0.1 (333333333333) = 0.3
 0.3 + 0.3 (555555) = 0.6
 0.6 + 0.4 (2222) = 1.0

Resulting prefix code: 2222 → 0, 555555 → 10, 333333333333 → 110, 44444444 → 1110, 8888888888888 → 11110, 9999999999 → 11111
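The bottom-up construction can be sketched with a priority queue. Note that the codewords produced may differ from the slide's tree (ties between equal probabilities can be broken either way), but the average code length comes out the same, since any Huffman tree is optimal:

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build a Huffman tree bottom-up from {symbol: probability};
    return {symbol: codeword}."""
    tick = count()  # unique tie-breaker so heapq never compares tree nodes
    heap = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable nodes into one parent node.
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tick), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: a symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

probs = {"2222": 0.4, "555555": 0.3, "333333333333": 0.1,
         "44444444": 0.1, "8888888888888": 0.05, "9999999999": 0.05}
print(huffman_codes(probs))
```

For the slide's probabilities the expected code length is 2.2 bits per sequence, whichever optimal tree the ties produce.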

Example Message – Huffman Encoded


44444444 8888888888888 9999999999 2222 333333333333 2222 555555 44444444 5555552222 8888888888888 2222 555555 2222 555555333333333333 555555 9999999999

Message contains 132 Symbols = 528 bits (132*4 bits/symbol)

[1110] [11110] [11111] [0] [110] [0] [10] [1110] [10] [0] [11110] [0] [0] [10] [0] [10] [110] [10] [11111]

The Huffman encoding contains 50 bits.
Compression: (528/50):1, or roughly 10.5:1
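A quick check of the bit count, using the codeword sequence from the slide:

```python
# The codewords emitted for the slide's message, in order.
emitted = ("1110 11110 11111 0 110 0 10 1110 10 0 "
           "11110 0 0 10 0 10 110 10 11111").split()

total_bits = sum(len(c) for c in emitted)
print(total_bits)        # 50
print(528 / total_bits)  # ≈ 10.56, the slide's roughly 10.5:1
```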

Problem with high frequency term quantization


Loss of high-frequency terms results in "ringing": the inability to reconstruct sharp edges. For the math, see the Gibbs phenomenon.

Lossy compression: Quality?
 How to compare different algorithms, or determine if the loss is acceptable for your application?
 Lossless: just compare compression ratios.
 Lossy: want to lose information that is not important to the application.

Use a model of the perceptual part of the human visual system (HVS)?
 Tried that – we do not understand the HVS well enough to make a good model.

Human psycho-visual experiments
 Select images, process, print or view original vs. processed
 Rank with as many observers as you can get
 Expensive – labor and controlled lab setup
 Population bias – researchers are very critical, others not critical enough
 Image type bias


Test image example


• Processing, printing/viewing, and ranking images have a per-image cost
• Select the image, process, print or view setup, schedule observers, process observations, monitor and check each step
• Images often contain a lot of content to stress and cover a wide range of possible errors in a single image
• Need to use customer/market images with customers

Test image example – Compressed 48:1


Original Compressed

Comparison ~48:1


Original

Compressed


LZMA SDK (you too can add compression to your application)

The LZMA SDK is available to use. It provides the documentation, samples, header files, libraries, and tools you need to develop applications that use LZMA compression. The LZMA SDK includes:

C++ source code of LZMA Encoder and Decoder

C++ source code for .7z compression and decompression (reduced version)

ANSI-C compatible source code for LZMA / LZMA2 / XZ compression and decompression

ANSI-C compatible source code for 7z decompression with example

C# source code for LZMA compression and decompression

Java source code for LZMA compression and decompression

lzma.exe for .lzma compression and decompression

7zr.exe to work with 7z archives (reduced version of 7z.exe from 7-Zip)

SFX modules to create self-extracting packages and installers

http://www.7-zip.org/sdk.html


Moving Picture Experts Group Layer III – aka MP3


Encoder pipeline:
 Audio Signal → Filter Bank (32 sub-bands) → MDCT (512 samples) → Non-uniform Quantizer → Huffman Encoding → Stream Formatting
 In parallel, an FFT feeds a Psychoacoustic Model, which drives the analysis and modeling for the quantizer; side-band information is encoded into the stream.

MDCT – Modified DCT applied to each sub-band sample
FFT – Fast Fourier Transform – map to the frequency domain for analysis
Raw CD audio is ~10 MB/minute
MP3 compression is typically ~10-11:1, reducing to ~1 MB/minute
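The bit-rate claims above can be checked with back-of-the-envelope arithmetic (assuming standard 44.1 kHz, 16-bit stereo CD audio and a common 128 kbit/s MP3 rate):

```python
# Raw CD audio: 44,100 samples/s, 16 bits/sample, 2 channels.
cd_bits_per_sec = 44_100 * 16 * 2                     # 1,411,200 bits/s
cd_mb_per_min = cd_bits_per_sec * 60 / 8 / 1_000_000  # bytes → MB per minute
print(round(cd_mb_per_min, 1))                        # ~10.6 MB/minute

mp3_bits_per_sec = 128_000                            # a common MP3 bit rate
mp3_mb_per_min = mp3_bits_per_sec * 60 / 8 / 1_000_000
print(round(mp3_mb_per_min, 2))                       # ~1 MB/minute
print(round(cd_bits_per_sec / mp3_bits_per_sec, 1))   # ~11:1 compression
```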

MPEG Frame Encoding


I = DCT-encoded reference frame – no other frames are used
P = Use only previous frames for prediction
B = Use both forward and previous frames for prediction