data compression 資料壓縮 - 南華大學chun/r-cs-chap15-data compression.pdf · 2018-12-05 ·...

1

15.1 Source: Foundations of Computer Science Cengage Learning

Chapter 15

Data Compression

資料壓縮

15.2

Distinguish between lossless and lossy compression.能區分無損失與有損失壓縮

Describe run-length encoding and how it achieves compression.

可變長度編碼之壓縮法

Describe Huffman coding and how it achieves compression.霍夫曼編碼之壓縮法

Describe Lempel Ziv encoding and the role of the dictionary in

encoding and decoding. Lempel Ziv編碼之壓縮法

Describe the main idea behind the JPEG standard for

compressing still images.

Describe the main idea behind the MPEG standard for

compressing video and its relation to JPEG.

Describe the main idea behind the MP3 standard for

compressing audio.

After studying this chapter, students should be able to: Objectives 學習目標

2

15.3

LOSSLESS COMPRESSION

無損失壓縮

In lossless data compression, the integrity of the data is

preserved.無損失壓縮可保有資料完整度 The original

data and the data after compression and decompression are

exactly the same because, in these methods, the

compression and decompression algorithms are exact

inverses of each other壓縮與解壓縮是完全互相反向的:

no part of the data is lost in the process. Redundant data is

removed in compression and added during

decompression. 壓縮時被精簡的資料，在解壓縮時將被完整地附加回來。Lossless compression methods are

normally used when we cannot afford to lose any data.

15.4

Run-length encoding 可變長度編碼之壓縮法

Run-length encoding is probably the simplest method of

compression. It can be used to compress data made of any

combination of symbols. It does not need to know the

frequency of occurrence of symbols and can be very

efficient if data is represented as 0s and 1s.此壓縮法對0

、1資料表示很有效

The general idea behind this method is to replace

consecutive repeating occurrences of a symbol by one

occurrence of the symbol followed by the number of

occurrences.基本觀念:以一個數值取代連續的0(或1)符號

The method can be even more efficient if the data uses

only two symbols (for example 0 and 1) in its bit pattern

and one symbol is more frequent than the other.

3

15.5

Example: Run-length Encoding for Two Symbols

範例: 可變長度編碼對兩個符號之壓縮法

There are number of 0s before the first 1 符號1之前有幾個0

Original data原始資料有33個符號

Encoded data編碼後資料有16個符號

Compressed ratio壓縮比 = 33 / 16 = 2.0625

15.6

Huffman coding 霍夫曼編碼之壓縮法

Huffman coding assigns shorter codes to symbols that

occur more frequently and longer codes to those that

occur less frequently.對出現次數較多與較少符號者分別用短與長編碼 For example, imagine we have a text file

that uses only five characters (A, B, C, D, E). Before we

can assign bit patterns to each character, we assign each

character a weight based on its frequency of use. In this

example, assume that the frequency of the characters is as

shown in Table. 五個文字出現頻率次數如表所示

Character a b c d e

Frequency 480 50 90 250 40

4

15.7

Huffman coding霍夫曼編碼最佳壓縮法

第一步：每次找兩個最小值相加往上建置霍夫曼樹

第二步：從霍夫曼樹的根，由上而下，左右分支分別指定0與1。

第三步：確認各文字之編碼

15.8

Lempel Ziv encoding Lempel Ziv編碼之壓縮法

Lempel Ziv (LZ) encoding is an example of a category of

algorithms called dictionary-based encoding字典式的編碼.

The idea is to create a dictionary (a table) of strings used

during the communication session. If both the sender and

the receiver have a copy of the dictionary, then previously-

encountered strings can be substituted by their index in the

dictionary to reduce the amount of information transmitted.

5

15.9

Compression Lempel Ziv編碼之壓縮法

In this phase there are two concurrent events: building an

indexed dictionary建立索引字典 and compressing a

string of symbols壓縮字串符號. The algorithm extracts the

smallest substring that cannot be found in the dictionary

from the remaining uncompressed string. It then stores a

copy of this substring in the dictionary as a new entry

(ABBB) and assigns it an index value (6).

Compression occurs when the substring (ABB), except for

the last character (B), is replaced with the index found in the

dictionary. The process then inserts the index (4) and the

last character (B) of the substring into the compressed

string.

Example:

15.10

LOSSY COMPRESSION METHODS

Our eyes and ears cannot distinguish subtle changes. In

such cases, we can use a lossy data compression method.

These methods are cheaper — they take less time

and space when it comes to sending millions of bits per

second for images and video. Several methods have

been developed using lossy compression techniques.

JPEG (Joint Photographic Experts Group) encoding

is used to compress pictures and graphics, MPEG

(Moving Picture Experts Group) encoding is used to

compress video, and MP3 (MPEG audio layer 3) for

audio compression.

6

Review Questions

Please explain the major differences between lossless

compression and lossy compression.

Please determine the compressed ratio using run-length

encoding with 4-bit code as follows.

1000000000011000000001110000000000000011

Please determine the compressed ratio using Huffman

encoding as follows.

Character a b c d e

Frequency 480 50 90 250 40

15.11

data compression 資料壓縮 - 南華大學chun/r-cs-chap15-data compression.pdf · 2018-12-05 ·...

Documents