data compression 資料壓縮 - 南華大學chun/r-cs-chap15-data compression.pdf · 2018-12-05 ·...
TRANSCRIPT
1
15.1 Source: Foundations of Computer Science Cengage Learning
Chapter 15
Data Compression
資料壓縮
15.2
Distinguish between lossless and lossy compression.能區分無損失與有損失壓縮
Describe run-length encoding and how it achieves compression.
可變長度編碼之壓縮法
Describe Huffman coding and how it achieves compression.霍夫曼編碼之壓縮法
Describe Lempel Ziv encoding and the role of the dictionary in
encoding and decoding. Lempel Ziv編碼之壓縮法
Describe the main idea behind the JPEG standard for
compressing still images.
Describe the main idea behind the MPEG standard for
compressing video and its relation to JPEG.
Describe the main idea behind the MP3 standard for
compressing audio.
After studying this chapter, students should be able to: Objectives 學習目標
2
15.3
LOSSLESS COMPRESSION
無損失壓縮
In lossless data compression, the integrity of the data is
preserved.無損失壓縮可保有資料完整度 The original
data and the data after compression and decompression are
exactly the same because, in these methods, the
compression and decompression algorithms are exact
inverses of each other壓縮與解壓縮是完全互相反向的:
no part of the data is lost in the process. Redundant data is
removed in compression and added during
decompression. 壓縮時被精簡的資料,在解壓縮時將被完整地附加回來。Lossless compression methods are
normally used when we cannot afford to lose any data.
15.4
Run-length encoding 可變長度編碼之壓縮法
Run-length encoding is probably the simplest method of
compression. It can be used to compress data made of any
combination of symbols. It does not need to know the
frequency of occurrence of symbols and can be very
efficient if data is represented as 0s and 1s.此壓縮法對0
、1資料表示很有效
The general idea behind this method is to replace
consecutive repeating occurrences of a symbol by one
occurrence of the symbol followed by the number of
occurrences.基本觀念:以一個數值取代連續的0(或1)符號
The method can be even more efficient if the data uses
only two symbols (for example 0 and 1) in its bit pattern
and one symbol is more frequent than the other.
3
15.5
Example: Run-length Encoding for Two Symbols
範例: 可變長度編碼對兩個符號之壓縮法
There are number of 0s before the first 1 符號1之前有幾個0
Original data原始資料有33個符號
Encoded data編碼後資料有16個符號
Compressed ratio壓縮比 = 33 / 16 = 2.0625
15.6
Huffman coding 霍夫曼編碼之壓縮法
Huffman coding assigns shorter codes to symbols that
occur more frequently and longer codes to those that
occur less frequently.對出現次數較多與較少符號者分別用短與長編碼 For example, imagine we have a text file
that uses only five characters (A, B, C, D, E). Before we
can assign bit patterns to each character, we assign each
character a weight based on its frequency of use. In this
example, assume that the frequency of the characters is as
shown in Table. 五個文字出現頻率次數如表所示
Character a b c d e
Frequency 480 50 90 250 40
4
15.7
Huffman coding霍夫曼編碼最佳壓縮法
第一步:每次找兩個最小值相加往上建置霍夫曼樹
第二步:從霍夫曼樹的根,由上而下,左右分支分別指定0與1。
第三步:確認各文字之編碼
15.8
Lempel Ziv encoding Lempel Ziv編碼之壓縮法
Lempel Ziv (LZ) encoding is an example of a category of
algorithms called dictionary-based encoding字典式的編碼.
The idea is to create a dictionary (a table) of strings used
during the communication session. If both the sender and
the receiver have a copy of the dictionary, then previously-
encountered strings can be substituted by their index in the
dictionary to reduce the amount of information transmitted.
5
15.9
Compression Lempel Ziv編碼之壓縮法
In this phase there are two concurrent events: building an
indexed dictionary建立索引字典 and compressing a
string of symbols壓縮字串符號. The algorithm extracts the
smallest substring that cannot be found in the dictionary
from the remaining uncompressed string. It then stores a
copy of this substring in the dictionary as a new entry
(ABBB) and assigns it an index value (6).
Compression occurs when the substring (ABB), except for
the last character (B), is replaced with the index found in the
dictionary. The process then inserts the index (4) and the
last character (B) of the substring into the compressed
string.
Example:
15.10
LOSSY COMPRESSION METHODS
Our eyes and ears cannot distinguish subtle changes. In
such cases, we can use a lossy data compression method.
These methods are cheaper — they take less time
and space when it comes to sending millions of bits per
second for images and video. Several methods have
been developed using lossy compression techniques.
JPEG (Joint Photographic Experts Group) encoding
is used to compress pictures and graphics, MPEG
(Moving Picture Experts Group) encoding is used to
compress video, and MP3 (MPEG audio layer 3) for
audio compression.
6
Review Questions
Please explain the major differences between lossless
compression and lossy compression.
Please determine the compressed ratio using run-length
encoding with 4-bit code as follows.
1000000000011000000001110000000000000011
Please determine the compressed ratio using Huffman
encoding as follows.
Character a b c d e
Frequency 480 50 90 250 40
15.11