Formatting and Source Coding
contents.kocw.net/kocw/document/2014/pusan/kimjongdeok/8.pdf
TRANSCRIPT
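Lecture Objectives
Understand the main techniques for formatting information such as text, speech, and images into digital data.
Code, Encoding/Decoding, CODEC
Hangul codes / PCM (Pulse Code Modulation)
Understand the relationship between bandwidth and the representation of information.
Source coding for representing data efficiently
Compressor/Decompressor, CODEC
Size of multimedia data
Digital audio / digital video compression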
Digital Info. → Digital Data
Coding Schemes
Encoding/Decoding, CODEC
Alphabet, Digits and other characters…
ASCII, EBCDIC, …
MIDI
Musical Instrument Digital Interface
Music-related information, but is it digital info.?
Hangul codes?
Wansung (precomposed) or Johab (combining) encoding?
KS C 5601, Unicode
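Hangul Codes (ANSI Code? / Unicode)
>> fid = fopen('song.txt', 'r');                 % open the ANSI-encoded text file
>> ansi_string = fread(fid);                     % read the raw bytes (one value per byte)
>> fclose(fid);
>> uni_string = native2unicode(ansi_string);     % decode with the system default encoding
>> unicode_start_code = [255, 254];              % byte order mark FF FE (UTF-16, little-endian)
>> fid2 = fopen('uni_song.txt', 'w');
>> fwrite(fid2, unicode_start_code, 'uint8');    % write the byte order mark first
>> fwrite(fid2, uni_string, 'uint16');           % write each character as a 16-bit code unit
>> fclose(fid2);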
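For comparison, the same conversion can be sketched in Python. This is a minimal sketch and not part of the original lecture: it assumes the "ANSI" text is encoded in CP949 (the Korean Windows code page, which covers KS C 5601), and the function name `ansi_to_utf16` is hypothetical.

```python
import codecs

def ansi_to_utf16(ansi_bytes: bytes) -> bytes:
    """Decode CP949 bytes and re-encode them as UTF-16LE with a byte order mark."""
    text = ansi_bytes.decode("cp949")            # assumed source encoding
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

ansi = "한글".encode("cp949")      # 2 bytes per Hangul syllable in the legacy code
utf16 = ansi_to_utf16(ansi)        # BOM (FF FE) followed by 16-bit code units
print(ansi.hex(), utf16.hex())
```

The byte order mark plays the same role as `unicode_start_code` in the MATLAB session: it tells a reader that the 16-bit code units that follow are little-endian.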
Data Acquisition System
Data Acquisition H/W
At the heart of any data acquisition system lies the data acquisition hardware. The main function of this hardware is to
convert analog signals to digital signals, and to convert digital signals to analog signals. (ADC / DAC)
Sensors and Actuators (Transducers)
Sensors and actuators can both be transducers. A transducer is a device that converts input energy of one form into
output energy of another form. For example, a microphone is a sensor that converts sound energy (in the form of
pressure) into electrical energy, while a loudspeaker is an actuator that converts electrical energy into sound energy.
Signal Conditioning H/W
Sensor signals are often incompatible with data acquisition hardware. To overcome this incompatibility, the signal
must be conditioned. For example, you might need to condition an input signal by amplifying it or by removing
unwanted frequency components. Output signals might need conditioning as well.
[Figure: data acquisition system block diagram with the blocks Physical Phenomena, Sensor, Actuator, Signal Conditioning, Acquisition H/W, and Computer]
Analog Info. → Digital Data
Sound
PCM (Pulse Code Modulation)
Sampling Rate, Bits/Sample, Channel
Image
Pixel, RGB, Bits/Pixel
VGA (640*480), QVGA, CIF (352*288), QCIF
Video
Frame, 24/30 FPS
480P, 720P, 1080i, 1080P …
Pulse Code Modulation (PCM)
Nyquist Sampling Theorem
If a signal is sampled at regular intervals at a rate higher than twice the highest signal frequency, the samples contain all the information of the original signal
Ex) Voice data limited to below 4000 Hz requires 8000 samples per second
Analog samples: Pulse Amplitude Modulation (PAM)
Each sample is assigned a digital value: quantization
Quantizing error or noise
Approximations mean it is impossible to recover the original exactly
Ex) An 8-bit sample gives 256 levels
• 8000 samples per second of 8 bits each gives 64 kbps
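The sampling and quantization steps above can be sketched in Python. This is a minimal sketch under stated assumptions: a uniform quantizer over the range [-1, 1), a 1 kHz test tone, and hypothetical helper names (`pcm_encode`, `pcm_decode`, `bit_rate`); practical telephony codecs typically use non-uniform (companded) quantization instead.

```python
import math

def pcm_encode(signal, bits):
    """Uniformly quantize samples in [-1.0, 1.0) to integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    codes = []
    for x in signal:
        code = int((x + 1.0) / 2.0 * levels)      # map [-1, 1) onto [0, levels)
        codes.append(min(max(code, 0), levels - 1))
    return codes

def pcm_decode(codes, bits):
    """Map each code back to the midpoint of its quantization interval."""
    levels = 2 ** bits
    return [(c + 0.5) / levels * 2.0 - 1.0 for c in codes]

def bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

fs, bits = 8000, 8                                # telephone-quality voice
tone = [0.9 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 100)]
codes = pcm_encode(tone, bits)
decoded = pcm_decode(codes, bits)
max_err = max(abs(a - b) for a, b in zip(tone, decoded))
print(bit_rate(fs, bits), "bit/s, max quantization error", max_err)
```

The decoded samples differ from the originals by at most half a quantization step (1/256 here), which is the quantizing error the slide refers to.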
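Size of Multimedia Information
How much data is produced when 48 kHz, 16 bits/sample, stereo (2-channel) digital audio is recorded for one hour?
How much data can be stored on a CD-ROM?
How much data per second does uncompressed VGA, 16 bits/pixel, 30 FPS video produce?
How much data is produced when it is recorded for one hour?
High-quality multimedia broadcasting?
5.1 Channel, 720P, 1080i, 1080P
DVD, Blu-ray, HD DVD
Compression is essential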
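The questions above reduce to simple arithmetic, sketched here in Python. The helper names `audio_bytes` and `video_bytes` are hypothetical, and the sketch counts raw sample data only (no file headers or container overhead).

```python
def audio_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed PCM audio in bytes."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8

def video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size of uncompressed video in bytes."""
    return width * height * bits_per_pixel * fps * seconds // 8

audio_hour = audio_bytes(48_000, 16, 2, 3600)     # 48 kHz, 16-bit, stereo, 1 hour
video_sec = video_bytes(640, 480, 16, 30, 1)      # VGA, 16 bits/pixel, 30 FPS, 1 second
video_hour = video_bytes(640, 480, 16, 30, 3600)  # the same video for 1 hour

print(f"audio, 1 hour: {audio_hour / 1e6:.1f} MB")
print(f"video, 1 second: {video_sec / 1e6:.1f} MB")
print(f"video, 1 hour: {video_hour / 1e9:.1f} GB")
```

One hour of this audio alone is roughly the capacity of a CD-ROM (on the order of 650 to 700 MB), and one hour of the uncompressed VGA video is larger by about two orders of magnitude, which is why compression is essential.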
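Audio Compression Technology
Digital Speech Coding
The main goal is low consumption of transmission resources, so a high compression ratio is important
Exploits the characteristics of the human vocal system; vocoder
Core algorithms and techniques: LPC (Linear Predictive Coding) & CELP (Code-Excited Linear Prediction)
Major compression standards: AMR (Adaptive Multi-Rate), G.722, G.723.1, G.726, G.728, G.729, …
Digital Audio Coding
A high compression ratio is desirable, but the ability to reproduce good sound quality matters more
Exploits the characteristics of the human auditory system: psychoacoustic model
Key elements: hearing sensitivity, frequency masking, temporal masking
Major compression standards: MPEG-1 Audio Layers (1, 2, 3), Dolby AC-3, MPEG-2 Advanced Audio Coding (AAC), MPEG-4 AAC (HE-AAC), …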
Multimedia compression and container formats (wiki)
Video
ISO/IEC: MJPEG · Motion JPEG 2000 · MPEG-1 · MPEG-2 (Part 2) · MPEG-4 (Part 2/ASP · Part 10/AVC) · HEVC
ITU-T: H.120 · H.261 · H.262 · H.263 · H.264 · HEVC
Others: AVS · Bink · CineForm · Cinepak · Dirac · DV · Indeo · Microsoft Video 1 · OMS Video · Pixlet · RealVideo · RTVideo · SheerVideo · Smacker · Sorenson Video & Sorenson Spark · Theora · VC-1 · VC-2 · VC-3 · VP3 · VP6 · VP7 · VP8 · WMV
Audio
ISO/IEC: MPEG-1 Layer III (MP3) · MPEG-1 Layer II (Multichannel) · MPEG-1 Layer I · AAC · HE-AAC · MPEG Surround · MPEG-4 ALS · MPEG-4 SLS · MPEG-4 DST · MPEG-4 HVXC · MPEG-4 CELP
ITU-T: G.711 · G.718 · G.719 · G.722 · G.722.1 · G.722.2 · G.723 · G.723.1 · G.726 · G.728 · G.729 · G.729.1
Others: AC-3 · AMR · AMR-WB · AMR-WB+ · Apple Lossless · ATRAC · CELT · DRA · DTS · EVRC · EVRC-B · FLAC · GSM-HR · GSM-FR · GSM-EFR · iLBC · iSAC · Monkey's Audio · TTA (True Audio) · MT9 · A-law · μ-law · Musepack · Nellymoser · OptimFROG · OSQ · QCELP · RealAudio · RTAudio · SD2 · SHN · SILK · Siren · SMV · Speex · SVOPC · TwinVQ · VMR-WB · Vorbis · WavPack · WMA
Image
ISO/IEC/ITU-T: JPEG · JPEG 2000 · JPEG XR · lossless JPEG · JBIG · JBIG2 · PNG · TIFF/EP · TIFF/IT
Others: APNG · BMP · DjVu · EXR · GIF · ICER · ILBM · MNG · PCX · PGF · TGA · QTVR · TIFF · WBMP · WebP
Container
ISO/IEC: MPEG-PS · MPEG-TS · ISO base media file format · MPEG-4 Part 14 · Motion JPEG 2000 · MPEG-21 Part 9
ITU-T: H.222.0 · T.802
Others: 3GP and 3G2 · AMV · ASF · AIFF · AVI · AU · Bink · DivX Media Format · DPX · EVO · Flash Video · GXF · M2TS · Matroska · MXF · Ogg · QuickTime File Format · RealMedia · REDCODE RAW · RIFF · Smacker · MOD and TOD · VOB · WAV · WebM
Digital Speech Coding
In relation to the opening and closing vibrations of the vocal cords
as air blows over them, speech signals can be roughly categorized
into two types of signals: voiced speech and unvoiced speech.
The Human Speech Production System
Linear Predictive Coding
A speech signal s(n) can be approximated by an auto-regressive (AR) formulation:
$$s(n) = e(n) + \sum_{k=1}^{p} a_k \, s(n-k)$$
The coefficients $\{a_k\}$ are derived on the basis of a 20~30 ms block of data (frame)
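One common way to estimate the coefficients is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below in Python. The slides do not say which estimation method is intended, so this choice is an assumption. The test signal is a synthetic AR(2) process driven by white noise rather than real speech, and it is much longer than a 20~30 ms frame so that the estimate is stable; all function names are hypothetical.

```python
import random

def autocorrelation(s, max_lag):
    """r[lag] = sum over n of s[n] * s[n - lag], for lag = 0 .. max_lag."""
    n = len(s)
    return [sum(s[i] * s[i - lag] for i in range(lag, n)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for a_1 .. a_p in s(n) = e(n) + sum_k a_k s(n-k)."""
    a = [0.0] * (order + 1)          # a[k] multiplies s(n - k); a[0] is unused
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err

# Synthetic AR(2) signal with known coefficients (assumed values for illustration).
random.seed(0)
true_a = [1.3, -0.4]
s = [0.0, 0.0]
for _ in range(20000):
    s.append(true_a[0] * s[-1] + true_a[1] * s[-2] + random.gauss(0.0, 1.0))

coeffs, residual_energy = levinson_durbin(autocorrelation(s, 2), 2)
print("estimated a_k:", coeffs)
```

The estimated coefficients should land close to the values used to generate the signal. A speech coder would transmit such coefficients (plus a compact description of e(n)) per frame instead of the samples themselves.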
Digital Audio Coding – Auditory System
1) The outer ear directs sounds through the ear canal towards the eardrum
2) The middle ear transforms sound pressure waves into mechanical movement on three small bones called “ossicles” (the hammer, anvil, and stirrup)
3) The inner ear houses the cochlea, a spiral-shaped structure for human hearing which sits in an extremely sensitive membrane called the basilar membrane. The cochlea converts the middle ear’s mechanical movements to basilar membrane movement and eventually into the firing of auditory neurons, which, in turn, send electrical signals to the brain
Exploiting Hearing Sensitivity
If we uniformly quantize each audio sample with 12 bits, the resulting quantization noise can be as low as -26 dB, which is far below the threshold of hearing.
We can divide the audible frequency range (20 Hz to 20 kHz) into several bands, and the audio samples in different bands can be quantized with different numbers of bits to accommodate different tolerances of quantization noise.
Temporal Masking
A weak sound emitted soon after the end of a louder sound is masked by the louder sound. (Post-masking)
Even a weak sound just before a louder sound can be masked by the louder sound. (Pre-masking)
The combined frequency and temporal masking effect
Frequency Domain Analysis ?
Applying the digital speech/audio coding techniques discussed above requires spectral (frequency) analysis of the audio signal
Fourier Analysis
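A small example of such a spectral analysis, as a minimal Python sketch: a direct (O(N²)) discrete Fourier transform applied to a test signal made of two tones. The sampling rate, tone frequencies, and amplitudes are assumed values chosen so that each tone falls exactly on a frequency bin; real codecs use fast transforms and filter banks rather than this direct form.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

fs, n = 8000, 64                       # bin spacing is fs / n = 125 Hz
x = [math.sin(2 * math.pi * 1000 * t / fs) + 0.5 * math.sin(2 * math.pi * 2000 * t / fs)
     for t in range(n)]

mags = [abs(c) for c in dft(x)[: n // 2]]          # keep the non-negative frequencies
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * fs / n
print("strongest component at", peak_hz, "Hz")
```

The magnitude spectrum shows one peak per tone, with heights proportional to the tone amplitudes. This per-frequency view is what a psychoacoustic model needs in order to decide how many bits each band deserves.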
Digital Audio Standards
MPEG-1 Audio Layer I, II, III
Layer I : MP1
• one of three audio formats included in the MPEG-1 standard. While supported by most media
players, the codec is considered largely outdated, and replaced by MP2 or MP3.
Layer II : MP2, (sometimes incorrectly called MUSICAM)
• While MP3 is much more popular for PC and internet applications, MP2 remains a dominant
standard for audio broadcasting.
• The basic audio codec of DAB (Digital Audio Broadcasting), also known as Eureka-147, which can be regarded as the origin of Korea's terrestrial DMB (T-DMB)
• The basic audio codec of DVB (Digital Video Broadcasting), the European DTV standard
• Supports multi-channel audio through the MPEG-2 Audio Layer II extension
Layer III : MP3
• a patented digital audio encoding format using a form of lossy data compression. It is a common
audio format for consumer audio storage, as well as a de facto standard of digital audio
compression for the transfer and playback of music on digital audio players.
Digital Audio Standards
Dolby AC3 Audio Codec
Multi-Channel Support
http://en.wikipedia.org/wiki/Dolby_AC3
Digital Audio Standards
Advanced Audio Coding (AAC)
Designed to be the successor of the MP3 format, AAC generally achieves better sound
quality than MP3 at similar bit rates
• AAC has been standardized by ISO and IEC, as part of the MPEG-2 and MPEG-4 specifications.
Part of the AAC known as High-Efficiency Advanced Audio Coding (HE-AAC) which is part of
MPEG-4 Audio is also adopted into digital radio standards like DAB+ and Digital Radio Mondiale,
as well as mobile television standards DVB-H and ATSC-M/H.
• AAC supports inclusion of 48 full-bandwidth (up to 96 kHz) audio channels in one stream plus 16
low frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels,
and up to 16 data streams. The quality for stereo is satisfactory to modest requirements at 96
kbit/s in joint stereo mode; however, hi-fi transparency demands data rates of at least 128 kbit/s
(VBR). The MPEG-2 audio tests showed that AAC meets the requirements referred to as
"transparent" for the ITU at 128 kbit/s for stereo, and 320 kbit/s for 5.1 audio.
• AAC is also the default or standard audio format for iPhone, iPod, iPad, Nintendo DSi, iTunes,
DivX Plus Web Player and PlayStation 3. It is supported on PlayStation Portable, Wii (with the
Photo Channel 1.1 update installed for Wii consoles purchased before late 2007), Sony Walkman
MP3 series and later, mobile phones made by Sony Ericsson and Nokia and Android-based mobile
phones.
RGB & YUV
YUV
The YUV model defines a color space in terms of one luma (Y) and two chrominance (UV) components. The YUV color model is used in the PAL, NTSC, and SECAM composite color video standards. Previous black-and-white systems used only luma (Y) information, and color information (U and V) was added so that a black-and-white receiver would still be able to display a color picture as a normal black-and-white picture.
YUV models human perception of color in a different way from the standard RGB model used in computer graphics hardware.
Y stands for the luma component (the brightness) and U and V are the chrominance (color) components. The YPbPr color model used in analog component video and its digital version YCbCr used in digital video are more or less derived from it (Cb/Pb and Cr/Pr are deviations from grey on blue-yellow and red-cyan axes, whereas U and V are blue-luminance and red-luminance differences), and are sometimes inaccurately called "YUV". The YIQ color space used in the analog NTSC television broadcasting system is related to it, although in a more complex way.
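Color Space
YUV
A color system expressed in terms of brightness (luminance) and chrominance
Y: brightness (luminance)
U: chrominance (blue difference: B − Y)
V: chrominance (red difference: R − Y)
RGB ↔ YUV conversion is possible
Why it is used
To place more emphasis on brightness
• The human eye is more sensitive to brightness than to color
Subsampling
• Keep the luma (Y) intact and reduce the amount of color information (U, V)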
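The RGB ↔ YUV conversion can be sketched in Python as follows. The slides do not give numeric coefficients, so the weights below (0.299, 0.587, 0.114 for luma and the scale factors 0.492 and 0.877 for the color differences) are an assumption: they are the commonly quoted values for analog YUV, and digital YCbCr uses different scaling and offsets.

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB in [0, 1] to YUV using luma-weighted color differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted sum of R, G, B
    u = 0.492 * (b - y)                     # blue difference
    v = 0.877 * (r - y)                     # red difference
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv."""
    b = y + u / 0.492
    r = y + v / 0.877
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

white = rgb_to_yuv(1.0, 1.0, 1.0)           # grey levels carry no chrominance
color = rgb_to_yuv(0.2, 0.5, 0.8)
restored = yuv_to_rgb(*color)
print(white, color, restored)
```

Any grey level maps to U = V = 0, which is why a black-and-white receiver can simply use Y and ignore the rest.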
Subsampling
A scheme that samples Y, U, and V at different ratios
4:4:4 sampling is lossless (no chroma samples are discarded)
4:2:2 (cameras), 4:2:0 (various compression technologies)
4:2:0 reduces the data by 50% compared with 4:4:4
[Figure: Y, U, and V sample grids for a block of 4×2 pixels]
4:4:4: 8 Y + 8 U + 8 V = 24 samples
4:2:2: 8 Y + 4 U + 4 V = 16 samples
4:2:0: 8 Y + 2 U + 2 V = 12 samples
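The sample counts can be reproduced with a few lines of Python. This is a minimal sketch: the function name `sample_counts` is hypothetical, and it assumes the block width and height are divisible by the subsampling factors.

```python
def sample_counts(width, height, scheme):
    """Return the (Y, U, V) sample counts for a width x height block of pixels."""
    # (horizontal divisor, vertical divisor) applied to each chroma plane
    h_div, v_div = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}[scheme]
    luma = width * height
    chroma = (width // h_div) * (height // v_div)
    return luma, chroma, chroma

totals = {}
for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    y, u, v = sample_counts(4, 2, scheme)
    totals[scheme] = y + u + v
    print(scheme, f"Y={y} U={u} V={v} total={y + u + v}")
```

For the 4×2 block this gives 24, 16, and 12 samples, so 4:2:0 carries half the data of 4:4:4 while keeping every luma sample.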
Methods of Video Compression
Video compression methods:
Spatial compression (spatial model)
Statistical compression (entropy model)
Temporal compression (temporal model)
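Spatial Compression (Spatial Model)
Spatial frequency
• The variation of color or structure across space
DCT (Discrete Cosine Transform)
• Pixel values -> spatial frequencies
• A transform similar to the Fourier transform
• For typical images, the DCT values tend to concentrate at the low-frequency end
[Figure: example images with low and high spatial frequency; Image Block → DCT → DCT Coefficient Matrix]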
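Discrete Cosine Transform
For the reduction of spatial redundancy
Converts the spatial representation of an 8*8 image block to the frequency domain
Similar to the FFT
$$DCT(i,j) = \frac{1}{4} C(i)\, C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} pixel(x,y) \cos\left[\frac{(2x+1)\, i\pi}{16}\right] \cos\left[\frac{(2y+1)\, j\pi}{16}\right]$$
$$pixel(x,y) = \frac{1}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} C(i)\, C(j)\, DCT(i,j) \cos\left[\frac{(2x+1)\, i\pi}{16}\right] \cos\left[\frac{(2y+1)\, j\pi}{16}\right]$$
where
$$C(x) = \begin{cases} \frac{1}{\sqrt{2}} & x = 0 \\ 1 & \text{otherwise} \end{cases}$$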
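The transform pair can be written out directly in Python. This is a minimal, unoptimized sketch of the 8×8 forward and inverse DCT as defined by the two sums (practical encoders use fast factorizations); the function names are hypothetical.

```python
import math

N = 8

def c(k):
    """Normalization factor: 1/sqrt(2) for index 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(a, b):
    """cos[(2a + 1) * b * pi / 16]"""
    return math.cos((2 * a + 1) * b * math.pi / 16)

def dct_8x8(pixel):
    """Forward DCT of an 8x8 block given as pixel[x][y]."""
    return [[0.25 * c(i) * c(j) * sum(pixel[x][y] * basis(x, i) * basis(y, j)
                                      for x in range(N) for y in range(N))
             for j in range(N)] for i in range(N)]

def idct_8x8(coef):
    """Inverse DCT: rebuild pixel[x][y] from coef[i][j]."""
    return [[0.25 * sum(c(i) * c(j) * coef[i][j] * basis(x, i) * basis(y, j)
                        for i in range(N) for j in range(N))
             for y in range(N)] for x in range(N)]

flat = [[100.0] * N for _ in range(N)]                  # block with no spatial variation
ramp = [[float(8 * x + y) for y in range(N)] for x in range(N)]

flat_coef = dct_8x8(flat)
ramp_back = idct_8x8(dct_8x8(ramp))
print("DC coefficient of the flat block:", round(flat_coef[0][0], 6))
```

A block with no spatial variation puts all of its energy into the single coefficient DCT(0,0), which illustrates why typical image blocks produce coefficients concentrated at low frequencies.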
Example
[Figure: 8×8 Source Image Block → DCT → DCT Coefficient Matrix → Quantization (using the Quantization Table) → Quantized Coefficient Matrix → ZigZag Scanning & RLE]
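The last three stages of this pipeline can be sketched in Python. This is a minimal sketch with loud assumptions: a toy 4×4 coefficient block and a flat quantization table with step 8 stand in for the 8×8 matrices of the slide, whose actual values are not recoverable from the transcript, and the function names are hypothetical.

```python
def quantize(coef, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(cv / qv) for cv, qv in zip(crow, qrow)]
            for crow, qrow in zip(coef, table)]

def zigzag(block):
    """Read a square block along anti-diagonals, alternating direction."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):                       # s = row + column
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                    # even diagonals run bottom-left to top-right
        out.extend(block[i][s - i] for i in rows)
    return out

def run_length(seq):
    """Collapse runs of equal values into (value, count) pairs."""
    runs = []
    for v in seq:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]

coef = [[80.0, 24.0, 0.5, 0.0],          # toy DCT coefficients (assumed values)
        [16.0, 1.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
table = [[8] * 4 for _ in range(4)]      # flat quantization table (assumed)

scanned = zigzag(quantize(coef, table))
encoded = run_length(scanned)
print(scanned)
print(encoded)
```

Quantization turns the small high-frequency coefficients into zeros, and the zigzag scan places those zeros at the end of the sequence, where run-length encoding collapses them into a single pair.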
Temporal Compression (Temporal Model)
Temporal redundancy
• Television: about 30 fps (frames per second); film: about 24 fps
• The duration of one frame is very short compared with the motion of objects
• Consequently, video contains a great deal of temporal redundancy
Temporal Compression (Temporal Model)
Motion estimation
• The process of finding the current block in a past frame
Motion compensation
http://en.wikipedia.org/wiki/Motion_compensation
• The process of obtaining the motion vector
[Figure: past frame and current frame]
Group of Pictures
I frame
transformed without using prediction
restarting point for prediction
random access point
P frame
unidirectional prediction
B frame
bidirectional prediction
not used for predicting other frames
Group of Pictures
Display (playback) order: 1 2 3 4 5 6 7 8 9 10 11 12 13
Coding & transmission order: 1 5 2 3 4 9 6 7 8 13 10 11 12
[Figure: frames in each Group of Pictures labeled I, P, and B]
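The reordering follows from the prediction rules: a B frame needs both of its reference frames, so the reference frame that follows it in display order has to be coded and transmitted first. A minimal Python sketch is given below. The frame-type pattern is an assumption inferred from the two orderings (frame 1 as I, frames 5, 9, and 13 as P, the rest as B), and the function name is hypothetical.

```python
def coding_order(frame_types):
    """Map frames in display order ('I', 'P', or 'B') to 1-based indices in coding order."""
    order, pending_b = [], []
    for index, frame_type in enumerate(frame_types, start=1):
        if frame_type == "B":
            pending_b.append(index)      # wait until the next reference frame is coded
        else:
            order.append(index)          # the I or P frame goes first
            order.extend(pending_b)      # then the B frames that depend on it
            pending_b = []
    order.extend(pending_b)              # trailing B frames with no later reference
    return order

display = list("IBBBPBBBPBBBP")          # assumed frame types for frames 1..13
print(coding_order(display))
```

The decoder performs the inverse step: it buffers the early reference frame and releases the frames in display order.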
Representation - Entropy
The Concept of Entropy from Information Theory
For a given set of symbols, $A = \{a_1, a_2, \ldots, a_N\}$
Each symbol $a_n$ is associated with an event or an observation that has its own occurrence probability $p_n$; $p_n \in \{p_1, p_2, \ldots, p_N\}$
The information measure $I(a_n)$ of the symbol $a_n$ is defined as
$$I(a_n) = -\log_b p_n = \log_b \frac{1}{p_n}$$
The average amount (expected value) of information we can get from each symbol emitted in the stream from the source is defined as the entropy $H(A)$ for the discrete set of probabilities $P = \{p_1, p_2, \ldots, p_N\}$:
$$H(A) = E[I(A)] = \sum_{n=1}^{N} p_n \log_b \frac{1}{p_n}$$
http://en.wikipedia.org/wiki/Information_entropy
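Both definitions translate directly into Python. In this minimal sketch the function names are hypothetical and the probability sets are example values; the second set matches the four-symbol source used in the entropy coding example of these slides.

```python
import math

def information(p, base=2):
    """Information content of a symbol with probability p (bits when base is 2)."""
    return math.log(1 / p, base)

def entropy(probs, base=2):
    """Expected information per symbol for a discrete probability distribution."""
    return sum(p * information(p, base) for p in probs if p > 0)

uniform = entropy([0.25, 0.25, 0.25, 0.25])     # four equally likely symbols
skewed = entropy([0.6, 0.15, 0.2, 0.05])        # four symbols with unequal probabilities
print(f"uniform: {uniform:.3f} bits/symbol, skewed: {skewed:.3f} bits/symbol")
```

Equally likely symbols give the maximum entropy for an alphabet of that size (2 bits here), while the skewed source needs only about 1.53 bits per symbol on average, which is the room that entropy coding exploits.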
Entropy Coding
Entropy Coding
A coding scheme that assigns codes to symbols so as to match code lengths with the probabilities of the symbols.
The more frequent the symbol, the shorter its codeword
• According to Shannon’s source coding theorem, the optimal code length for a symbol is $\log_b \frac{1}{p}$, where p is the probability of the input symbol
Example: Huffman Coding, Lempel-Ziv Coding
ex) 2 bits per sample -> 1.6 bits per sample
Run-Length Encoding
ex) 000000001122222 ==> (0;8)(1;2)(2;5)
Input Codeword | Frequency (Prob.) | Output Codeword
00 | 0.6 | 0
01 | 0.15 | 100
10 | 0.2 | 11
11 | 0.05 | 101
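The code lengths in the table and the run-length example can be checked with a short Python sketch. This is a minimal illustration: the Huffman routine tracks only code lengths rather than the bit patterns themselves, and the function names are hypothetical.

```python
import heapq
from itertools import groupby

def huffman_code_lengths(probs):
    """probs maps symbol -> probability; returns symbol -> code length in bits."""
    # Each heap entry: (probability, tie-breaker, symbols in this subtree)
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)       # the two least probable subtrees
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1                    # merging adds one bit to each member
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

def run_length_encode(text):
    """Collapse runs of equal characters into (character, count) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]

probs = {"00": 0.6, "01": 0.15, "10": 0.2, "11": 0.05}
lengths = huffman_code_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)
print(lengths, f"average {average:.2f} bits per symbol")
print(run_length_encode("000000001122222"))
```

The resulting lengths (1, 3, 2, and 3 bits) match the output codewords in the table, and their probability-weighted average is the 1.6 bits per sample quoted above, slightly more than the source entropy.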
The GNU Software Radio
http://www.gnuradio.org
GNU Radio is a free & open-source software development toolkit that
provides signal processing blocks to implement software radios. It can be
used with readily-available low-cost external RF hardware to create software-
defined radios, or without hardware in a simulation-like environment. It is
widely used in hobbyist, academic and commercial environments to support
both wireless communications research and real-world radio systems.
Hardware - USRP
The Universal Software Radio Peripheral is the recommended device for
interfacing GNU Radio with the real world. The USRP has been developed
especially for GNU Radio, and is available from Ettus Research.
Exploring GNU Radio
http://www.gnu.org/software/gnuradio/doc/exploring-gnuradio.html
Listening to FM Radio using GNU Radio
http://www.linuxjournal.com/article/7505
Daughterboard (TVRX 50 MHz to 870 MHz receiver)
Bandpass signal with 6 MHz bandwidth at an IF (Intermediate Frequency) of 5.75 MHz.
ADC
Up to 64 M samples per second, 12 bits/sample
FPGA
Digital Down Converter
[Figure panels: (1024) samples from the ADC; FFT of the FM band (10 MHz); FFT at the output of the DDC; FFT of the demodulated FM signal, showing the 19 kHz stereo pilot tone and the 38 kHz component]
Listening to FM Radio using GNU Radio
Angle Modulation (Phase Modulation, Frequency Modulation)
$$s(t) = A_c \cdot \cos[2\pi f_c t + \phi(t)]$$
PM: $\phi(t) = k \cdot m(t)$
FM: $\phi'(t) = k \cdot m(t)$
Instantaneous frequency
Listening to FM Radio using GNU Radio
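Extracting $\phi'(t) = k \cdot m(t)$ from $s(t) = A_c \cdot \cos[2\pi f_c t + \phi(t)]$
Digital Down Converter
IF (Intermediate Frequency) → baseband; remove the $2\pi f_c t$ term
Performed in the FPGA
$$\cos[2\pi f_c t + \phi(t)] \cdot \cos(2\pi f_c t) = \frac{1}{2} \cdot \left(\cos[4\pi f_c t + \phi(t)] + \cos\phi(t)\right)$$
Quadrature Demodulator
Differential, Difference?
$$e^{i\phi(t_1)}, \quad e^{i\phi(t_2)}$$
$$e^{i(\phi(t_2) - \phi(t_1))} = e^{i\phi(t_2)} \cdot e^{-i\phi(t_1)}$$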
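The quadrature demodulation step can be sketched in plain Python: multiplying each complex baseband sample by the conjugate of the previous one leaves a phase equal to the phase difference, which approximates φ'(t) = k·m(t) in discrete time. This is a minimal sketch and not GNU Radio code; the function names, the gain k, and the test message are assumptions, and it requires |k·m| < π so that the phase difference does not wrap.

```python
import cmath
import math

def fm_modulate_baseband(message, k):
    """Complex baseband FM: the phase advances by k * m[n] at every sample."""
    phase, out = 0.0, []
    for m in message:
        phase += k * m
        out.append(cmath.exp(1j * phase))
    return out

def quadrature_demodulate(samples, k):
    """Recover m[n] from the phase difference between consecutive samples."""
    return [cmath.phase(cur * prev.conjugate()) / k
            for prev, cur in zip(samples, samples[1:])]

k = 0.8                                                            # assumed modulation gain
message = [0.5 * math.sin(2 * math.pi * n / 50) for n in range(200)]
baseband = fm_modulate_baseband(message, k)
recovered = quadrature_demodulate(baseband, k)                     # estimates message[1:]
worst = max(abs(a - b) for a, b in zip(recovered, message[1:]))
print("largest reconstruction error:", worst)
```

The first message sample is lost because a difference needs two samples; everything after it is recovered up to floating-point error in this noise-free setting.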
GNU Radio Applications
In addition to the examples discussed above, GNU Radio comes with a complete HDTV transmitter and receiver, a spectrum analyzer, an oscilloscope, a concurrent multichannel receiver, and an ever-growing collection of modulators and demodulators.
Projects under investigation or in progress include:
A TiVo equivalent for radio, capable of recording multiple stations simultaneously.
Time Division Multiple Access (TDMA) waveforms.
A passive radar system that takes advantage of broadcast TV for its signal source
TETRA transceiver.
Digital Radio Mondiale (DRM).
Software GPS.
Distributed sensor networks.
Distributed measurement of spectrum utilization.
Amateur radio transceivers.
Ad hoc mesh networks.
RFID detector/reader.
Multiple input multiple output (MIMO) processing.
Visible Light Communication
http://blog.skbroadband.com/938
http://www.disneyresearch.com/project/visible-light-communication/
Communication over Screen-Camera Links?
2D barcodes are everywhere !!!
“Transmitting” information (vs linking)
[Figure: transmitter and receiver; panels: original frame, single frame, 2-frame mix. The mixing pattern varies by line.]
Acoustic Communication / Soundcode
Acoustic Communication ?
A traditional means of communication commonly used in nature
Technical value? Underwater communication
Integration with smartphones? - http://digxtal.egloos.com/v/2654784
2 Approaches
Sonic Notify inserts ultra-high-frequency sounds into the carrier audio. These frequencies are beyond the hearing range of most people, so listeners perceive the audio as unaltered. https://sonicnotify.com/
Intrasonics modifies the carrier audio and adds artificial echoes to it. The human brain perceives these as natural echoes and ignores them, as if a few insignificant objects were reflecting the original sound.