Video Coding - Information Coding
Outline
I. Colour coding
II. Moving images: From 2D to 3D?
III. Hybrid coding
IV. Video coding standards
Part I: Colour Coding
The base colours of colour television are
– Red: 700 nm
– Green: 546 nm
– Blue: 435 nm
Three base colours are enough to synthesize any visible colour!
The PAL colours
Y = 0.30R + 0.59G + 0.11B
Cr = 0.70R - 0.59G - 0.11B
Cb = - 0.30R - 0.59G + 0.89B
Y luminance; Cr, Cb chrominance
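
The three equations can be tried out with a small sketch. It assumes RGB components scaled to the range 0..1 and takes Y = 0.30R + 0.59G + 0.11B, so that Cr = R - Y and Cb = B - Y. The function names are hypothetical and chosen only for illustration.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one RGB triple (components in 0..1) to (Y, Cr, Cb)."""
    y = 0.30 * r + 0.59 * g + 0.11 * b   # luminance
    cr = r - y                           # = 0.70R - 0.59G - 0.11B
    cb = b - y                           # = -0.30R - 0.59G + 0.89B
    return y, cr, cb


def ycrcb_to_rgb(y, cr, cb):
    """Invert the conversion: R and B follow directly, G from the Y equation."""
    r = cr + y
    b = cb + y
    g = (y - 0.30 * r - 0.11 * b) / 0.59
    return r, g, b


white = rgb_to_ycrcb(1.0, 1.0, 1.0)   # grey levels carry (almost exactly) zero chrominance
red = rgb_to_ycrcb(1.0, 0.0, 0.0)     # Y = 0.30, Cr = 0.70, Cb = -0.30
```

Any grey level (R = G = B) gives zero chrominance up to floating-point rounding, which is why the luminance signal alone is a complete black-and-white picture.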
[Figure: a matrix converts R, G and B into Y, R-Y and B-Y.]
• Change basis to YUV (almost the same as YCrCb).
– For more info on colour spaces, see the colour FAQ at www.poynton.com/Poynton-color.html
• The human visual system perceives luminance at a higher resolution than chrominance!
→ Subsample the colour components.
Digital Colour Coding
[Figure: Y, U and V sample layouts for the 4:2:0 and 4:2:2 formats.]
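
A minimal sketch of the two subsampling formats, assuming planes stored as lists of rows with even dimensions. Averaging neighbouring samples is an assumption made for illustration; real systems specify their own filters and sample positions.

```python
def subsample_420(plane):
    """Average each 2x2 block: half resolution horizontally and vertically."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def subsample_422(plane):
    """Average horizontal pixel pairs: half resolution horizontally only."""
    return [
        [(row[x] + row[x + 1]) / 2 for x in range(0, len(row), 2)]
        for row in plane
    ]


u = [
    [0, 2, 4, 6],
    [0, 2, 4, 6],
    [8, 8, 8, 8],
    [0, 0, 0, 0],
]
u_420 = subsample_420(u)   # 2x2 samples instead of 4x4
u_422 = subsample_422(u)   # 4 rows of 2 samples
```

With Y kept at full resolution, 4:2:0 stores 1.5 samples per pixel and 4:2:2 stores 2, compared with 3 for unsubsampled colour.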
Part II: Coding of Moving Images
Principle I - Extend known methods to 3D
Coding Method     Performance (bpp)   Complexity   Decoding complexity
PCM               6 – 8               Low          Low
VQ                0.5 – 2             Very high    Low
Predictive        2 – 5               Low          Low
Transform         0.5 – 1.5           High         High
Subband/Wavelet   0.1 – 1.0           High         High
Fractal           0.1 – 0.5           Very high    Low
Extending 2D Methods
• Predictive coding
– 3D predictors
– Motion compensated predictors
• Transform coding
– 3D transforms
• Subband coding
– 3D subband filters
BUT! The properties of the image signal are different in the temporal and the spatial domain!
Part III: Hybrid Coding
• Combine predictive coding and transform coding.
• Use predictive coding to predict the next frame in the sequence.
• Use transform coding to code the prediction error.
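
The combination can be sketched for a single block position, without motion compensation. This is only an illustration under several assumptions: a 4x4 block, an orthonormal DCT built by hand, one uniform quantizer step for all coefficients, and hypothetical function names. The encoder predicts from the previously reconstructed block, so it stays in step with the decoder.

```python
import math

N = 4  # block size for this sketch

# Orthonormal DCT-II basis: C[k][n] is basis function k sampled at position n.
C = [
    [
        (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
        * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        for n in range(N)
    ]
    for k in range(N)
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    return matmul(matmul(C, block), transpose(C))


def idct2(coeffs):
    return matmul(matmul(transpose(C), coeffs), C)


def encode_block(current, prediction, step):
    """Transform-code the prediction error; returns quantized levels."""
    error = [[c - p for c, p in zip(cr, pr)]
             for cr, pr in zip(current, prediction)]
    return [[round(v / step) for v in row] for row in dct2(error)]


def decode_block(levels, prediction, step):
    """Rebuild the block: prediction plus the dequantized, inverse-transformed error."""
    error = idct2([[lv * step for lv in row] for row in levels])
    return [[p + e for p, e in zip(pr, er)]
            for pr, er in zip(prediction, error)]


def code_sequence(frames, step):
    """Code each block using the previous reconstruction as the prediction."""
    reconstructed = [[0.0] * N for _ in range(N)]  # nothing to predict from yet
    out = []
    for frame in frames:
        levels = encode_block(frame, reconstructed, step)
        reconstructed = decode_block(levels, reconstructed, step)
        out.append((levels, reconstructed))
    return out


frames = [[[100.0] * N for _ in range(N)] for _ in range(2)]
result = code_sequence(frames, step=8)
```

For two identical flat blocks, the first one needs a single nonzero level (the DC coefficient) and the second one needs none, because the prediction is already exact.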
Frame Prediction
[Figure: an intra-coded I-frame followed by predictively coded P-frames.]
Better prediction if it can compensate for motion!
Motion Compensated Hybrid Coding
[Figure: block diagram of the motion compensated hybrid coder, with blocks TQ, TQ^-1, P, ME and VLC.]
ME: Motion estimation
TQ: Transform + quantization
Motion Compensation
• Typically one motion vector per macroblock (4 transform blocks)
• Motion estimation is a time-consuming process. It can be sped up with:
– Hierarchical motion estimation
– A maximum length for motion vectors
– Clever search strategies
• Motion vector accuracy:
– Integer, half or quarter pixel
– Bilinear interpolation
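
Motion estimation by exhaustive block matching can be sketched as follows. This is the brute-force baseline that hierarchical and other clever search strategies try to avoid. The sum of absolute differences (SAD) as matching criterion, the function names and the tiny test frames are assumptions made for illustration; only integer vectors are searched.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def full_search(cur, ref, bx, by, size=16, max_len=15):
    """Try every integer vector within +/-max_len that keeps the block inside ref."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-max_len, max_len + 1):
        for dx in range(-max_len, max_len + 1):
            if not (0 <= bx + dx <= w - size and 0 <= by + dy <= h - size):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Reference frame with a small textured patch; the current frame shows the
# same patch moved one pixel right and two pixels down.
ref = [[0] * 8 for _ in range(8)]
ref[3][2], ref[3][3], ref[4][2], ref[4][3] = 50, 100, 150, 200
cur = [[0] * 8 for _ in range(8)]
for y in range(6):
    for x in range(7):
        cur[y + 2][x + 1] = ref[y][x]

vector, cost = full_search(cur, ref, bx=2, by=4, size=4, max_len=3)
# vector == (-1, -2): the best match lies one pixel left and two pixels up in ref
```

The cost grows with the square of `max_len`, which is why limiting the vector length is listed as a way to save time.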
Part IV: Video Coding Standards
[Figure: bitrate scale from 8, 16, 64 and 384 kbit/s to 1.5, 5 and 20 Mbit/s, divided into very low, low, medium and high bitrate.]
Applications: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, digital TV, HDTV
Standards: MPEG-4, H.263, H.261, MPEG-1, MPEG-2
Standards
• H.26x
– Standards for real-time communication like video telephony and video conferencing.
– Standardized by ITU.
• MPEG
– Standards for stored video data like movies on CDs, DVDs, etc.
– Standardized by ISO.
H.261
• Standard for ISDN picture phones in 1990.
• Motion compensation:
– One motion vector per macroblock.
– One macroblock = four 8×8 luminance blocks + two chrominance blocks (one U and one V).
– Motion vectors at most 15 pixels long in each direction.
• Format:
– CIF (352×288) or QCIF (176×144)
– 7.5 – 30 frames/s.
• Bitrate: Multiple of 64 kbit/s (= ISDN) including audio.
• Quality: Acceptable for small motion at 128 kbit/s.
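
To get a feeling for these numbers, the sketch below counts macroblocks and compares the uncompressed bitrate with 128 kbit/s. It assumes 8 bits per sample and 4:2:0 subsampling (1.5 samples per pixel, which matches four luminance blocks plus two chrominance blocks per macroblock). The function names are hypothetical.

```python
def macroblocks(width, height, mb=16):
    """Number of 16x16 macroblocks in a frame."""
    return (width // mb) * (height // mb)


def raw_bitrate(width, height, fps, bits=8):
    """Uncompressed bitrate in bit/s with 4:2:0 subsampling (1.5 samples per pixel)."""
    return width * height * 1.5 * bits * fps


cif_mbs = macroblocks(352, 288)           # 22 x 18 = 396 macroblocks
qcif_mbs = macroblocks(176, 144)          # 11 x 9 = 99 macroblocks
cif_raw = raw_bitrate(352, 288, 30)       # about 36.5 Mbit/s
ratio = cif_raw / 128_000                 # about 285 times more than 128 kbit/s
```

Since the 128 kbit/s also has to carry the audio, the compression needed for the video alone is somewhat higher than this ratio suggests.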
H.263
• Standard for picture telephones over analog subscriber lines in 1995.
• Format:
– CIF, QCIF or Sub-QCIF.
– Usually less than 10 frames/s.
• Bitrate: Typically 20 – 30 kbit/s.
• Quality: With new options as good as H.261 (at half the bitrate).
MPEG
• Moving Picture Experts Group, a committee under ISO and IEC.
• Original plan:
– MPEG-1 for 1.5 Mbit/s (Video CD)
– MPEG-2 for 10 Mbit/s (Digital TV)
– MPEG-3 for 40 Mbit/s (HDTV)
• What happened:
– MPEG-1 for 1.5 Mbit/s (Video CD)
– MPEG-2 for 2 – 60 Mbit/s (TV and HDTV)
– MPEG-4, -7 and -21 for other things.
MPEG-1
• ISO/IEC standard in 1991.
• Target bitrate around 1.5 Mbit/s (Video CD).
• Properties:
– Bi-directionally predictively coded frames ("B-frames", see next slide).
– More flexible than H.261.
– Almost JPEG for intra frames.
• Format:
– CIF
– No interlace.
– 24 – 30 frames/s.
MPEG Frame Types
I B B P B B P B B P B B I
[Figure legend: intra-coded I-frame; predictively coded P-frames; bi-directionally predictively coded B-frames; group of frames (GOF).]
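
Because a B-frame needs a reference frame from its future, frames cannot be sent in display order. A minimal sketch of the reordering, assuming the display order I B B P B B P B B P B B I and a hypothetical function name:

```python
def coding_order(display):
    """Reorder frame types from display order to transmission order."""
    out, pending_b = [], []
    for index, kind in enumerate(display):
        if kind == "B":
            pending_b.append((index, kind))   # must wait for the next reference
        else:
            out.append((index, kind))         # I or P: send the reference first ...
            out.extend(pending_b)             # ... then the B-frames that waited for it
            pending_b = []
    return out + pending_b                    # trailing B-frames, if any, go last


order = coding_order("IBBPBBPBBPBBI")
kinds = "".join(kind for _, kind in order)    # "IPBBPBBPBBIBB"
indices = [index for index, _ in order]       # [0, 3, 1, 2, 6, 4, 5, ...]
```

The decoder has to buffer the reference frames and restore display order, which adds delay. This is one reason B-frames suit stored video better than real-time communication.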
MPEG coding of I-frames
• Intra-coded
• 8×8 DCT
• Arbitrary weighting matrix for coefficients
• Predictive coding of DC coefficients
• Uniform quantization
• Zig-zag scan, run-level coding, entropy coding
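
The last steps of the chain can be sketched on already quantized levels. The example uses a 4x4 block to stay short (MPEG uses 8x8), starts the DC predictor at 0, and marks the end of a block with the string "EOB". These choices and the function names are assumptions for illustration, and the final entropy coding of the pairs is left out.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n x n block in zig-zag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals; the direction alternates from one to the next.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_level(levels):
    """Code the AC levels (zig-zag order, DC excluded) as (zero run, level) pairs."""
    scan = [levels[r][c] for r, c in zigzag_order(len(levels))][1:]
    pairs, run = [], 0
    for v in scan:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # end of block: only zeros remain
    return pairs


def dc_differences(dc_values):
    """Predict each DC level from the previous block and keep the difference."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out


levels = [
    [12, 3, 0, 0],
    [-2, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
pairs = run_level(levels)              # [(0, 3), (0, -2), (5, 1), "EOB"]
dc = dc_differences([12, 14, 13])      # [12, 2, -1]
```

The zig-zag scan visits low frequencies first, so the many zero-valued high-frequency levels end up in long runs at the end and cost very few bits.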
MPEG coding of P-frames
• Motion compensated prediction from an I- or P-frame.
• Half-pixel accuracy of motion vectors, bilinear interpolation.
• Predictive coding of motion vectors.
• Prediction error coded in the same way as an I-frame.
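
Half-pixel motion compensation with bilinear interpolation can be sketched as below. Coordinates and vectors are given in half-pixel units. The sketch uses floating-point averages and assumes all accessed positions lie inside the reference frame, whereas a real codec uses integer arithmetic with a defined rounding rule. The function names are hypothetical.

```python
def sample_half_pel(ref, x2, y2):
    """Sample ref at (x2/2, y2/2); x2 and y2 are non-negative half-pixel coordinates."""
    x0, y0 = x2 // 2, y2 // 2
    x1 = x0 + (x2 % 2)   # right neighbour only when x is at a half position
    y1 = y0 + (y2 % 2)   # lower neighbour only when y is at a half position
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4


def predict_block(ref, bx, by, mvx2, mvy2, size):
    """Motion compensated prediction of a block with a half-pixel motion vector."""
    return [
        [sample_half_pel(ref, 2 * (bx + x) + mvx2, 2 * (by + y) + mvy2)
         for x in range(size)]
        for y in range(size)
    ]


ref = [
    [0, 10, 20],
    [30, 40, 50],
    [60, 70, 80],
]
pred = predict_block(ref, bx=0, by=0, mvx2=1, mvy2=1, size=2)
# pred == [[20.0, 30.0], [50.0, 60.0]]: each sample averages four neighbours
```

At integer positions the four terms coincide and the original pixel is returned; at a position that is half in one direction only, the result is the mean of two neighbours.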
MPEG coding of B-frames
• Motion compensated prediction from two consecutive I- or P-frames.
– Forward prediction only (1 vector/macroblock).
– Backward prediction only (1 vector/macroblock).
– Average of fwd and bwd (2 vectors/macroblock).
• Otherwise as P-frames.
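
How an encoder might choose between the three modes can be sketched as follows. The inputs are assumed to be the already motion compensated forward and backward predictions of one macroblock. Selecting by smallest sum of absolute differences is an assumption, since the choice is left to the encoder, and the function names are hypothetical.

```python
def sad_blocks(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def choose_b_mode(current, fwd_pred, bwd_pred):
    """Pick the B-frame prediction mode with the smallest SAD."""
    average = [[(f + b) / 2 for f, b in zip(rf, rb)]
               for rf, rb in zip(fwd_pred, bwd_pred)]
    candidates = {"forward": fwd_pred, "backward": bwd_pred, "average": average}
    mode = min(candidates, key=lambda name: sad_blocks(current, candidates[name]))
    return mode, candidates[mode]


fwd = [[10, 10], [10, 10]]
bwd = [[20, 20], [20, 20]]
mode_a, _ = choose_b_mode([[15, 15], [15, 15]], fwd, bwd)   # "average"
mode_b, _ = choose_b_mode([[10, 10], [10, 11]], fwd, bwd)   # "forward"
```

The averaged mode costs two motion vectors, so a practical encoder would also weigh the extra bits for the second vector against the smaller prediction error.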
MPEG-2
• ISO/IEC standard in 1994.
• Properties:
– Handles interlace (optimized for TV)
– Even more flexible than MPEG-1
• Format:
– 352×288
– 704×576 (25 frames/s) or 720×480 (30 frames/s)
– 1440×1152 or 1920×1080 (HDTV)
• Bitrate:
– 2 – 60 Mbit/s
– ~4 Mbit/s: Image quality similar to PAL / NTSC / SECAM.
– 18 – 20 Mbit/s: HDTV.
MPEG-2 (cont.)
• Profiles:
– Simple profile without B-frames.
– Scalable profiles.
• Experience shows that:
– At 1.5 – 2 Mbit/s MPEG-2 is not better than MPEG-1.
– With manual interaction during coding, good quality can be achieved at 3 – 4 Mbit/s.
– Problems with implementing the full standard have caused compatibility problems.
– Buffering and rate control are hard problems.
MPEG-4
• ISO/IEC standard in 1998, version 2 in 1999
• Instead of frames as coding units, MPEG-4 uses audio-visual objects
• Focus is not primarily on compression, but on content-based functionality
• Contains definitions of:
– Media object types (video, audio, text, graphics, ...)
– Parameters for describing the objects
– Bitstream syntax for the (compressed) parameters
– Scene description, file format, streaming, synchronization, ...
• Allows mixing of media objects.
Parts of the MPEG-4 standard
• Part 1, Systems, contains
– The bitstream syntax and the binary "language" for scene description
– Computer graphics object descriptions
– Multiplexing, transport, ...
• Part 2, Visual, contains
– Video coding
– Still image coding
– Texture coding, ...
• Part 3, Audio, contains a toolbox of audio coders for different applications
• ...
Structure of an MPEG-4 Decoder
[Figure: Bitstream → MUX → A/V object decoders → Compositor → Audio/Video scene.]
[Figure: a video frame composed of a background VOP and two further VOPs.]
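
What the compositor does with decoded video objects can be sketched as pasting each VOP onto the background wherever its alpha map is set. The binary alpha map, the dictionary layout with the keys `pos`, `pixels` and `alpha`, and the function name are assumptions made purely for illustration.

```python
def compose(background, vops):
    """Paste each VOP onto a copy of the background where its alpha map is 1."""
    frame = [row[:] for row in background]
    for vop in vops:
        ox, oy = vop["pos"]   # top-left corner of the VOP in the frame
        for y, (pix_row, alpha_row) in enumerate(zip(vop["pixels"], vop["alpha"])):
            for x, (pix, alpha) in enumerate(zip(pix_row, alpha_row)):
                if alpha:     # pixel belongs to the object's shape
                    frame[oy + y][ox + x] = pix
    return frame


background = [[0] * 3 for _ in range(3)]
vop = {
    "pos": (1, 1),
    "pixels": [[5, 6], [7, 8]],
    "alpha": [[1, 0], [1, 1]],   # the top-right pixel is outside the object
}
frame = compose(background, [vop])
# frame == [[0, 0, 0], [0, 5, 0], [0, 7, 8]]
```

Later entries in the list are drawn on top of earlier ones, so the list order acts as a simple depth order.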
MPEG-4 (Natural) Video
• Instead of frames: Video Object Planes
• Coded with Shape Adaptive DCT
MPEG-4 Video Coding
[Figure: block diagram of the MPEG-4 video coder, with alpha map, SA-DCT, TQ, TQ^-1, predictor, motion estimation, shape coding, VLC blocks and a mux.]
TQ: Transform + quantization
Synthetic/Natural Hybrid Coding
• Mix traditional video with 2D/3D graphics
– Compose virtual environments
– Easy to add text, graphs, images, etc.
• High compression
• Receive objects from separate sources
– Use predefined or locally defined objects
• Scalability
– Progressive decoding
– Better terminal gives better quality.
Synthetic Objects
• 2D/3D graphics
– Lines, polygons
– Still images
– Image/video mapping on polygon meshes
• VRML scenes and objects
• Animated people
• More on animation and virtual characters in Lecture 12!
• Synthetic audio
• More on natural and synthetic audio in Lecture 11!
[Figure: a computer graphics generated virtual environment containing a natural video object, a natural video object mapped on a mesh, and a still image or natural video object mapped on an animated mesh. All mixed in the decoder.]
Virtual Environments
• Downloaded virtual environment
• Different environments for different users
• Simple change between environments
• Synthetic environments are cheaper than real ones
Tools for Synthetic Objects
• Wavelet-based still image compression
– Scalable quality and resolution
– Progressive decoding
– Can be mapped on 2D or 3D meshes
• Compression of 2D and 3D meshes
– Mesh geometry and animation
– Transmit vertex coordinates and let the receiving terminal calculate the polygons
– A moving or still image can be mapped on the mesh (texture mapping).
More Tools for Synthetic Objects
• Face and Body Animation
• Text-to-speech (TTS) interface
• View-dependent scalable texture
– Information about the user's view position in a 3D scene is transmitted on a back-channel
– Only the necessary texture information is transmitted to the user
View-dependent Scalable Texture
[Figure: the original texture; the texture mapped on a surface; what the user sees.]
Other formats
• Microsoft, RealVideo, QuickTime, ...
• All are variations of the hybrid coder used in MPEG coders, with some extra features.
H.264 / MPEG-4 part 10
• 4×4 integer transform (approximating DCT).
• Prediction of blocks of sizes up to 16×16.
• Motion vectors for blocks of sizes 4×4 up to 16×16.
• Up to 5 reference images for prediction.
• Non-uniform quantization.
• Arithmetic coding of run-level pairs.
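
The 4×4 integer transform can be sketched with the forward core transform matrix of H.264. Only the matrix products are shown; the scaling that turns this into a DCT approximation is combined with quantization in the standard and is left out here. The function name is hypothetical.

```python
# Forward core transform matrix of H.264 (integer approximation of the 4x4 DCT).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def core_transform(block):
    """Compute Y = CF * X * CF^T using integer arithmetic only."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


flat = core_transform([[1] * 4 for _ in range(4)])
# A flat block gives a single nonzero coefficient: flat[0][0] == 16
ramp = core_transform([[0, 1, 2, 3] for _ in range(4)])
# A horizontal ramp only excites the first row: ramp[0] == [24, -28, 0, -4]
```

Because all entries are small integers, the transform needs only additions and shifts, and encoder and decoder get bit-identical results without floating-point mismatch.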
What about the sound?
• MPEG-1
– Audio layer I, II and III (mp3).
• MPEG-2
– Four channels, same codec as in MPEG-1.
– AAC (Advanced Audio Coding) added later.
• MPEG-4
– AAC
– Two speech coders
– Structured audio
– And more...
More on audio coding in Lecture 11.
Conclusion
• Colour coding
– Change basis from RGB to YUV
– Colour components are compressed harder than the luminance
• Moving image coding
– Hybrid coding: Motion compensated predictive coding and transform coding of the prediction error
– I-, P-, and B-frames
– Object-based coding (MPEG-4) mixing synthetic and natural audio & video
Conclusion, cont.
• Standards
– MPEG-1: Video CD
– MPEG-2: Digital TV
– MPEG-4: Multimedia
– H.261: ISDN videophone
– H.263: PSTN videophone
– H.264 / MPEG-4 part 10: Universal video