January 8, 2014 Sam Siewert Digital Media and Interactive Systems Deeper Dive into MPEG Digital Video Encoding


MPEG Encode/Decode

Tools


FFMPEG FAQ

Read It!! http://ffmpeg.org/faq.html
You should know how to Decode Video (recorded from your camera or pre-recorded by someone else)
You should know how to Encode Video (to turn in with your labs)
On Ubuntu – do “apt-get install ffmpeg” to get it!


Ffmpeg (avconv) Notes
sudo apt-get install ffmpeg
ffmpeg -i movie.mpg -ss 30 -t 30 movie%d.ppm
– Extracts 30 seconds of frames, starting 30 seconds into the video

ssiewert@ssiewert-VirtualBox:~/a485/media$ ffmpeg -i big_buck_bunny_480p_surround-fix.avi -ss 30 -t 30 bbb%d.ppm
ffmpeg version 0.8.6-4:0.8.6-0ubuntu0.12.04.1, Copyright (c) 2000-2013 the Libav developers
  built on Apr 2 2013 17:02:36 with gcc 4.6.3
Input #0, avi, from 'big_buck_bunny_480p_surround-fix.avi':
  Duration: 00:09:56.45, start: 0.000000, bitrate: 2957 kb/s
    Stream #0.0: Video: mpeg4 (Simple Profile), yuv420p, 854x480 [PAR 1:1 DAR 427:240], 24 tbr, 24 tbn, 24 tbc
    Stream #0.1: Audio: ac3, 48000 Hz, 5.1, s16, 448 kb/s
Incompatible pixel format 'yuv420p' for codec 'ppm', auto-selecting format 'rgb24'
[buffer @ 0x907700] w:854 h:480 pixfmt:yuv420p
[avsink @ 0x9054c0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'src' and the filter 'out'
[scale @ 0x905b60] w:854 h:480 fmt:yuv420p -> w:854 h:480 fmt:rgb24 flags:0x4
Output #0, image2, to 'bbb%d.ppm':
  Metadata:
    encoder : Lavf53.21.1
    Stream #0.0: Video: ppm, rgb24, 854x480 [PAR 1:1 DAR 427:240], q=2-31, 200 kb/s, 90k tbn, 24 tbc
Stream mapping:
  Stream #0.0 -> #0.0
Press ctrl-c to stop encoding
... Last message repeated 719 times -0kB time=29.00 bitrate= -0.0kbits/s
frame= 720 fps= 38 q=0.0 Lsize= -0kB time=30.00 bitrate= -0.0kbits/s
video:864686kB audio:0kB global headers:0kB muxing overhead -100.000002%
ssiewert@ssiewert-VirtualBox:~/a485/media$


Now with PPM Frames
PPM is Simple, but No Compression
– Good for CV
– http://en.wikipedia.org/wiki/Netpbm_format - Read this!
– JPEG, PNG are Compressed
– TIFF is an Alternative, but More Complex
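To show how little there is to a PPM frame, here is a minimal sketch of writing and reading the binary (P6) variant in Python. The function names `encode_ppm` and `decode_ppm` are made up for this example, and only 8-bit samples (maxval 255) are handled.

```python
def encode_ppm(width, height, pixels):
    """Build a binary PPM (P6) image from a flat, row-major run of RGB bytes."""
    if len(pixels) != width * height * 3:
        raise ValueError("expected 3 bytes (R, G, B) per pixel")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


def decode_ppm(data):
    """Parse a P6 image; returns (width, height, raw RGB bytes)."""
    tokens, pos = [], 0
    while len(tokens) < 4:                       # magic, width, height, maxval
        while data[pos:pos + 1].isspace():       # skip whitespace between tokens
            pos += 1
        if data[pos:pos + 1] == b"#":            # header comments run to end of line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1                                     # one whitespace byte precedes the raster
    magic, width, height, maxval = tokens[0], int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P6" or maxval != 255:
        raise ValueError("this sketch only handles 8-bit P6 files")
    return width, height, data[pos:pos + width * height * 3]


# A 2x1 image: one red pixel followed by one blue pixel.
two_pixels = bytes([255, 0, 0, 0, 0, 255])
image = encode_ppm(2, 1, two_pixels)
print(decode_ppm(image))
```

Because nothing is compressed, the raster of an 854x480 frame is exactly 854 x 480 x 3 = 1,229,760 bytes, which is why 720 extracted frames take up roughly 860 MB on disk.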


Simple Re-encode
When Quality is not a Concern, Keep it Simple

ffmpeg -f image2 -i bbb%d.ppm bbbtrans.mpg
vlc bbbtrans.mpg


Quality Encoding is Tricky
Use MPEG4 HQ Settings, Encode 480p, AR=4:3
ffmpeg -f image2 -i bbb%d.ppm -maxrate 20000k -bufsize 32M -s 640x480 -vcodec mpeg4 -qscale 1 bbbtranshq.mp4
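When the decode and encode steps are scripted for a lab, it can help to build the command lines in one place. The sketch below only assembles argument lists using the flags shown on these slides. The helper names are hypothetical, and nothing is executed unless the list is passed to `subprocess.run` on a machine with ffmpeg installed.

```python
import subprocess


def extract_frames_cmd(source, pattern, start_s, duration_s):
    """Arguments for dumping frames, e.g. movie.mpg -> movie%d.ppm."""
    return ["ffmpeg", "-i", source, "-ss", str(start_s), "-t", str(duration_s), pattern]


def encode_hq_cmd(pattern, output, size="640x480"):
    """Arguments for the high-quality MPEG-4 re-encode shown above."""
    return ["ffmpeg", "-f", "image2", "-i", pattern,
            "-maxrate", "20000k", "-bufsize", "32M",
            "-s", size, "-vcodec", "mpeg4", "-qscale", "1", output]


def run(cmd):
    """Run a command list; raises CalledProcessError if ffmpeg fails."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print rather than execute, so the sketch is safe to run anywhere.
    print(" ".join(extract_frames_cmd("movie.mpg", "movie%d.ppm", 30, 30)))
    print(" ".join(encode_hq_cmd("bbb%d.ppm", "bbbtranshq.mp4")))
```

Passing a list instead of a single shell string avoids quoting problems with the `%d` frame pattern.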


MPEG Encode/Decode

Theory and Algorithms


Overview
MPEG-2 Standards
– 13818-1: Transport Streams for Video & Audio
    Container for Program Streams (188 Byte Packets)
    Multiplexed Video and Audio Elementary Streams
    PSI – Program Specific Information
    System Clock (PCR, PTS/DTS)
– 13818-2: Elementary Video Stream Encode/Decode
    Color Format
    Macro-blocks, Video DCT
    GoP (I-Frame, B-Frame, P-Frame)
    Motion Compensation and Vector Quantization
Differences Between MPEG-2 and MPEG-4


MPEG-2: Order of Operations

#1: Point (Pixel) Encoding
#2 A-C: Macro-Block Lossy Intra-Frame Compression
#3: Motion-Based Compression in Group of Pictures

[Figure: encode pipeline with stages labeled #1, #2A, #2B, #2C, #3]


Step #1 – RGB to YCrCb 4:4:4 24-bit (Lossless)

For every Y sample in a scan-line, there is also one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)

Typically a Post Production, CEDIA or DCI format
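The slide does not give the conversion matrix, so the sketch below assumes the full-range BT.601 coefficients used by JPEG/JFIF. Broadcast MPEG-2 normally uses a limited-range variant with different offsets and scaling, so treat the numbers as illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit (Y, Cb, Cr), full-range BT.601 (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round to integers and clamp to the 8-bit range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


for name, rgb in [("white", (255, 255, 255)), ("black", (0, 0, 0)), ("red", (255, 0, 0))]:
    print(name, rgb, "->", rgb_to_ycbcr(*rgb))
```

Greys map to Cb = Cr = 128, so all of the picture detail sits in Y. Each pixel still carries three 8-bit samples, so no data is saved at this step. Rounding to 8 bits can shift a value by one level on a round trip.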

[Figure: 320x240 sample grid (samples 0 … 319 in the first scan-line, 76,480 … 76,799 in the last); legend: Y, Cr, and Cb sample / Y sample only]

Step #1 – RGB to YCrCb 4:2:2 (Lossy)
For every 2 Y samples in a scan-line, one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Two RGB Pixels = 48 bits, Whereas Two YCrCb Pixels = 32 bits, or 16 bits per pixel vs. 24 bits per pixel (33% smaller frame size)
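The bits-per-pixel figures follow directly from the J:a:b sub-sampling notation. Here is a small sketch of that arithmetic, assuming 8-bit samples and the usual reading of J:a:b as a region J pixels wide and 2 rows high. The function name is made up for the example.

```python
def bits_per_pixel(j, a, b, bits_per_sample=8):
    """Average bits per pixel for J:a:b chroma sub-sampling.

    The reference region is j pixels wide and 2 rows high. It holds 2*j luma
    samples, `a` chroma positions in the first row and `b` in the second;
    every chroma position carries one Cb and one Cr sample.
    """
    luma_samples = 2 * j
    chroma_samples = 2 * (a + b)
    return (luma_samples + chroma_samples) * bits_per_sample / (2 * j)


for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    bpp = bits_per_pixel(*scheme)
    saving = 100 * (1 - bpp / 24)
    print("%d:%d:%d -> %.0f bits/pixel, %.0f%% smaller than RGB" % (*scheme, bpp, saving))
```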

[Figure: 320x240 sample grid (samples 0 … 319 in the first scan-line, 76,480 … 76,799 in the last); legend: Y, Cr, and Cb sample / Y sample only]

Step #1 – RGB to YCrCb 4:2:0 (Lossy)
For every 4 Y samples (a 2x2 block spanning two scan-lines), one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Two RGB Pixels = 48 bits, Whereas Four YCrCb 4:2:0 Pixels also fit in 48 bits (4 Y + 1 Cr + 1 Cb), or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)
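One simple way to produce the quarter-resolution chroma planes is to average each 2x2 neighbourhood. The averaging filter is an assumption made for illustration, since real encoders may use other filters and chroma sample positions. The sketch below uses plain Python lists.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    `plane` is a list of rows of 8-bit samples with even width and height.
    """
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch expects even dimensions")
    return [
        [
            # +2 rounds the integer average to nearest instead of truncating.
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


cb_plane = [
    [10, 20, 100, 100],
    [30, 40, 100, 100],
]
print(subsample_420(cb_plane))
```

The luma plane is left at full resolution; only Cb and Cr are reduced. That is why the loss is hard to see: the eye is less sensitive to chroma detail than to brightness detail.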

[Figure: 320x240 sample grid (samples 0 … 319 in the first scan-line, 76,480 … 76,799 in the last); legend: Cr, Cb sample / Y sample only]

Step #2 – Convert to 8x8 Macroblocks and Transform

Aspect Ratios Designed to Fit 8x8 Macroblock
E.g. 640 x 480 => 80 x 60 Macroblocks
(Strictly, 13818-2 defines a macroblock as 16x16 luma samples; the DCT is applied to the 8x8 blocks within it)
Discrete Cosine Transform Applied to Each 8x8
– Spatial Intensity to Frequency Transform
– Applied on X Axis (Row)
– Applied on Y Axis (Column)

Set up for Intra-frame (I-frame) Compression


Convolution Concepts
Math operation on 2 functions that produces a 3rd
Point Spread Function “Sharpen” meets this Definition
So do Many Mask Operations applied to Pixel Neighborhoods

[Figure: 2 impulses, f(t) and g(X – t); area inside intersection; f convolved with g over t]
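For sampled data the integral becomes a sum. Here is a minimal sketch of 1-D discrete convolution, applied to a made-up three-tap sharpening mask; the kernel values are chosen for illustration only.

```python
def convolve(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out


step_edge = [0, 0, 10, 10]     # a dark-to-bright edge along one scan-line
sharpen = [-1, 3, -1]          # illustrative mask; the taps sum to 1
print(convolve(step_edge, sharpen))
```

The output undershoots just before the edge and overshoots just after it, which is what makes the edge look crisper. A 2-D mask applied to a pixel neighbourhood works the same way, with the sum taken over both axes.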


DCT – Discrete Cosine Transform
Convolution of Image with Discrete Cosine
See http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/
De-convolved to restore image from Convolved Image

[Figure: DCT and Inverse DCT]


DCT Concepts
F(x) is a sum of sinusoids (each with a frequency and amplitude)
DCT operates on a discrete number of samples
Can derive DCT sum at any x, even where F(x) not known
N x N Macro-block has Zero Frequency DC at 0,0
– Horizontal and Vertical Frequency Increase Away from 0,0
Can De-convolve (inverse DCT, or iDCT)
Can Eliminate High Frequency Horizontal and Vertical Terms
– Minimal Losses from Truncation (otherwise lossless)
– Loss of High Frequency Image Features (What are These?)
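For reference, one common way to write the 1-D transform pair behind these bullets is the orthonormal DCT-II and its inverse. The slides do not say which normalization they use, so the scale factor \alpha_k below is an assumption. N is the number of samples, x_n the input samples and X_k the coefficients.

```latex
X_k = \alpha_k \sum_{n=0}^{N-1} x_n \cos\!\left[\frac{\pi (2n+1) k}{2N}\right],
\qquad
x_n = \sum_{k=0}^{N-1} \alpha_k X_k \cos\!\left[\frac{\pi (2n+1) k}{2N}\right],
\qquad
\alpha_k =
\begin{cases}
\sqrt{1/N}, & k = 0 \\
\sqrt{2/N}, & k > 0
\end{cases}
```

With k = 0 the cosine is 1 for every n, so X_0 is just the scaled sum of the samples: the zero-frequency (DC) term. Setting the largest-k terms to zero before applying the inverse is what removes high-frequency detail.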


Basic Concept of Waveforms
Complex Waveform is Sum of Simple Fundamentals
Simple Fundamentals Can Be Derived from Complex


Scanline DCT Example
Small Losses Due to DCT, iDCT Numerical Truncation
Larger Losses Due to Higher-Order Term (H.O.T.) Quantization and Truncation
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_doc/1D-DCT-N-Fundamentals.xlsx
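The two kinds of loss can be compared on a single 8-sample scan-line. This is a sketch in plain Python, not the linked spreadsheet: it assumes an orthonormal DCT-II, and the sample values are made up.

```python
from math import cos, pi, sqrt


def dct(samples):
    """Orthonormal 1-D DCT-II, straight from the defining sum (O(n^2))."""
    n = len(samples)
    return [
        (sqrt(1 / n) if k == 0 else sqrt(2 / n))
        * sum(x * cos(pi * (2 * i + 1) * k / (2 * n)) for i, x in enumerate(samples))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    return [
        sum((sqrt(1 / n) if k == 0 else sqrt(2 / n)) * c * cos(pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def keep_low_terms(coeffs, keep):
    """Zero every coefficient from index `keep` upward (drop higher-order terms)."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)


scanline = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(scanline)

full = idct(coeffs)
approx = idct(keep_low_terms(coeffs, 4))

max_err_full = max(abs(a - b) for a, b in zip(scanline, full))
max_err_trunc = max(abs(a - b) for a, b in zip(scanline, approx))
print("round-trip error, all 8 terms:", max_err_full)
print("round-trip error, 4 lowest terms:", max_err_trunc)
```

With all terms kept, the only error is floating-point round-off. Dropping the upper half of the coefficients gives an error you can see in the sample values.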


What Is Lost with DCT Quantization?
Noise More Than Anything Else
Complex XY Variable Patterns (Real Science Data?)

[Figure: Complex Tiling – Higher Frequency X, Higher Frequency Y Terms Can Still be Ignored]
[Figure: Complex Wood Texture – Most Detail in X, Far Less in Y]
[Figure: Randomized Texture Image – High X Detail, High Y Detail; Most Loss of Detail, But Noisy]


Step #2A: Macro-block Discrete Cosine Transform

8x8 Pixel Block – Macro-block
– SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Aspect Ratio
– HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR
– HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR
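The block counts above are simple integer divisions, sketched here with a hypothetical helper that also rejects frame sizes that do not tile evenly.

```python
def block_grid(width, height, block=8):
    """Number of block columns and rows needed to tile a frame exactly."""
    if width % block or height % block:
        raise ValueError("frame size is not a multiple of the block size")
    return width // block, height // block


for name, (w, h) in {"SD NTSC": (720, 480), "HD 720": (1280, 720), "HD 1080": (1920, 1080)}.items():
    cols, rows = block_grid(w, h)
    print(f"{name}: {w}x{h} -> {cols}x{rows} blocks ({cols * rows} per frame)")
```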


Step #2B: Macro-block Quantization (Lossy)

Apply Weighting and Scaling 8x8 to DCT
Produces Lots of Repeated Values (and Zeros) Compared to Original
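Here is a sketch of the weighting-and-scaling step. Both the weighting matrix and the coefficient block are invented for illustration: they are not the 13818-2 default tables, just values that grow coarser or decay toward higher frequencies.

```python
N = 8

# Hypothetical weighting matrix: step size grows with horizontal + vertical frequency.
weights = [[8 + 4 * (u + v) for u in range(N)] for v in range(N)]

# Hypothetical DCT output: large DC term, energy falling off with frequency.
coeffs = [[round(320 / (1 + u + v) ** 2, 1) for u in range(N)] for v in range(N)]


def quantize(block, qmatrix, scale=1):
    """Divide each coefficient by its weighted step size and round (the lossy step)."""
    return [[round(c / (q * scale)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(block, qmatrix)]


def dequantize(levels, qmatrix, scale=1):
    """Decoder side: multiply the integer levels back up by the step size."""
    return [[lvl * q * scale for lvl, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]


levels = quantize(coeffs, weights)
restored = dequantize(levels, weights)
zeros = sum(row.count(0) for row in levels)

for row in levels:
    print(row)
print(zeros, "of 64 quantized values are zero")
```

Raising `scale` makes every step coarser, which produces more zeros and a lower bit-rate but a larger reconstruction error. Each restored coefficient can differ from the original by up to half a step.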


Decode Process for #2A-B


How Lossy is the Decoded Macro-Block?


OpenCV Macroblock DCT Example
Same Cactus 320x240 with 80x80 DCT Macroblocks
[Figure: DCT and iDCT images]

Same Cactus 320x240 Again with 8x8 DCT Macroblocks
[Figure: DCT and iDCT images]


Mathematics for 2D DCT
Frequency Variation on X and Y axes from top left to bottom right
Straight-forward Algorithm Based on 2D Equation is O(n^2) per dimension
Like Cooley-Tukey for DFT, a DCT Algorithm that is O(n*log2(n)) has been formulated (Arai, Y.; Agui, T.; Nakajima, M.; see also Numerical Recipes: The Art of Scientific Computing (3rd ed.))
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c

[Figure: http://en.wikipedia.org/wiki/File:Dctjpeg.png]
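The straightforward 2-D algorithm can be sketched by applying the O(n^2) 1-D transform to every row and then to every column. This is a plain-Python illustration under an assumed orthonormal scaling. It is not the linked dct2.c and not the fast O(n*log2(n)) variant.

```python
from math import cos, pi, sqrt


def cosine_basis(n):
    """basis[k][i]: sample i of the k-th orthonormal DCT-II basis function."""
    return [[(sqrt(1 / n) if k == 0 else sqrt(2 / n)) * cos(pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def forward_1d(vec, basis):
    return [sum(b * x for b, x in zip(row, vec)) for row in basis]


def inverse_1d(vec, basis):
    n = len(vec)
    return [sum(basis[k][i] * vec[k] for k in range(n)) for i in range(n)]


def transform_2d(block, step):
    """Apply a 1-D transform along every row, then along every column."""
    basis = cosine_basis(len(block))
    rows_done = [step(row, basis) for row in block]
    cols_done = [step(col, basis) for col in transpose(rows_done)]
    return transpose(cols_done)


def dct_2d(block):
    return transform_2d(block, forward_1d)


def idct_2d(coeffs):
    return transform_2d(coeffs, inverse_1d)


flat = [[10] * 8 for _ in range(8)]                     # no detail at all
ramp = [[x + 2 * y for x in range(8)] for y in range(8)]

print("DC term of flat block:", round(dct_2d(flat)[0][0], 6))
restored = idct_2d(dct_2d(ramp))
print("max round-trip error:", max(abs(a - b) for ra, rb in zip(ramp, restored) for a, b in zip(ra, rb)))
```

A perfectly flat block puts all of its energy into the single coefficient at (0,0). Every other coefficient measures how much the block varies at one horizontal and vertical frequency.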


Step #2C: Macro-block Run-Length and Huffman Encoding

Zig-Zag Run-Length Encoding to Exploit Repeated Data and Zeros found in H.O.T. of Quantized DCT

– 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0 , 0, 0, 0, -1, 0, 0, …

Becomes: [Figure: run-length encoded sequence]
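As a rough illustration of the idea, the sketch below generates an 8x8 zig-zag scan order and turns the coefficient list above into (zero-run, value) pairs closed by an end-of-block marker. The pair format and the "EOB" marker are assumptions made for the example, and the elided tail of the sequence is assumed to be all zeros. The actual symbols are defined by the 13818-2 tables.

```python
def zigzag_order(n=8):
    """(row, col) positions visited by a zig-zag scan of an n x n block."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals (r + c constant); alternate direction on each one.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def zigzag_scan(block):
    """Flatten a square block into a list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length_encode(values):
    """Emit (number of zeros skipped, non-zero value) pairs, then 'EOB'."""
    pairs, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")          # trailing zeros are implied by end-of-block
    return pairs


scanned = [86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0]
scanned += [0] * (64 - len(scanned))      # assumption: the rest of the block is zero

print(zigzag_order()[:6])
print(run_length_encode(scanned))
```

Here 64 coefficients collapse to eleven pairs plus the end marker. The zig-zag order matters because it pushes the high-frequency positions, where quantization leaves mostly zeros, to the end of the list.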


Huffman Applied to RLE Data
Huffman Tables for MPEG-2 Macro-Blocks Defined in 13818-2
(Lossless) Compression Based on Probability of Occurrence
Shannon’s Source Coding Theory: Ideal Code Length is -log2(P) Bits, P = Probability of Occurrence, Binary Encoding of Symbols
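MPEG-2 uses fixed tables from the standard rather than building a code per stream. The link between probability and code length is still easy to see by constructing a Huffman code for a toy symbol set. The sketch below does that and compares the average code length with the Shannon entropy. The symbols and counts are made up.

```python
import heapq
from collections import Counter
from math import log2


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code over `counts`."""
    if len(counts) == 1:
        return {symbol: 1 for symbol in counts}
    # Heap entries: (weight, tie-breaker, {symbol: depth so far}).
    heap = [(weight, i, {symbol: 0}) for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)      # two least likely subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def entropy_bits(counts):
    """Shannon entropy: the sum of P * -log2(P) over all symbols."""
    total = sum(counts.values())
    return sum((n / total) * -log2(n / total) for n in counts.values())


counts = Counter("aaaabbcd")                   # P(a)=1/2, P(b)=1/4, P(c)=P(d)=1/8
lengths = huffman_code_lengths(counts)
average = sum(counts[s] * lengths[s] for s in counts) / sum(counts.values())

print(lengths)
print("average bits/symbol:", average, "entropy:", entropy_bits(counts))
```

Because every probability here is a power of two, each symbol gets exactly -log2(P) bits and the average matches the entropy. With other distributions Huffman coding comes within one bit per symbol of that bound.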


Step #3: Group of Pictures Concept – Transmit Change-Only Data
I-Frame Compressed Only Intra-Frame By Applying Methods #2A-2C to Macro-Blocks
I-Frame Can Be Decoded Alone
P-Frame is Differences Only, Predicted from the Previous I-Frame or P-Frame in the GoP
B-Frame is Differences Only, Predicted from Both the Closest Preceding and Following I-Frame or P-Frame
Difference Data Can be Further Encoded with Lossless Methods
Without Steps 2A-C, Specifically Quantization, Difference Data for High Motion Video Could Blow Up
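The change-only idea can be shown with plain frame differencing on a tiny greyscale frame. This is a deliberate simplification: real MPEG prediction is motion-compensated per macro-block, which the sketch does not attempt. The frame contents are made up.

```python
def residual(reference, current):
    """Per-pixel difference between a frame and its reference frame."""
    return [[c - r for r, c in zip(ref_row, cur_row)]
            for ref_row, cur_row in zip(reference, current)]


def reconstruct(reference, diff):
    """Decoder side: add the transmitted difference back onto the reference."""
    return [[r + d for r, d in zip(ref_row, diff_row)]
            for ref_row, diff_row in zip(reference, diff)]


# Reference (I-frame): flat background with one bright pixel.
i_frame = [[10] * 4 for _ in range(4)]
i_frame[1][1] = 200

# Next frame: the bright pixel has moved one position to the right.
next_frame = [[10] * 4 for _ in range(4)]
next_frame[1][2] = 200

diff = residual(i_frame, next_frame)
changed = sum(value != 0 for row in diff for value in row)

for row in diff:
    print(row)
print(changed, "of 16 difference values are non-zero")
```

Only the two positions touched by the motion carry data, so the difference frame compresses far better than the frame itself. With heavy motion most positions change, and the saving shrinks.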


Group of Pictures: High Level View


Overall MPEG YCrCb Compression Performance

Standard Definition 720x480x2 (675KB/frame) @ 30fps
– Requires 20MB/sec (≈166 Mbps) Uncompressed
– Typical MPEG-2 @ 3.75 Mbps, ≈ 44x Compression
– Typical MPEG-4 @ 1.5 Mbps, > 100x Compression
– 10 to 20 Programs on QAM 256 (48Mbps, 6 MHz/Ch)
– ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6 MHz/Ch)

HD 720p (1280x720x2, 1800KB/frame) @ 30fps
– Requires 53MB/sec (≈442 Mbps) Uncompressed
– Typical MPEG-2 @ 20 Mbps, ≈ 22x Compression
– Typical MPEG-4 @ 10 Mbps, ≈ 44x Compression

HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps
– Requires 120MB/sec (≈995 Mbps) Uncompressed
– Typical MPEG-2, VC-1 @ 45 Mbps, ≈ 22x Compression
– Typical MPEG-4 @ 20 Mbps, ≈ 50x Compression
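The uncompressed rates and compression ratios can be recomputed from the frame sizes. The sketch assumes 2 bytes per pixel (the "x2" above), 30 frames per second, 8 bits per byte and 10^6 bits per Mbit. A different bits-per-byte or rounding convention gives somewhat different figures. The compressed bit-rates are the typical values listed above.

```python
def uncompressed_mbps(width, height, fps=30, bytes_per_pixel=2):
    """Raw video bit-rate in megabits per second."""
    return width * height * bytes_per_pixel * fps * 8 / 1e6


formats = {
    "SD 480": ((720, 480), {"MPEG-2": 3.75, "MPEG-4": 1.5}),
    "HD 720p": ((1280, 720), {"MPEG-2": 20, "MPEG-4": 10}),
    "HD 1080p": ((1920, 1080), {"MPEG-2": 45, "MPEG-4": 20}),
}

for name, ((width, height), codecs) in formats.items():
    raw = uncompressed_mbps(width, height)
    kib_per_frame = width * height * 2 / 1024
    ratios = ", ".join(f"{codec} {raw / rate:.0f}x" for codec, rate in codecs.items())
    print(f"{name}: {kib_per_frame:.0f} KB/frame, {raw:.0f} Mbps raw, {ratios}")
```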


13818-2 Defines Elementary/Program Streams

13818-2: Elementary Video Stream Encode/Decode
– Defines Color Sub-Sampling Formats
– 8x8 Macro-Block Encoding
– Video DCT
– Post DCT Macro-Block Quantization Weighting and Scaling Coefficients
– RLE Zig-Zag Macro-Block Sampling
– Huffman Encoding Table
– Group of Pictures: I-Frame, B-Frame, P-Frame
– Presentation and Decode Time Stamps (PTS/DTS)
– Order of Encode and Decode Operations

Not Suitable for Transport over Networks, but Sufficient for Local Playback (DVD, PC HDD, Flash-Memory Media)


13818-1
13818-1: Transport Streams for Video & Audio
– Container for Program Streams (188 Byte Packets)
– Multiplexed Video and Audio Elementary Streams
– PSI – Program Specific Information
    PID – Packet ID
    Guide Data
    Emergency Broadcast
– System Clock (PCR, PTS/DTS)
– Sequence Headers – Resolution and Format Information, Bit-Rate
    GoP Header, Frame Header
    Slices of Macro-Blocks for Resolution
    Decoder Information (Color, Quantization Tables)
– Can Be Multiple Programs or Combined Audio and Video as a Program
    MPEG-2 Video Elementary Stream
    AC-3 Audio Elementary Stream
    Secondary Audio Stream (Different Language)
    Up to 10 or More Audio+Video in One Transport Stream for Virtual Channels


Parsing an Elementary Video Stream

Many 188-Byte Packet Types and Headers Allow for Multiplexing of Many Video and Audio Streams on a Carrier
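A demultiplexer sorts packets by the fields in each packet's 4-byte header. Here is a minimal sketch of decoding that header in Python. The dictionary keys are names chosen for the example, the sample packet is synthetic, and adaptation fields and payload parsing are left out.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet):
    """Decode the fixed 4-byte header of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport stream packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],     # 13-bit packet identifier
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,           # 4-bit per-PID counter
    }


# Synthetic packet: PID 0x0011, payload-unit start set, payload only, counter 10.
sample = bytes([0x47, 0x40, 0x11, 0x1A]) + bytes(184)
header = parse_ts_header(sample)
print(header)
```

Grouping packets by `pid` and checking that `continuity_counter` advances by one for each payload packet is enough to separate the multiplexed streams and notice dropped packets.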


MPEG-4 vs. MPEG-2

MPEG-2
– Defined by ISO 13818-1, 13818-2
– Leverages MPEG-1 (Moving Picture Experts Group – 1988)
– Widely Used for Digital Video – Digital Cable TV, DVD
– Transport Stream designed for Broadcast (Lossy, No Beginning or End of Stream)
    ATSC – Advanced Television Systems Committee (HDTV Broadcast)
    – 8VSB Modulation – 8 level Vestigial Sideband Modulation, 6 MHz channel, 19.39 Mbps, Reed-Solomon Error Correction
    – Up to 1080p (1920x1080) Video Resolution
    – AC-3 (Dolby) Audio
    DVB – Digital Video Broadcast (Europe, Satellite)
– Program Stream designed for Playback Media (DVD, Flash, HDD, etc.)

MPEG-4
– Defined by ISO 14496 (1998)
– Leverages MPEG-2 Standards for Program/Transport, Encode/Decode
– Better Compression Rates (improved motion prediction for P,B frames), MPEG-4 Part-10 (H.264), e.g. Blu-Ray
– Extensions for Digital Rights Management
– Advanced Audio Coding (AAC)
– Becoming More Widely Deployed for HD and Because of Lower Bit-Rate Transport Streams
