Post on 21-Dec-2015
TRANSCRIPT
A Picture is Worth a Thousand Words
Milton Chen
What’s a Picture Worth?
• A thousand words - Descartes (1596-1650)
• A thousand bytes - modern translation
  – 1000 * 5 * 5 / 3 ≈ 8,000 bits
• 75,000 bytes - ATSC/MPEG-2
  – 20 M / 30 ≈ 600,000 bits
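The two estimates above can be checked with a few lines of arithmetic. This is only a sketch: it assumes "20 M" means a bit rate of roughly 20 Mbit/s and "30" means 30 frames per second, and it leaves the factors 5 * 5 / 3 uninterpreted because the slide does not label them. The variable names are illustrative.

```python
# Text estimate, using the factors exactly as written on the slide.
text_bits = 1000 * 5 * 5 / 3

# Video estimate: assumed ~20 Mbit/s stream divided by an assumed 30 frames/s.
frame_bits = 20_000_000 / 30
frame_bytes = frame_bits / 8

# How many "thousand-word texts" fit in one coded frame under these assumptions.
ratio = frame_bits / text_bits

print(round(text_bits), round(frame_bits), round(frame_bytes), round(ratio))
```

The unrounded results are somewhat higher than the slide's figures, which appear to round down to 8,000 and 600,000 bits.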
Frequency Response of the Eye
• Lens - low pass
• Photoreceptors - low pass
• Lateral inhibition - high pass
  – edges are important
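As a rough illustration of why a high-pass stage emphasizes edges, the sketch below filters a 1-D step with a small center-surround kernel. The kernel weights and the helper name `filter_same` are assumptions chosen for the example, not a model of the retina taken from the slides.

```python
def filter_same(signal, kernel):
    """Apply a symmetric 1-D kernel; borders are handled by replicating the end samples."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i + j - half, 0), len(signal) - 1)
            acc += weight * signal[idx]
        out.append(acc)
    return out

# Hypothetical center-surround weights: they sum to zero, so flat regions give no response.
CENTER_SURROUND = [-0.5, 1.0, -0.5]

step_edge = [0, 0, 0, 0, 1, 1, 1, 1]
response = filter_same(step_edge, CENTER_SURROUND)
print(response)
```

The response is zero over the flat regions and nonzero only on the two samples next to the step.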
Today’s Video Coding
YUV (lossy) → Motion → DCT → Quantize (lossy) → Order → Entropy
Designed for natural scenes
=> Higher frequency DCT coefficients are quantized more
=> Sharp edges are not well preserved
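The effect can be reproduced on a single row of 8 pels. The sketch below is an assumption-laden toy, not the MPEG-2 quantizer: it uses an orthonormal 1-D DCT-II and a made-up step table `STEPS` that grows with frequency, then compares the reconstruction error of a flat row against a sharp edge.

```python
import math

N = 8

def scale(k):
    # Normalization factor that makes the DCT-II orthonormal.
    return math.sqrt((1 if k == 0 else 2) / N)

def dct(x):
    return [scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N))
            for k in range(N)]

def idct(coeffs):
    return [sum(scale(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

# Hypothetical quantizer steps: coarser for higher frequencies.
STEPS = [1 + 4 * k for k in range(N)]

def quantize(coeffs):
    return [round(c / s) * s for c, s in zip(coeffs, STEPS)]

def max_error(x):
    """Largest pel error after DCT, quantization, and inverse DCT."""
    y = idct(quantize(dct(x)))
    return max(abs(a - b) for a, b in zip(x, y))

flat = [128] * N
edge = [0, 0, 0, 0, 255, 255, 255, 255]
print(max_error(flat), max_error(edge))
```

The flat row has energy only in the DC coefficient and survives almost unchanged, while the edge spreads energy into the coarsely quantized high-frequency coefficients and comes back with a larger error.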
What’s Wrong with Today’s Video Coding
• Poor performance for
  – text (channel logo, stock tickers)
  – graphics
  – anything with sharp edges
Desirable Features
• Postproduction support
• Personalized delivery / presentation
• Interactive
• Error resilience
• More compression
• Facilitate search / indexing (MPEG-7)
Outline
• Why
• MPEG-4 Overview
• Systems Layer
• Visual Coding
  – Arbitrarily shaped video
  – Meshed video
  – Face and body
Goals of MPEG-4
• One content
  – convergence of DTV, computer graphics, and WWW
  – broadcast, internet, local
• User interactivity
• Higher compression rates
• Robustness in mobile environment
MPEG-4 Applications
• Interactive TV (broadcast)
  – Home-shopping, Interactive game show
• Virtual workspace (internet)
  – virtual meeting, collaborative design
• Infotainment (local)
  – Virtual-City-Guide
MPEG-4 Key Concepts
• Independent coding of objects
  – allow user interactivity (client & server)
  – higher compression rates
• Provide tools as well as solutions
  – allow content specific and user defined compression algorithms
MPEG-4 History
• Started in July 1993
• Originally for low-bit-rate applications
• Version 1 to be standardized by January 1999
• Continue work on version 2, etc.
MPEG-4 Standard
1) Systems (manage streams, composition)
2) Visual (natural and synthetic)
3) Audio (natural and synthetic)
4) Conformance Testing
5) Reference Software
6) Delivery Multimedia Integration Framework (medium abstraction layer)
[Figure: MPEG-4 audiovisual scene. Labels: hierarchically multiplexed downstream control / data; hierarchically multiplexed upstream control / data; audiovisual presentation; audiovisual objects (3D objects, 2D background, voice, sprite); scene coordinate system (x, y, z); hypothetical viewer; projection plane; video compositor; audio compositor; display; speaker; user input; user events]
[Figure: MPEG-4 Systems layer model. Labels: Transmission/Storage Medium; TransMux Layer (examples: (RTP) UDP IP, (PES) MPEG-2 TS, AAL2 ATM, H223 PSTN, DAB Mux, ...) carrying TransMux Streams; TransMux Interface; FlexMux Layer carrying FlexMux Streams; Stream Multiplex Interface; Access Unit Layer (AL) carrying AL-Packetized Streams; Elementary Stream Interface; Compression Layer carrying Elementary Streams (Primitive AV Objects, Scene Description Information, Object Descriptor, ...); Composition and Rendering; Audiovisual Interactive Scene; Display and User Interaction; Return Channel Coding]
Previous Work in Object Coding
• Synthetic High System (Schreiber ’59)
• Contour-Texture Approach (Kocher & Kunt ’82)
• Object-Based Video Coder (Musmann et al. ’89)
• Talisman (Torborg & Kajiya ’96)
• Blue screen matting (Vlahos ’64)
Shape Coding
• Bitmap-based
  – 1 means in, 0 means out
  – Chroma-keying, GIF89a
  – G4 fax standard
• Contour-based
  – chain code
  – polygon/curve approximation
  – Fourier descriptor
Chain Code
• Follows the contour and encodes the direction of the next boundary pel
• 4 or 8 directions for an avg. of 1.2 or 1.4 bits per boundary pel
• Extensions
  – length
  – angular resolution
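A minimal sketch of an 8-direction chain code follows. It assumes the boundary pels are already available as an ordered, closed list of (x, y) positions in which consecutive pels are neighbours; the direction numbering in `DIRECTIONS` is one common convention chosen for the example, not something the slides specify.

```python
# Direction index -> (dx, dy), counted counter-clockwise starting from +x.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(contour):
    """Return one direction index per boundary pel of a closed contour."""
    codes = []
    # Pair every pel with its successor, wrapping around to close the contour.
    for (x0, y0), (x1, y1) in zip(contour, contour[1:] + contour[:1]):
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return codes

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(chain_code(square))
```

Written out with a fixed-length code, each of the 8 directions costs 3 bits, so the averages quoted above presumably rely on entropy coding of the direction sequence.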
Polygon Approximation
• Add control points until maximum error is below threshold
• Threshold <= 1.4 pel for CIF (352*288) video
• Extension
  – curves of various order
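One way to "add control points until the maximum error is below the threshold" is a recursive split at the worst-fitting point, in the style of the Douglas-Peucker algorithm. The sketch below uses that technique as an assumption; the slides do not say which refinement strategy the MPEG-4 experiments used, and the function names are illustrative.

```python
import math

def distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    if length == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    cross = (b[0] - a[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (b[1] - a[1])
    return abs(cross) / length

def approximate(points, threshold):
    """Keep the end points; add the worst point as a control point while the error is too large."""
    if len(points) < 3:
        return list(points)
    errors = [distance_to_line(p, points[0], points[-1]) for p in points[1:-1]]
    worst = max(range(len(errors)), key=errors.__getitem__) + 1
    if errors[worst - 1] <= threshold:
        return [points[0], points[-1]]
    left = approximate(points[:worst + 1], threshold)
    right = approximate(points[worst:], threshold)
    return left[:-1] + right  # the split point appears once

corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(approximate(corner, 0.5))
print(approximate(corner, 3.0))
```

With a tight threshold the corner pel is kept as a control point; with a loose one the whole run collapses to its two end points.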
Fourier Descriptor
• Translation, rotation, and scale invariant
• Sample contour -> $(x_i, y_i)$
• For each sample $i$, compute the slope $(y_{i+1} - y_i) / (x_{i+1} - x_i)$
• Compute Fourier Series coefficients
• Good for recognition, but not an efficient shape coder
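The invariance claim can be demonstrated with a small experiment. Note the swap: this sketch uses the common complex-coordinate variant (each sample treated as $x_i + j y_i$) rather than the slope-based signature on the slide, and it keeps only coefficient magnitudes. All function names are illustrative.

```python
import cmath
import math

def fourier_descriptor(points):
    """Magnitudes of the DFT of the contour, normalized for translation, rotation, and scale."""
    n = len(points)
    z = [complex(x, y) for x, y in points]
    coeffs = [sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
              for k in range(n)]
    # Dropping k = 0 removes translation, taking magnitudes removes rotation,
    # and dividing by the first harmonic removes scale.
    reference = abs(coeffs[1])
    return [abs(c) / reference for c in coeffs[1:]]

def transform(points, scale, angle, shift):
    """Scale, rotate, and translate a contour (shift is a complex number)."""
    factor = cmath.rect(scale, angle)
    moved = [factor * complex(x, y) + shift for x, y in points]
    return [(w.real, w.imag) for w in moved]

rectangle = [(0, 0), (2, 0), (4, 0), (4, 1), (4, 2), (2, 2), (0, 2), (0, 1)]
original = fourier_descriptor(rectangle)
changed = fourier_descriptor(transform(rectangle, 3.0, 0.7, complex(5, -2)))
print(max(abs(a - b) for a, b in zip(original, changed)))
```

Because the phases are discarded, the contour cannot be rebuilt from this descriptor, which fits the remark that it suits recognition better than shape coding.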
MPEG-4 Experiments
• Chroma-keying
  – color bleeding
  – need to decode whole frame to get shape
• Bitmap and contour-based coding are similar in:
  – error resilience
  – coding efficiency
• Bitmap-based is simpler for hardware due to regular memory access
MPEG-4 Shape Coding
• Three types of macroblocks
  – transparent, opaque, and object boundary
• Context-based arithmetic encoder
• Macroblocks can be subsampled
• Texture padded with 0 or mean value
• Transparency
  – constant: one 8 bit value
  – arbitrary: treat it like color
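The three macroblock types can be derived directly from a binary shape mask. The sketch below is a simplified illustration: `classify_blocks` and its `size` parameter are hypothetical names, and a 2x2 block is used only to keep the example small. Only the boundary blocks would need their shape bits coded.

```python
def classify_blocks(mask, size):
    """Label each size x size block of a binary mask (1 = inside the object)."""
    rows, cols = len(mask), len(mask[0])
    labels = []
    for top in range(0, rows, size):
        row_labels = []
        for left in range(0, cols, size):
            pels = [mask[y][x]
                    for y in range(top, min(top + size, rows))
                    for x in range(left, min(left + size, cols))]
            if not any(pels):
                row_labels.append("transparent")  # entirely outside the object
            elif all(pels):
                row_labels.append("opaque")       # entirely inside the object
            else:
                row_labels.append("boundary")     # the contour crosses this block
        labels.append(row_labels)
    return labels

mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
print(classify_blocks(mask, 2))
```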
Meshed Video
• 2D mesh tessellates the video into patches
• Motion vector for each vertex
• Texture warped in each patch
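For a triangular patch, interpolating the three vertex motion vectors with barycentric weights gives an affine warp of the patch interior. The sketch below assumes triangular patches and uses illustrative names (`barycentric`, `warp_point`); it shows only the geometry, not the MPEG-4 mesh syntax.

```python
def barycentric(p, a, b, c):
    """Weights (wa, wb, wc) such that p = wa*a + wb*b + wc*c."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    wb = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    wc = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return 1.0 - wb - wc, wb, wc

def warp_point(p, triangle, motion_vectors):
    """Move p by the barycentric blend of the three vertex motion vectors."""
    weights = barycentric(p, *triangle)
    dx = sum(w * mv[0] for w, mv in zip(weights, motion_vectors))
    dy = sum(w * mv[1] for w, mv in zip(weights, motion_vectors))
    return (p[0] + dx, p[1] + dy)

triangle = [(0, 0), (4, 0), (0, 4)]
motion_vectors = [(1, 0), (1, 0), (1, 2)]  # one vector per vertex
print(warp_point((1, 1), triangle, motion_vectors))
```

A vertex moves exactly by its own vector, and interior points move by a blend, so the patch can rotate, scale, or shear.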
Meshed Video - Motivation
• Motion Modeling
  – Translational-block motion does not model rotation, scaling, reflection, and shear
• Shape Modeling
  – Possible without depth
Meshed Video - Applications
• Compression
  – better motion compensation
  – transmit texture only at key frames
  – spatio-temporal interpolation (zooming, frame-rate up-conversion)
• Manipulation
  – augmented reality
  – transfiguration (replace billboards)
• Indexing / searching
Face
• Face object
  – Default face model with terminal
  – Facial Definition Parameter or user supplied model/texture
  – Facial Animation Parameter plus Amplification and Filters
  – Lip Shape Animation from phoneme
Facial Definition Parameter
[Figure: facial feature points, numbered by group from 2.x through 11.x, marked on the face, Left Eye, Right Eye, Nose, Mouth, Tongue, and Teeth, with X/Y/Z axes. Legend: feature points affected by FAPs; other feature points]
Facial Animation Parameter
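[Figure: neutral face annotated with the distances ES0 (eye separation), ENS0 (eye-nose separation), MNS0 (mouth-nose separation), MW0 (mouth width), and IRISD0 (iris diameter)]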
Body
• Like the face
Ultimate Compression Technique
Computer Graphics ???
• Block based DCT (MPEG-1/2)
• Arbitrarily shaped video (MPEG-4)
• Meshed video (MPEG-4)
• Image based rendering
• Textured 3D graphics
• Geometry only 3D graphics