2D & 3D VIDEO PROCESSING FOR IMMERSIVE APPLICATIONS
Emerging Convergence of Video, Vision & Graphics
Harpreet S. Sawhney
Rakesh Kumar
ACKNOWLEDGEMENTS
Collaborative Work with:
Hai Tao
Yanlin Guo
Steve Hsu
Supun Samarasekera
Keith Hanna
Aydin Arpa
Rick Wildes
TECHNICAL SUCCESS OF CONVERGENCE TECHNOLOGIES
PC based near real-time mosaicing
Automated Video Enhancement: VHS-to-DVD
Iris recognition, active vision
Image based modeling for Entertainment
Real-time Video Insertion
Immersive and Interactive Telepresence Modes of Operation
Observation Mode
User observes a remote site from any perspective.
User “walks” through site to view activities of interest “up close”.
Example: security, facility guards, sports & entertainment
Conversation Mode
Users talk and observe one another as if in the same room.
Users walk around yet maintain eye contact.
Example: immersive tele-conferencing
Interaction Mode
Remote users share a common work space.
Users observe each other’s hands as they manipulate shared objects, such as war room wall displays.
Example: mission planning, remote surgery
Quality of Service for Tele-presence
Critical Issues
• High quality for immersive experience
– Artifact free recovery of 3D shape from video streams
– Efficient 3D video representation and compression
– High quality rendering of new views using 3D shape and video streams
– Bandwidth available in the Next Generation Internet
• Low latency for interactive applications
– Real time 3D geometry recovery at the content server end
– Real time new view rendering at the browser client end
– Adaptive Stream management to handle user requests and network loads
– Error resilience and concealment to fill in missing packets
Convergence Technologies
… for immersive & interactive visual applications ...
• Vision algorithms: High-quality 3D shape recovery and dynamic scene analysis
• ASICs, high performance hardware: Real-time video processing
• Compact, low-cost cameras: CMOS cameras
• Low latency and high quality compression: Error resilience
• Real time view synthesis: Standard platforms, e.g. PCs
• Immersive Displays
Vision algorithm performance over time
[Chart: Algorithm Complexity (vertical axis) versus Time (horizontal axis; 1990, 1993, 1995, 1998, 2000)]
Chart labels:
2D Stabilization
Coarse 3D Depth Recovery
2D Video Insertion
Video registration to 3D site models
High Quality 3D shape extraction
Mosaicing for entertainment & surveillance
Real-time insertion in Live TV
Face Finding for Iris Recognition
Geo-registration visual databases
Immersive Telepresence
HW Performance/Size/Cost over time
• Sarnoff ACADIA ASIC performance
• 100 MHz system clock, processes 100 million pixels/sec in each processing element
• 10 billion operations / sec total IC performance
• 800 MB/sec SDRAM interface using 64-bit bus
• Enables building smart 3D cameras for immersive applications.
VFE-100 (1992)
VFE-200 (1997)
ACADIA ASIC (2000)
Application Performance
• Parametric Motion: Stabilization & Mosaicing
– 720x240 fields @ 60 Hz OR 720x480 frames @ 30 Hz
• Pyramid based Fusion: Dynamic Range, Focus Enhancement
– 720x240 fields @ 60 Hz OR 720x480 frames @ 30 Hz
• Stereo Depth Extraction
– 720x240 field 32 disparity levels in 4 ms (250 Hz)
– 720x240 field 60 disparity levels in 10 ms (100 Hz)
– 60 disparities on 1k x 1k images at 55 ms (18 Hz)
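The rates quoted on this slide can be cross-checked with a little arithmetic. The sketch below is only a sanity check and not a model of the ACADIA hardware. It assumes that "1k x 1k" means 1024 x 1024 pixels, and the helper names `pixel_rate` and `stereo_stats` are made up for illustration.

```python
def pixel_rate(width, height, hz):
    """Pixels processed per second at a given image size and rate."""
    return width * height * hz


def stereo_stats(width, height, disparities, milliseconds):
    """Return (images per second, pixel-disparity evaluations per second)."""
    rate_hz = 1000.0 / milliseconds
    evaluations = width * height * disparities * rate_hz
    return rate_hz, evaluations


# Field mode and frame mode carry the same pixel rate.
field_rate = pixel_rate(720, 240, 60)
frame_rate = pixel_rate(720, 480, 30)
print(field_rate, frame_rate)

# (width, height, disparity levels, time per image in ms) from the slide;
# the 1024 x 1024 size is an assumption for "1k x 1k".
for w, h, d, ms in [(720, 240, 32, 4), (720, 240, 60, 10), (1024, 1024, 60, 55)]:
    hz, evals = stereo_stats(w, h, d, ms)
    print(f"{w}x{h}, {d} disparities: {hz:.0f} Hz, {evals / 1e9:.2f} G evaluations/sec")
```

Both 2D modes work out to about 10.4 million pixels/sec, and the three stereo configurations land between roughly 1.0 and 1.4 billion pixel-disparity evaluations per second, which is consistent with the quoted 250 Hz, 100 Hz and 18 Hz rates.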
Sarnoff Compression Technology
… Required algorithm components for tele-presence are emerging ...
[Chart: Algorithm Complexity (vertical axis) versus Time (horizontal axis)]
Chart labels:
MPEG4, Progressive Encoding
VideoPhone: H.263
Low Latency MPEG2 multiplexing service
MPEG2: Encoding and Transmission
Just Noticeable Difference (JND): MPEG2 Encoding and Quality Measurement (1997-1998, Tektronix)
Pyramid & Wavelet based Encoding, Still Image Compression (1988-1993)
Other dates on the chart: 1993-1996, 1997-1998, 1998-1999, 1999
Other names on the chart: ICTV, DIREC-TV & HDTV, LG Electronics, E-vue
A FRAMEWORK FOR VIDEO PROCESSING
ALIGN
2D & 3D MODELS OF MOTION & STRUCTURE
MODEL-BASED IMAGE SEQUENCE ALIGNMENT
TEST
WARP/RENDER WITH 2D/3D MODELS
TEST ALIGNMENT QUALITY
SYNTHESIZE
CREATE OUTPUT REPRESENTATIONS
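A toy version of the ALIGN / TEST / SYNTHESIZE loop can make the framework concrete. The sketch below is an illustration under strong assumptions: the motion model is a pure integer translation found by exhaustive search (the slides refer to richer 2D and 3D models), images are small lists of lists, and all function names are hypothetical.

```python
def shift(img, dx, dy, fill=None):
    """WARP: translate img by (dx, dy); pixels with no source become `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def residual(ref, warped):
    """TEST: mean absolute difference over pixels that are valid after warping."""
    diffs = [abs(r - v)
             for ref_row, warped_row in zip(ref, warped)
             for r, v in zip(ref_row, warped_row) if v is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def align(ref, img, max_shift=3):
    """ALIGN: exhaustive search for the translation with the lowest residual."""
    candidates = [(dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates, key=lambda s: residual(ref, shift(img, s[0], s[1])))


def synthesize(ref, warped):
    """SYNTHESIZE: average the reference with the aligned frame where valid."""
    return [[r if v is None else (r + v) / 2 for r, v in zip(ref_row, warped_row)]
            for ref_row, warped_row in zip(ref, warped)]


def crop(scene, x0, y0, size):
    """Cut a size x size window out of a larger synthetic scene."""
    return [row[x0:x0 + size] for row in scene[y0:y0 + size]]


# Two overlapping windows of one synthetic scene stand in for two video frames.
scene = [[x * x + 100 * y * y for x in range(12)] for y in range(12)]
ref = crop(scene, 2, 2, 8)
frame = crop(scene, 4, 1, 8)

dx, dy = align(ref, frame)               # ALIGN
warped = shift(frame, dx, dy)            # WARP with the recovered model
quality = residual(ref, warped)          # TEST alignment quality
output = synthesize(ref, warped)         # SYNTHESIZE an output representation
print((dx, dy), quality)
```

In this toy setup the search recovers the offset (2, -1) between the two windows with zero residual. A real system would replace the exhaustive search with coarse-to-fine estimation of a parametric or 3D model, which this sketch does not attempt.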
Core Vision Algorithms for (Real-time) Motion & 3D Video Analysis
2D Immersive & Layered Representations
Stereo & Video Sequence Enhancement
Multi-camera Immersive Dynamic Rendering
Model-centric Video Visualization
Highlights of Sarnoff’s Video Analysis Technologies
… framework applied to create immersive representations ...
Spherical Mosaics
Dynamic & Synopsis Mosaics
Hi-Q IBR based mixed resolution synthesis
Video Quality Enhancement for efficient compression
Dynamic model & video visualization
Geo-registration with reference image database
Hi-Q Depth extraction
Image-based rendering with dynamic depth
SPHERICAL MOSAICS
Sarnoff Library Video
Captures almost the complete sphere with 380 frames
TOPOLOGY INFERENCE & LOCAL-TO-GLOBAL ALIGNMENT
[Sawhney,Hsu,Kumar ECCV98, Szeliski,Shum SIGGRAPH98]
SPHERICAL TOPOLOGY EVOLUTION
SPHERICAL MOSAIC
Sarnoff Library
ACTIVE FOCUS OF ATTENTION WFOV/NFOV CONTROL
DYNAMIC MOSAICS
Video Stream with deleted moving object
Original Video
Dynamic Mosaic Video
SYNOPSIS MOSAICS
Low-Res Left
Synthesized High-Res Left
Original High-Res Right
ALIGNMENT & SYNTHESIS FOR HI-RES STEREO SYNTHESIS
A HIGH END APPLICATION OF IBMR
[Sawhney,Guo,Hanna,Kumar,Zhou,Adkins SIGGRAPH2001]
THE PROBLEM SCENARIO
INPUT OUTPUT
Left Eye (Typically 1.5K)
Right Eye (Typically 6K)
3D & Motion Alignment Based Stereo Sequence Processing
[Diagrams: Left and Right frame sequences over time (t-2 … t+2 and t-1 … t+3), with arrows labeled "stereo" and "flow"]
• Highlights:
– Scintillation effect is reduced.
– Occlusion regions are better handled.
SYNTHESIS RESULT ON REAL FOOTAGE
IMPLICATIONS FOR IMMERSIVE IBMR CAMERA CONFIGURATIONS
Lo-res camera
Hi-res camera
Multi-resolution camera configuration allows 3D capture at the highest resolution as well as user-controlled large range of zooms without the need for zoom control on the cameras.
Model-Centric Video Visualization
OR
Video-Centric Model Visualization
[Hsu,Supun,Kumar,Sawhney CVPR00]
Original Video
Re-projection of video after merging with model.
Geo-registration of video to site model
Site model
Video to Site Model Alignment
• Model to frame alignment
REFINE
Correspondence-less exterior orientation from 3D-2D line pairs
Oriented Energy Pyramid
0° 45°
90° 135°
• Goal: representation which indicates edge strength in the image at various orientations and scales
• Orientation selectivity: reduce false matches
• Coarse-to-fine: increase capture range
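One way to picture this representation is a stack of per-orientation edge-strength maps at several scales. The sketch below is a simplified stand-in, not the filters used in the original work: it uses squared central-difference derivatives instead of oriented band-pass filters, 2x2 block averaging instead of a Gaussian pyramid, and it treats the angle as the direction of the derivative (so a vertical step edge responds at 0°). All names are hypothetical.

```python
import math

ANGLES_DEG = (0, 45, 90, 135)


def downsample(img):
    """Halve resolution by averaging 2x2 blocks (crude pyramid reduction)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def oriented_energy(img, angle_deg):
    """Squared directional derivative along angle_deg; border pixels stay 0."""
    h, w = len(img), len(img[0])
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            out[y][x] = (c * gx + s * gy) ** 2
    return out


def oriented_energy_pyramid(img, levels=3):
    """One dict per scale, mapping each orientation to its energy image."""
    pyramid = []
    for _ in range(levels):
        pyramid.append({a: oriented_energy(img, a) for a in ANGLES_DEG})
        img = downsample(img)
    return pyramid


# A vertical step edge: intensity changes only along x.
image = [[0.0 if x < 8 else 1.0 for x in range(16)] for y in range(16)]
pyr = oriented_energy_pyramid(image, levels=3)
for level, channels in enumerate(pyr):
    totals = {a: round(sum(map(sum, e)), 3) for a, e in channels.items()}
    print(level, totals)
```

For the step edge, the 0° channel carries the most energy at the finest level and the 90° channel carries essentially none, which is the orientation selectivity that helps reject false matches. Matching at the coarsest level first and then refining at finer levels is what gives the larger capture range.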
This will be an animation of the gradual improvement of alignment during the coarse to fine iterations
regsite_animation.avi
Pose Refinement Algorithm
… iterative coarse to fine adjustment of pose ...
Geo-Registration Video to Reference Database Alignment
[Wildes et al. ICCV01]
Current Video 3D Reference Imagery
Registration: Radical Appearance Changes
Dynamic 3D Capture & Rendering
… global modeling is not feasible ...
• Recovering depth from local views
• Depth refinement across multiple local views
• New view synthesis using multiple local views
Cross view depth checking
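The slides do not spell out how cross view depth checking is done. A common form of it is a left-right consistency test on disparity maps, sketched below as an assumption rather than as the authors' method. The convention (left pixel x matches right pixel x - d), the tolerance and the function name are all choices made for this illustration.

```python
def cross_check(disp_left, disp_right, tolerance=1):
    """Keep a left-view disparity only if the right view agrees with it.

    Assumed convention: left pixel x with disparity d matches right pixel x - d.
    Pixels that fail the check are marked None.
    """
    checked = []
    for row_left, row_right in zip(disp_left, disp_right):
        out = []
        for x, d in enumerate(row_left):
            xr = x - d
            agrees = 0 <= xr < len(row_right) and abs(row_right[xr] - d) <= tolerance
            out.append(d if agrees else None)
        checked.append(out)
    return checked


# One scanline: a foreground object (disparity 3) in front of a background
# (disparity 1). Left pixels 2 and 3 are hidden behind the object in the
# right view, so their disparities cannot be confirmed there.
left_disp = [[1, 1, 1, 1, 3, 3, 3, 1, 1, 1]]
right_disp = [[1, 3, 3, 3, 1, 1, 1, 1, 1, 1]]
print(cross_check(left_disp, right_disp))
```

Pixels that map outside the right image or that disagree with the right view come back as None. Those are the unmatched regions for which a later stage has to hypothesize a reasonable depth.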
3D Shape/Depth Estimation from Multiple Views of a Scene
Stereo Pair
• Estimation of high quality, artifact free depth maps co-registered with video imagery for rendering new views.
• Must work both outdoors and indoors
Multi-baseline depth estimation - requirements
Depth maps
New view rendering
A traditional stereo algorithm
Global matching method
Thin structures
Accurate boundaries
Accurate boundaries
[Tao,Sawhney,Kumar WACV00, ICCV01]
New view rendering using local depth estimation
Color segmentation based stereo algorithm (2000)
Multi-window plane + parallax algorithm (1998)
Local flow estimation (1992)
New view rendering
Main ideas
• Motivations
– be able to handle textureless regions
– handle object boundaries accurately
– global visibility constraints should be enforced
– Hypothesize reasonable depths for unmatched regions
• Solutions
– Global matching method - an analysis-by-synthesis approach
– Representation - smooth depth representation in homogeneous region
– Search method - neighborhood depth hypotheses generation
– Efficient algorithm - incremental warping
– Scene constraints - prior functions
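The interplay of these ideas can be sketched on a single scanline. What follows is a heavily simplified illustration, not the published algorithm: each color segment gets one constant disparity (instead of a smooth depth model), neighbors are just the previous and next segment, the cost re-renders the whole view each time (no incremental warping), and a fixed hole penalty stands in for the prior functions. All names and values are hypothetical.

```python
def render_right(left, seg_of, disp_of_seg):
    """Synthesize the right view from the left one; nearer pixels win (visibility)."""
    width = len(left)
    out = [None] * width
    depth = [-1] * width
    for x, value in enumerate(left):
        d = disp_of_seg[seg_of[x]]
        xr = x - d
        if 0 <= xr < width and d > depth[xr]:
            out[xr], depth[xr] = value, d
    return out


def cost(left, right, seg_of, disp_of_seg, hole_penalty=10):
    """Global matching cost: compare the synthesized view with the real one."""
    rendered = render_right(left, seg_of, disp_of_seg)
    return sum(hole_penalty if r is None else abs(r - t)
               for r, t in zip(rendered, right))


def refine(left, right, seg_of, disp_of_seg, sweeps=3):
    """Greedy search: each segment tries its neighbors' disparities as hypotheses."""
    disp = list(disp_of_seg)
    for _ in range(sweeps):
        for s in range(len(disp)):
            neighbors = [disp[n] for n in (s - 1, s + 1) if 0 <= n < len(disp)]
            disp[s] = min(
                [disp[s]] + neighbors,
                key=lambda d: cost(left, right, seg_of, disp[:s] + [d] + disp[s + 1:]))
    return disp


# Scanline with four color segments; segments 1 and 2 form a foreground object.
left = [10, 11, 12, 13, 50, 51, 70, 71, 20, 21, 22, 23]
right = [10, 11, 50, 51, 70, 71, 18, 19, 20, 21, 22, 23]
seg_of = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 3, 3]

initial = [0, 2, 0, 0]  # segment 2 starts with a wrong disparity
refined = refine(left, right, seg_of, initial)
print(initial, cost(left, right, seg_of, initial))
print(refined, cost(left, right, seg_of, refined))
```

In this example segment 2 adopts the disparity of its neighbor because the synthesized view then matches the real right view far better, and the background segments keep their disparity because borrowing the foreground value would raise the global cost.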
Color Segmentation
Original image (frame 12) Original image (left)
Color segmentation [Comaniciu 97]
New view rendering using local depth estimation
Color segmentation based stereo algorithm
True depth
Left image
new view rendering
Depth computation from 3 views
Depth map (frame 12)
Video frame 11 Video frame 12 Video frame 13
Color segmentation (frame 12)
Multiple View Depth Recovery and New View Rendering
New view rendering from multiple views.
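Why several source views help can be seen on one scanline: each view leaves holes where the new viewpoint sees surfaces that view did not, and another view can fill them. The sketch below is an assumption-laden illustration, not the rendering pipeline from the talk. It uses integer disparities, forward warping with a nearest-surface test, and a simple per-pixel merge; the names and the sample data are made up.

```python
def warp_to_view(colors, disparities, baseline_fraction):
    """Forward-warp one scanline toward a new viewpoint.

    baseline_fraction is the signed position of the new view relative to this
    view, in units of the stereo baseline. Returns (colors, disparities) with
    None where nothing landed.
    """
    width = len(colors)
    out_c, out_d = [None] * width, [None] * width
    for x, (c, d) in enumerate(zip(colors, disparities)):
        xt = x - round(baseline_fraction * d)
        if 0 <= xt < width and (out_d[xt] is None or d > out_d[xt]):
            out_c[xt], out_d[xt] = c, d
    return out_c, out_d


def merge_views(warped_views):
    """Per pixel, keep the sample nearest to the camera among all warped views."""
    per_view = [list(zip(c, d)) for c, d in warped_views]
    merged = []
    for samples in zip(*per_view):
        visible = [(d, c) for c, d in samples if d is not None]
        merged.append(max(visible)[1] if visible else None)
    return merged


# Left and right views of a foreground object (disparity 2) over a background.
left = [10, 11, 12, 13, 50, 51, 70, 71, 20, 21, 22, 23]
left_disp = [0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0]
right = [10, 11, 50, 51, 70, 71, 18, 19, 20, 21, 22, 23]
right_disp = [0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0]

# Render the view halfway between the two cameras.
from_left = warp_to_view(left, left_disp, 0.5)
from_right = warp_to_view(right, right_disp, -0.5)
middle = merge_views([from_left, from_right])
print(from_left[0])
print(from_right[0])
print(middle)
```

Each single-view warp leaves one hole next to the foreground object, on opposite sides, and the merged scanline has none. With real imagery the depth maps are noisy and non-integer, so a practical renderer would also need resampling and blending, which this sketch leaves out.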
New view rendering from a single view. left: from frame 212, right: from frame 215
Multiple view depth recovery and new view rendering
Original 14 video frames (frame 04-17)
Depth maps of frames 12 and 15
New view rendering (71 frames)
Immersive Visualization of a Dynamic Event
• Temporally consistent motion and 3D shape extraction
• Scintillation free dynamic high-quality rendering
AN IMMERSIVE IBMR GRAND CHALLENGE
AND IF WE DO IT RIGHT