2D & 3D VIDEO PROCESSING FOR IMMERSIVE APPLICATIONS

Emerging Convergence of Video, Vision & Graphics

Harpreet S. Sawhney

Rakesh Kumar

ACKNOWLEDGEMENTS

Collaborative Work with:

Hai Tao

Yanlin Guo

Steve Hsu

Supun Samarasekera

Keith Hanna

Aydin Arpa

Rick Wildes

TECHNICAL SUCCESS OF CONVERGENCE TECHNOLOGIES

PC based near real-time mosaicing

Automated Video Enhancement: VHS-to-DVD

Iris recognition, active vision

Image based modeling for Entertainment

Real-time Video Insertion

Immersive and Interactive Telepresence Modes of Operation

Observation Mode

User observes a remote site from any perspective.

User “walks” through site to view activities of interest “up close”.

Example: security, facility guards, sports & entertainment

Conversation Mode

Users talk and observe one another as if in the same room.

Users walk around yet maintain eye contact.

Example: immersive tele-conferencing

Interaction Mode

Remote users share a common work space.

Users observe each other’s hands as they manipulate shared objects, such as war room wall displays.

Example: mission planning, remote surgery

Quality of Service for Tele-presence

Critical Issues

• High quality for immersive experience

– Artifact free recovery of 3D shape from video streams

– Efficient 3D video representation and compression

– High quality rendering of new views using 3D shape and video streams

– Bandwidth available in the Next Generation Internet

• Low latency for interactive applications

– Real time 3D geometry recovery at the content server end

– Real time new view rendering at the browser client end

– Adaptive Stream management to handle user requests and network loads

– Error resilience and concealment to fill in missing packets

Convergence Technologies

… for immersive & interactive visual applications ...

• Vision algorithms: High-quality 3D shape recovery and dynamic scene analysis

• ASICs, high performance hardware: Real-time video processing

• Compact, low-cost cameras: CMOS cameras

• Low latency and high quality compression: Error resilience

• Real time view synthesis: Standard platforms, e.g. PCs

• Immersive Displays

Vision algorithm performance over time

[Chart: algorithm complexity (vertical axis) versus time (horizontal axis), with ticks at 1990, 1993, 1995, 1998 and 2000. Capabilities plotted: 2D stabilization; 2D video insertion; coarse 3D depth recovery; video registration to 3D site models; high quality 3D shape extraction. Applications plotted: mosaicing for entertainment & surveillance; real-time insertion in live TV; face finding for iris recognition; geo-registration visual databases; immersive telepresence.]

HW Performance/Size/Cost over time

• Sarnoff ACADIA ASIC performance

– 100 MHz system clock, processes 100 million pixels/sec in each processing element

– 10 billion operations/sec total IC performance

– 800 MB/sec SDRAM interface using 64-bit bus

• Enables building smart 3D cameras for immersive applications.

VFE-100 (1992)

VFE-200 (1997)

ACADIA ASIC (2000)

Application Performance

• Parametric Motion: Stabilization & Mosaicing

– 720x240 fields @ 60 Hz OR 720x480 frames @ 30 Hz

• Pyramid based Fusion: Dynamic Range, Focus Enhancement

– 720x240 fields @ 60 Hz OR 720x480 frames @ 30 Hz

• Stereo Depth Extraction

– 720x240 field 32 disparity levels in 4 ms (250 Hz)

– 720x240 field 60 disparity levels in 10 ms (100 Hz)

– 60 disparities on 1k x 1k images at 55 ms (18 Hz)
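The rates quoted for stereo depth extraction follow directly from the per-image latencies. The short sketch below reproduces that arithmetic and also derives the number of pixel-disparity evaluations per second each configuration implies. The function names are illustrative, and reading "1k x 1k" as 1024 x 1024 is an assumption.

```python
def rate_hz(latency_ms):
    """Images processed per second for a given per-image latency."""
    return 1000.0 / latency_ms


def disparity_tests_per_second(width, height, levels, latency_ms):
    """Pixel-disparity evaluations per second implied by one configuration."""
    return width * height * levels * rate_hz(latency_ms)


# (width, height, disparity levels, latency in ms) taken from the slide;
# 1024 x 1024 is an assumed reading of "1k x 1k".
CONFIGS = [
    (720, 240, 32, 4.0),
    (720, 240, 60, 10.0),
    (1024, 1024, 60, 55.0),
]

if __name__ == "__main__":
    for w, h, levels, ms in CONFIGS:
        print(f"{w}x{h}, {levels} levels: {rate_hz(ms):.1f} Hz, "
              f"{disparity_tests_per_second(w, h, levels, ms):.3g} tests/sec")
```

The third configuration comes out at about 18.2 Hz, which the slide rounds down to 18 Hz.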

Sarnoff Compression Technology

… Required algorithm components for tele-presence are emerging ...

[Chart: algorithm complexity versus time. Technologies plotted: pyramid & wavelet based encoding (still image compression, 1988-1993); MPEG2 encoding and transmission; Just Noticeable Difference (JND) MPEG2 encoding and quality measurement (1997-1998, Tektronix); low latency MPEG2 multiplexing service; VideoPhone: H.263; MPEG4, progressive encoding. Other period labels: 1993-1996, 1997-1998, 1998-1999, 1999. Other partner labels: ICTV, DIREC-TV & HDTV, LG Electronics, E-vue.]

A FRAMEWORK FOR VIDEO PROCESSING

ALIGN

2D & 3D MODELS OF MOTION & STRUCTURE

MODEL-BASED IMAGE SEQUENCE ALIGNMENT

TEST

WARP/RENDER WITH 2D/3D MODELS

TEST ALIGNMENT QUALITY

SYNTHESIZE

CREATE OUTPUT REPRESENTATIONS
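The ALIGN / TEST / SYNTHESIZE framework can be illustrated with a deliberately small sketch. It assumes the simplest possible motion model (an integer, circular 2D translation estimated by phase correlation) in place of the 2D and 3D models named on the slide, and it uses frame averaging as the output representation. All function names and the residual threshold are hypothetical.

```python
import numpy as np


def align(ref, img):
    """ALIGN: integer (dy, dx) such that rolling img by it best matches ref."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(img))
    cross /= np.abs(cross) + 1e-12          # keep phase only (phase correlation)
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if dy > h // 2:                         # map wrapped peaks to negative shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def warp(img, shift):
    """Warp with the estimated model (here a circular shift, for simplicity)."""
    return np.roll(img, shift, axis=(0, 1))


def alignment_ok(ref, warped, max_residual):
    """TEST: accept the alignment if the mean absolute residual is small."""
    return float(np.mean(np.abs(ref - warped))) <= max_residual


def synthesize(ref, frames, max_residual=0.05):
    """SYNTHESIZE: average the reference with every frame that passed the test."""
    accepted = [ref]
    for frame in frames:
        warped = warp(frame, align(ref, frame))
        if alignment_ok(ref, warped, max_residual):
            accepted.append(warped)
    return np.mean(accepted, axis=0), len(accepted)
```

The test step matters because a frame that the model cannot explain (for example an unrelated view) still yields some shift estimate. Checking the residual after warping keeps such frames out of the synthesized output.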

Core Vision Algorithms for (Real-time) Motion & 3D Video Analysis

2D Immersive & Layered Representations

Stereo & Video Sequence Enhancement

Multi-camera Immersive Dynamic Rendering

Model-centric Video Visualization

Highlights of Sarnoff’s Video Analysis Technologies

… framework applied to create immersive representations ...

Spherical Mosaics

Dynamic & Synopsis Mosaics

Hi-Q IBR based mixed resolution synthesis

Video Quality Enhancement for efficient compression

Dynamic model & video visualization

Geo-registration with reference image database

Hi-Q Depth extraction

Image-based rendering with dynamic depth

SPHERICAL MOSAICS

Sarnoff Library Video

Captures almost the complete sphere with 380 frames

TOPOLOGY INFERENCE & LOCAL-TO-GLOBAL ALIGNMENT

[Sawhney,Hsu,Kumar ECCV98, Szeliski,Shum SIGGRAPH98]

SPHERICAL TOPOLOGY EVOLUTION

SPHERICAL MOSAIC

Sarnoff Library

ACTIVE FOCUS OF ATTENTION WFOV/NFOV CONTROL

DYNAMIC MOSAICS

Video Stream with deleted moving object

Original Video

Dynamic Mosaic Video

SYNOPSIS MOSAICS

Low-Res Left

Synthesized High-Res Left

Original High-Res Right

ALIGNMENT & SYNTHESIS FOR HI-RES STEREO SYNTHESIS

A HIGH END APPLICATION OF IBMR

[Sawhney,Guo,Hanna,Kumar,Zhou,Adkins SIGGRAPH2001]

THE PROBLEM SCENARIO

INPUT OUTPUT

Left Eye (Typically 1.5K)

Right Eye (Typically 6K)

3D & Motion Alignment Based Stereo Sequence Processing

[Diagram: Left and Right frame sequences over neighboring time instants (t-2 through t+3), with links labeled "stereo" and "flow".]

• Highlights:

– Scintillation effect is reduced.

– Occlusion regions are better handled.

SYNTHESIS RESULT ON REAL FOOTAGE

IMPLICATIONS FOR IMMERSIVE IBMR CAMERA CONFIGURATIONS

Lo-res camera

Hi-res camera

Multi-resolution camera configuration allows 3D capture at the highest resolution as well as a user-controlled large range of zooms without the need for zoom control on the cameras.

Model-Centric Video Visualization OR Video-Centric Model Visualization

[Hsu, Samarasekera, Kumar, Sawhney CVPR00]

Original Video

Re-projection of video after merging with model.

Geo-registration of video to site model

Site model

C:\NGI\Browser\_MView_.exe

Video to Site Model Alignment

• Model to frame alignment

REFINE

Correspondence-less exterior orientation from 3D-2D line pairs

Oriented Energy Pyramid

[Panels: oriented energy at 0°, 45°, 90° and 135°]

• Goal: representation which indicates edge strength in the image at various orientations and scales

• Orientation selectivity: reduce false matches

• Coarse-to-fine: increase capture range
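A minimal sketch of an oriented energy pyramid is given below. It assumes the energy at each orientation is the squared directional derivative of the image, and that each coarser level is a 2x2 box-averaged, half-resolution copy. The filters used in the original work are not stated on the slide, so both choices are stand-ins. Here the angle is the direction of the derivative, so the 0° channel responds most strongly to vertical edges.

```python
import numpy as np

ANGLES_DEG = (0, 45, 90, 135)   # the four orientations shown on the slide


def oriented_energy(img, angle_deg):
    """Squared derivative of img along the direction given by angle_deg."""
    gy, gx = np.gradient(img.astype(float))      # derivatives along rows, columns
    theta = np.deg2rad(angle_deg)
    directional = gx * np.cos(theta) + gy * np.sin(theta)
    return directional ** 2


def downsample(img):
    """Halve resolution with a 2x2 box average (assumed smoothing kernel)."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])


def oriented_energy_pyramid(img, levels=3):
    """List of levels, fine to coarse; each maps angle -> energy image."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels):
        pyramid.append({a: oriented_energy(current, a) for a in ANGLES_DEG})
        current = downsample(current)
    return pyramid
```

Matching would then start at the coarsest level, where a large misalignment spans only a few pixels, and proceed to finer levels. Comparing like orientations against like is what suppresses false matches.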

This will be an animation of the gradual improvement of alignment during the coarse-to-fine iterations

regsite_animation.avi

Pose Refinement Algorithm

… iterative coarse to fine adjustment of pose ...

Geo-Registration Video to Reference Database Alignment

[Wildes et al. ICCV01]

Current Video 3D Reference Imagery

Registration: Radical Appearance Changes

Dynamic 3D Capture & Rendering

… global modeling is not feasible ...

• Recovering depth from local views

• Depth refinement across multiple local views

• New view synthesis using multiple local views

Cross view depth checking
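One common form of cross-view checking is the left-right consistency test from two-view stereo, sketched below as an illustration. The slide does not say how the check is implemented across multiple local views, so this is a swapped-in technique. It assumes rectified views, disparity maps for both views, and the convention that a left pixel at column x matches the right pixel at column x - d.

```python
import numpy as np


def cross_check(disp_left, disp_right, tol=1.0):
    """Boolean mask of left-view pixels whose disparity agrees with the right view.

    A left pixel (y, x) with disparity d is assumed to match right pixel
    (y, x - d); the estimate is kept only if the right view's own disparity
    at that location agrees within `tol`.
    """
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xr = np.rint(xs - disp_left).astype(int)      # matching column in right view
    inside = (xr >= 0) & (xr < w)                 # match must fall in the image
    xr_safe = np.clip(xr, 0, w - 1)               # avoid out-of-range indexing
    diff = np.abs(disp_left - disp_right[ys, xr_safe])
    return inside & (diff <= tol)
```

Pixels that fail the check are typically occluded in the other view or mismatched, and are candidates for refinement from additional views.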

3D Shape/Depth Estimation from Multiple Views of a Scene

Stereo Pair

• Estimation of high quality, artifact free depth maps co-registered with video imagery for rendering new views.

• Must work both outdoors and indoors

Multi-baseline depth estimation - requirements

Depth maps

New view rendering

[Comparison panels: a traditional stereo algorithm versus the global matching method. Annotations: thin structures, accurate boundaries.]

[Tao,Sawhney,Kumar WACV00, ICCV01]

New view rendering using local depth estimation

Color segmentation based stereo algorithm (2000)

Multi-window plane + parallax algorithm (1998)

Local flow estimation (1992)

New view rendering

Main ideas

• Motivations

– Be able to handle textureless regions

– Handle object boundaries accurately

– Global visibility constraints should be enforced

– Hypothesize reasonable depths for unmatched regions

• Solutions

– Global matching method: an analysis-by-synthesis approach

– Representation: smooth depth representation in homogeneous region

– Search method: neighborhood depth hypotheses generation

– Efficient algorithm: incremental warping

– Scene constraints: prior functions
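The neighborhood depth hypothesis search can be sketched as follows, under strong simplifying assumptions that are not taken from the source: each color segment carries a single constant disparity rather than a smooth surface, the matching cost is a per-segment mean absolute difference after warping, and visibility and prior terms are omitted. All names are hypothetical.

```python
import numpy as np


def segment_neighbors(labels):
    """Map each segment id to the ids of segments that touch it."""
    nbrs = {int(s): set() for s in np.unique(labels)}
    pairs = ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :]))
    for a, b in pairs:
        differ = a != b
        for p, q in zip(a[differ], b[differ]):
            nbrs[int(p)].add(int(q))
            nbrs[int(q)].add(int(p))
    return nbrs


def segment_cost(left, right, mask, d):
    """Mean absolute error after warping the segment's pixels by disparity d."""
    ys, xs = np.nonzero(mask)
    xr = xs - d
    ok = (xr >= 0) & (xr < right.shape[1])
    if not ok.any():
        return np.inf
    return float(np.mean(np.abs(left[ys[ok], xs[ok]] - right[ys[ok], xr[ok]])))


def refine_segments(left, right, labels, disp, sweeps=3):
    """Let each segment adopt a neighbor's disparity when it matches better."""
    disp = dict(disp)
    nbrs = segment_neighbors(labels)
    for _ in range(sweeps):
        changed = False
        for s in nbrs:
            mask = labels == s
            # Hypotheses: the segment's own disparity plus its neighbors'.
            candidates = sorted({disp[s]} | {disp[n] for n in nbrs[s]})
            best = min(candidates,
                       key=lambda d: segment_cost(left, right, mask, d))
            if best != disp[s]:
                disp[s] = best
                changed = True
        if not changed:
            break
    return disp
```

Only the segment under test is re-warped for each hypothesis, which is the idea behind incremental warping. A textureless or unmatched segment can still receive a plausible depth because the hypotheses come from its neighbors, not from its own appearance.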

Color Segmentation

Original image (frame 12) Original image (left)

Color segmentation [Comaniciu 97]

New view rendering using local depth estimation

Color segmentation based stereo algorithm

True depth

Left image

new view rendering

Depth computation from 3 views

Depth map (frame 12)

Video frame 11 Video frame 12 Video frame 13

Color segmentation (frame 12)

Multiple View Depth Recovery and New View Rendering

New view rendering from multiple views.

New view rendering from a single view. left: from frame 212, right: from frame 215

Multiple view depth recovery and new view rendering

Original 14 video frames (frame 04-17)

Depth maps of frames 12 and 15

New view rendering (71 frames)
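Rendering a new view from one frame and its depth map can be illustrated by forward warping with a z-buffer. The sketch assumes a rectified setup in which the virtual camera moves sideways by a fraction `alpha` of the stereo baseline, so that a pixel with disparity d lands at column x - alpha * d. The slides do not specify the rendering method, so this is a generic stand-in with hypothetical names.

```python
import numpy as np


def render_new_view(image, disparity, alpha):
    """Forward-warp image by alpha * disparity, nearest surface wins.

    Returns the rendered view and a mask of pixels that received a value.
    Unfilled pixels are holes (disocclusions) left for other views to fill.
    """
    h, w = disparity.shape
    out = np.zeros_like(image, dtype=float)
    zbuf = np.full((h, w), -np.inf)           # largest disparity seen so far
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xt = int(round(x - alpha * d))    # target column in the new view
            if 0 <= xt < w and d > zbuf[y, xt]:
                zbuf[y, xt] = d               # closer surface overwrites farther
                out[y, xt] = image[y, x]
                filled[y, xt] = True
    return out, filled
```

With a single source view, the holes behind foreground objects stay empty. Warping several neighboring frames into the same target and merging them is one way to fill these holes.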

Immersive Visualization of a Dynamic Event

• Temporally consistent motion and 3D shape extraction

• Scintillation free dynamic high-quality rendering

C:\NGI\Browser\_MView_.exe

AN IMMERSIVE IBMR GRAND CHALLENGE

AND IF WE DO IT RIGHT
