TRANSCRIPT
Next-generation virtual and augmented reality exploration
and standardization
Prof. Gauthier Lafruit, [email protected]
MPEG: from Single- to Multi-View Compression
Single View:
HEVC = High Efficiency Video Codec
Compression by 2 orders of magnitude, e.g. a 2 h movie on DVD/Blu-ray at 10 Mb/s
Stereo & Multi-View:
MV-HEVC = Multi-View High Efficiency Video Codec, 100 Mb/s
3D-HEVC (Feb. 2015) = High Efficiency Video Codec + Depth
MPEG Exploration and Standardization
Standard = having the same electrical and data formats
Standardization stages: CfE → EE → Draft CfP → CfP → … → FDIS → IS (typically 2-3 years)
Tracks: FN-SMV, PCC, JPEG-PLENO (Point Clouds, Light Fields, Holography; Grand Challenge at ICME), 360 VR, 360-parallax VR
• MPEG-Systems: OMAF (Omni-Directional Application Format) = basic VR
• FTV (Free Viewpoint TV): FN (Free Navigation, DIBR) and SMV (Super-MultiView); CfE (Call for Evidence) on FN and SMV finalized; EE (Exploration Experiments) on 360 VR
• JAhG JPEG-PLENO + MPEG-Light Fields (VR); output document N16352 on similarities between the different formats
• MPEG-3DG (3D Graphics): Draft CfP (Call for Proposals) on Point Clouds (PCC)
Free Navigation: From Multi-Viewpoint …
Free Navigation: … to Any Viewpoint to look at
aka Virtual Reality (VR)
Multi-Camera Free Navigation TV
© NHK
Holoportation
© Microsoft HoloLens
6-DoF
MPEG-VR roadmap
3-DoF
Single panoramic texture [ref8]
Left and Right panoramic textures [ref1]; Light Fields [ref6]
Short-term standardization → longer-term exploration
Single Panoramic Texture
Panoramic Texture
Fisheye lens: no stitching errors, relatively low resolution [ref5]
Multi-camera stitching: high resolution
3-DoF
Warp the images to stitch
[ref4]
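Warping the camera images into a common panorama is typically done with a 3×3 homography per image pair; a minimal sketch (function name and the example matrix are illustrative, not taken from the cited tools):

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography H -- the projective warp used
    when stitching camera views into a single panorama. pts is an (N, 2) array."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian

# Example: a pure-translation homography shifts every point by (2, 3).
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 50.0], [0.0, 50.0]])
warped = apply_homography(H, corners)
```

A single homography is only exact for a planar scene or a purely rotating camera, which is precisely why scene depth causes the stitching parallax errors discussed in these slides.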
Stitching parallax errors
Proponent’s result
http://web.cecs.pdx.edu/~fliu/papers/cvpr2014-stitching.pdf
[ref2]
Depth-based Stitching corrections
Google Jump Assembler [ref7]
[ref3]
Stereoscopic Panoramic Textures
3-DoF
Left and Right panoramic textures
Why not use light fields stored in a single texture?
[ref1]
Omni-Directional Texture
[ref1]
Collect the right light ray for the given view
ODS = Omni-Directional Stereo
View interpolation (on the circle, without occlusions) corresponds to selecting the corresponding light rays of the light field
[ref6]
Relation with Light Field Cameras
Focus a posteriori in software
© Lytro
Each elemental image contains directional light information
Light field representation: each light ray emanating from the objects is parameterized by its intersections (s, t) and (u, v) with two parallel planes; F, F′ and Q, Q′ are the planes on which to render
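The two-plane (s, t)-(u, v) representation can be sketched with a toy 4D array; the grid sizes and the shift-and-add refocusing step below are illustrative assumptions, not Lytro's actual pipeline:

```python
import numpy as np

# Toy 4D light field lf[s, t, u, v]: a 2D grid of 2D (grayscale) images.
# The (s, t) plane indexes the lens/camera positions, the (u, v) plane
# the pixel coordinates -- the classic two-plane parameterization.
rng = np.random.default_rng(0)
S, T, U, V = 4, 4, 8, 8
lf = rng.random((S, T, U, V))

def sample_ray(lf, s, t, u, v):
    """Radiance of the ray crossing the planes at (s, t) and (u, v), with
    bilinear interpolation over the (s, t) plane for non-integer positions."""
    s0, t0 = int(np.floor(s)), int(np.floor(t))
    s1, t1 = min(s0 + 1, lf.shape[0] - 1), min(t0 + 1, lf.shape[1] - 1)
    ws, wt = s - s0, t - t0
    return ((1 - ws) * (1 - wt) * lf[s0, t0, u, v]
            + ws * (1 - wt) * lf[s1, t0, u, v]
            + (1 - ws) * wt * lf[s0, t1, u, v]
            + ws * wt * lf[s1, t1, u, v])

def refocus(lf, shift):
    """Focus a posteriori in software: shift each (s, t) sub-image by a
    disparity proportional to its offset from the center, then average."""
    S, T, U, V = lf.shape
    acc = np.zeros((U, V))
    for s in range(S):
        for t in range(T):
            acc += np.roll(np.roll(lf[s, t], shift * (s - S // 2), axis=0),
                           shift * (t - T // 2), axis=1)
    return acc / (S * T)
```

Choosing a different `shift` selects a different focal plane, which is how light field cameras refocus after capture.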
Multi-Cameras: Discrete or In-a-Box
Discrete cameras vs. microlens array (RayTrix)
https://www.youtube.com/watch?v=p2w1DNkITI8
Dynamically Reparameterized Light Fields
(A) From all light rays that come from the object, select the ones (in red) corresponding to the camera view
(B) Discretization of the light rays over two parallel planes requires depth-based interpolation/correction: M′ lies closer to R than to L, hence it cannot be the midpoint M = 0.5 L + 0.5 R; instead M′ = 0.2 L + 0.8 R
Lumigraph: Depth needed in Sparse Light Field
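The depth-based correction above can be reproduced in a toy 2D (x, z) geometry; the coordinates below are made-up values chosen to land on the slide's 0.2 L + 0.8 R example:

```python
def ray_camera_plane_x(eye, point, cam_plane_z=0.0):
    """x-coordinate where the ray from `eye` through the scene `point` crosses
    the camera plane z = cam_plane_z. Where the crossing lands -- and thus
    which cameras to blend, with which weights -- depends on the scene depth,
    which is why a sparse light field (Lumigraph) needs depth information."""
    (xe, ze), (xp, zp) = eye, point
    lam = (cam_plane_z - ze) / (zp - ze)
    return xe + lam * (xp - xe)

def blend_weights(x_ray, x_left, x_right):
    """Linear interpolation weights for the two nearest cameras."""
    w_right = (x_ray - x_left) / (x_right - x_left)
    return 1.0 - w_right, w_right

# Cameras L at x=0 and R at x=1 on the plane z=0; virtual eye behind the
# plane, object in front. The desired ray crosses the camera plane at
# x=0.8, so the correct blend is 0.2*L + 0.8*R, not the midpoint.
x = ray_camera_plane_x(eye=(0.5, -1.0), point=(1.1, 1.0))
w_left, w_right = blend_weights(x, x_left=0.0, x_right=1.0)
```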
6-DoF Free Navigation VR
3D model VR
6-DoF
3D point cloud VR
6-DoF
30 LIDAR positions © LISA-ULB
Laser Time-of-Flight acquisition: each 3D point (x, y, z) carries omni-directional colors, i.e. a BRDF over the viewing angles (θ, φ)
Image-based acquisition: many camera views sample the light field over the two planes (s, t) and (u, v)
Point Clouds and Image-Based Light Fields are theoretically equivalent
http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w16352_2016-06-03_Report_JAhG_light-sound_fields.pdf
Laser Time-of-Flight Point Cloud acquisition: points (x, y, z) with omni-directional BRDF colors over angles (θ, φ); cameras with (θ, φ) extrinsics capture the rays (s, t)-(u, v) onto pixels (px, py), and the renderer colors the pixels with this texture information
Category A: scene geometry is given and light transport is simulated for rendering
Category B: scene geometry is estimated from the captured light
Simulate Point Cloud light transport vs. use directly the Light Field emanating from the points
Which format? Multi-View, Point Clouds, …?
Multi-Cam Input
Point cloud
3D Mesh
3D Object
© Microsoft
https://www.youtube.com/watch?v=kZ-XZIV-o8s
3D Graphics Artefacts
© 8i.com © Microsoft
Image-Based vs. 3D Graphics
Image-Based vs. 3D graphics Point Cloud
http://nozon.com/presenz © Nozon https://vimeo.com/49921117 © TimeSlice
Start from 3D gfx and create images vs. start from images and create 3D gfx objects
http://replay-technologies.com/ © Replay Technologies https://www.youtube.com/watch?v=sw_LI8J-AlU © Holografika
Start from images to create point clouds vs. start from a couple of image views and interpolate
Light Field displays = VR without goggles
© Holografika
© NICT
3D Light Field Interpolation for all-around viewing
Input views and interpolated views
https://vimeo.com/128641902 © ACM Siggraph
Challenges
• Image-Based vs. 3D gfx (Point Clouds, Meshes, etc)
• Acquisition/pre-processing cost
• Rendering cost
• Transmission cost
• Performance metric = Quality vs. Bitrate
• Quality = low latency, no visual artifacts, …
• Quality ≠ PSNR
MOS vs. bitrate (kbps) for the "BBB flowers" sequence, anchor vs. NICT proposal: 20.5% average BD-rate bitrate reduction for the same quality
MPEG-Light Fields & JPEG-PLENO
• Light Fields = multi-camera acquisition (discrete and/or in-a-box)
• Point Clouds
• JPEG and MPEG will soon have Calls for Proposals (CfP)
Lytro Immerge
RayTrix
Existing MPEG technology for 6-DoF DIBR/LightField-VR
MPEG-FTV (=Free Viewpoint TV)
SMV = Super-MultiView = Light Field displays
FN = Free Navigation
Bitrate reductions for the same quality

Bjøntegaard delta vs. anchors (decoded-view PSNR / decoded video bitrate):
Sequence                  UHasselt*   Poznan    Zhejiang
Big Buck Bunny flowers    -           -5.21%    -4.62%
Poznan Blocks             -           -16.25%   -9.77%
Soccer Arc                -           -1.14%    -11.93%
Soccer Linear             -31.6%      +2.20%    -0.18%
Average (nonlinear)       -           -7.53%    -8.77%
Average (all)             -           -5.10%    -6.62%

Bjøntegaard delta vs. anchors (MOS / decoded video bitrate):
Sequence                  UHasselt    Poznan    Zhejiang
Big Buck Bunny flowers    -           -43.3%    -19.2%
Poznan Blocks             -           -36.9%    -4.9%
Soccer Arc                -           -70.1%    -69.6%
Soccer Linear 2           -46.51%     -28.5%    +27.0%
Average (nonlinear)       -           -50.1%    -31.3%
Average (all)             -           -44.7%    -16.7%

Vittorio Baroncini, Masayuki Tanimoto, Olgierd Stankiewicz, "Summary of the results of the Call for Evidence on Free-Viewpoint Television: Super-Multiview and Free Navigation", ISO/IEC JTC1/SC29/WG11 MPEG2016/N16318, June 2016, Geneva, Switzerland
* Depth map generated after decoding
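The percentages in these tables are Bjøntegaard delta rates; a compact sketch of the standard cubic-fit computation (the sample rate-distortion points in the example are made up):

```python
import numpy as np

def bd_rate(rates_anchor, q_anchor, rates_test, q_test):
    """Bjoentegaard delta rate: average bitrate difference (%) between two
    rate-distortion curves over their overlapping quality range. Fit log-rate
    as a cubic function of quality (PSNR or MOS), integrate the gap."""
    la, lt = np.log(rates_anchor), np.log(rates_test)
    pa = np.polyfit(q_anchor, la, 3)   # log-rate as cubic in quality
    pt = np.polyfit(q_test, lt, 3)
    lo = max(min(q_anchor), min(q_test))
    hi = min(max(q_anchor), max(q_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_diff = ((np.polyval(it, hi) - np.polyval(it, lo))
                - (np.polyval(ia, hi) - np.polyval(ia, lo))) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100  # negative = bitrate saving

# Example: a codec that needs 10% less bitrate at every quality point.
rates = [1000.0, 2000.0, 3000.0, 4000.0]
quality = [30.0, 33.0, 35.0, 36.0]
bd = bd_rate(rates, quality, [r * 0.9 for r in rates], quality)
```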
3D display View Synthesis (MPEG-FTV, DERS/VSRS)
[Nagoya University 80-cams input]
Y-PSNR (dB) vs. view number (59-119) for VSRS-Extended with 3, 5, 9, 17 and 33 transmitted views [m35079]: interpolated views drop up to 13 dB below transmitted views
3D graphics synthetic sequences
6-DoF Image-Based Transmission
80 cams
Oculus Rift stereo images
Unity WebGL tool
3D content
© Universidad Politécnica de Madrid
View Synthesis of missing camera views: Depth Image Based Rendering (DIBR)
Disparity D is inversely proportional to depth: a far-away object has a small disparity between the left and right cameras, a close-by object a large one
The mid-way view is synthesized by shifting each pixel over D/2
Disoccluded regions have to be inpainted from other available views
Display one out of 7 camera inputs vs. perform View Synthesis (VSRS) on the 7 camera inputs
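A minimal forward-warp to the mid-way view, as described above (toy grayscale sketch; a real DIBR pipeline adds z-buffering so that close-by pixels win, plus blending of several views and inpainting):

```python
import numpy as np

def warp_midway(left, disparity):
    """Forward-warp the left image to the mid-way viewpoint by shifting each
    pixel over half its disparity (disparity ~ 1/depth). Pixels that nothing
    maps onto stay at -1: the disoccluded regions that must be inpainted
    from the other available views."""
    h, w = left.shape
    mid = -np.ones_like(left)
    for y in range(h):
        for x in range(w):
            xm = x - int(round(disparity[y, x] / 2.0))  # shift by D/2
            if 0 <= xm < w:
                mid[y, xm] = left[y, x]  # no z-buffer: last writer wins
    return mid
```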
View Synthesis Reference Software (VSRS)
Sweep around the display
13 views transmitted. For each, 5 additional synthesized views.
Per pixel: 13 rays transmitted (16%), 67 rays synthesized (84%)
© NICT
Sweep around the display
View 59 = input, View 63 = synthesized, View 66 = input
Y-PSNR (dB) vs. view number (0-80), anchor vs. proposed (rho = 0.001, rho = 0.01): Δ = 4 dB © NICT
PSNR is computed between the original image and the decoded & view-synthesized image
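For reference, the standard PSNR definition between an original and a decoded & synthesized view (a generic sketch, not code from any proposal; as the slides stress, it correlates poorly with perceived quality):

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10*log10(peak^2 / MSE).
    Infinite for identical images; higher is better, but Quality != PSNR."""
    mse = np.mean((np.asarray(original, dtype=np.float64)
                   - np.asarray(distorted, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```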
Coding & View Synthesis artefacts
View 59 = input, View 63 = synthesized, View 66 = input
© NICT
Real natural sequences
Estimate the depth map
UHasselt Soccer © University Hasselt
Depth Image-Based Rendering from 7 RGB cameras
Depth Image-Based Rendering
Camera 0 view: left re-projection and right re-projection
Camera 0 left re-projection, with hole filling from the Camera 1 re-projection
Blended result: camera view vs. virtual view
Point Cloud
Gaussian Splat rendering
Reprojection of a view in VSRS = a point cloud
1 view re-projected vs. 3 views re-projected
VSRS = View Synthesis Reference Software (in MPEG)
Point Clouds (left) vs. VSRS 4.1 (right)
7 views re-projected + Gaussian splats in reverse warping vs. 3 views re-projected + VSRS
Media Framework: https://github.com/timlenertz/mf_view_syn
VSRS 4.1 modified (m38480), plugged into the Media Framework
Modular parallelization framework for multi-stream video processing
submitted to ACMMM 2016
Depth Image-Based Rendering from 10 RGB cameras
Poznan Fencing © Poznan University
Better Depth Map yields better View Synthesis
Fencing camera setup: 15°, 2°, 1 m
Everything depends on the quality of the depth map [ref9]!
Light Field Depth Estimation
Epipolar Plane Image (EPI)
INPUT: PBRT photo-realistic rendering
OUTPUT: view synthesis with 10 skipped cameras
PSNR (dB) vs. camera number (views 59-119) [Univ. Hasselt]
Light Fields vs. DERS/VSRS
DERS = MPEG Depth Estimation Reference SoftwareVSRS = MPEG View Synthesis Reference SoftwareLF = Light Fields
Conclusion: Cinematic 6-DoF VR
Standardization CfP and Exploration:
• Single panoramic stitched texture
• Stereoscopic stitched textures
• ODS to Light Fields approach
Existing technology:
• 3D-HEVC with VSRS view synthesis/interpolation
Novel technology:
• Light Fields multi-cameras: discrete and/or in-a-box
References
[ref1] Paul Bourke, “Synthetic stereoscopic panoramic images”, http://paulbourke.net/papers/vsmm2006/vsmm2006.pdf
[ref2] Fan Zhang and Feng Liu, “Parallax-tolerant Image Stitching”, http://web.cecs.pdx.edu/~fliu/papers/cvpr2014-stitching.pdf
[ref3] O. Tugrul Turan and Christopher Higgins, "Enhancements for Digital Imaging of Gusset Plate Connections: Fisheye and Image Stitching", report SPR 304-581, http://www.oregon.gov/ODOT/TD/TP_RES/docs/Reports/2011/FishStitch_SPR304_581.pdf
[ref4] Sing Bing Kang, Richard Szeliski, and Matthew Uyttendaele, “Seamless Stitching using Multi-Perspective Plane Sweep”, technical report MSR-TR-2004-48, June 2004 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2004-48.pdf
[ref5] https://opticalflow.wordpress.com/2016/05/11/correcting-360-degree-stereo-video-capture-part-1-of-3/
[ref6] https://support.google.com/jump/answer/6399843?hl=en
[ref7] http://www.roadtovr.com/google-announces-jump-an-open-vr-camera-design-with-stitching-solution-and-youtube-playback/
[ref8] S. Heymann, A. Smolic, K. Mueller, Y. Guo, J. Rurainsky, P. Eisert, T. Wiegand, "Representation, Coding and Interactive Rendering of High Resolution Panoramic Images and Video Using MPEG-4", Proc. Panoramic Photogrammetry Workshop (PPW), http://www.isprs.org/proceedings/xxxvi/5-w8/paper/PanoWS_Berlin2005_Heymann.pdf