TRANSCRIPT
Next-generation virtual and augmented reality exploration
and standardization
Prof. Gauthier Lafruit, [email protected]
MPEG: from Single- to Multi-View Compression
Single View:
HEVC = High Efficiency Video Codec
Compression by 2 orders of magnitude, e.g. a 2 h movie on DVD/Blu-ray at 10 Mb/s
Stereo & Multi-View:
MV-HEVC = Multi-View High Efficiency Video Codec, 100 Mb/s
3D-HEVC (Feb. 2015) = High Efficiency Video Codec + Depth
MPEG Exploration and Standardization
Standard = having the same electrical and data formats
Standardization stages: CfE → EE → Draft CfP → CfP → … → FDIS → IS (typically 2-3 years)
Tracks: FN-SMV, PCC, JPEG-PLENO (Point Clouds, Light Fields, Holography; Grand Challenge at ICME), 360 VR, 360-parallax VR
• MPEG-Systems: OMAF (Omni-Directional Application Format) = basic VR
• FTV (Free Viewpoint TV): FN (Free Navigation, DIBR) and SMV (Super-MultiView); CfE (Call for Evidence) on FN and SMV finalized; EE (Exploration Experiments) on 360 VR
• JAhG JPEG-PLENO + MPEG-Light Fields (VR); output document N16352 on similarities between the different formats
• MPEG-3DG (3D Graphics): Draft CfP (Call for Proposals) on Point Clouds (PCC)
Free Navigation: From Multi-Viewpoint …
Free Navigation: … to Any Viewpoint to look at
aka Virtual Reality (VR)
Multi-Camera Free Navigation TV
© NHK
Holoportation
© Microsoft HoloLens
6-DoF
MPEG-VR roadmap
3-DoF
Single panoramic texture [ref8]
Left and Right panoramic textures [ref1]; Light Fields [ref6]
Short-term standardization → longer-term exploration
Single Panoramic Texture
Panoramic Texture
Fisheye lens: no stitching errors, relatively low resolution [ref5]
Multi-camera stitching: high resolution
3-DoF
Warp the images to stitch
[ref4]
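Warping the camera images into a common panorama is typically done with a 3×3 homography per image pair; a minimal sketch (function name and the example matrix are illustrative, not taken from the cited tools):

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography H -- the projective warp used
    when stitching camera views into a single panorama. pts is an (N, 2) array."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian

# Example: a pure-translation homography shifts every point by (2, 3).
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 50.0], [0.0, 50.0]])
warped = apply_homography(H, corners)
```

A single homography is only exact for a planar scene or a purely rotating camera, which is precisely why scene depth causes the stitching parallax errors discussed in these slides.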
Stitching parallax errors
Proponent’s result
http://web.cecs.pdx.edu/~fliu/papers/cvpr2014-stitching.pdf
[ref2]
Depth-based Stitching corrections
Google Jump Assembler [ref7]
[ref3]
Stereoscopic Panoramic Textures
3-DoF
Left and Right panoramic textures
Why not use light fields stored in a single texture?
[ref1]
Omni-Directional Texture
[ref1]
Collect the right light ray for the given view
ODS = Omni-Directional Stereo
View interpolation (on the circle, without occlusions) corresponds to selecting the corresponding light rays of the light field
[ref6]
Relation with Light Field Cameras
Focus a posteriori in software
© Lytro
Each elemental image contains directional light information
Light field representation: each light ray emanating from the objects is parameterized by its intersections (s, t) and (u, v) with two parallel planes; F, F′ and Q, Q′ are the planes on which to render
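The two-plane (s, t)-(u, v) representation can be sketched with a toy 4D array; the grid sizes and the shift-and-add refocusing step below are illustrative assumptions, not Lytro's actual pipeline:

```python
import numpy as np

# Toy 4D light field lf[s, t, u, v]: a 2D grid of 2D (grayscale) images.
# The (s, t) plane indexes the lens/camera positions, the (u, v) plane
# the pixel coordinates -- the classic two-plane parameterization.
rng = np.random.default_rng(0)
S, T, U, V = 4, 4, 8, 8
lf = rng.random((S, T, U, V))

def sample_ray(lf, s, t, u, v):
    """Radiance of the ray crossing the planes at (s, t) and (u, v), with
    bilinear interpolation over the (s, t) plane for non-integer positions."""
    s0, t0 = int(np.floor(s)), int(np.floor(t))
    s1, t1 = min(s0 + 1, lf.shape[0] - 1), min(t0 + 1, lf.shape[1] - 1)
    ws, wt = s - s0, t - t0
    return ((1 - ws) * (1 - wt) * lf[s0, t0, u, v]
            + ws * (1 - wt) * lf[s1, t0, u, v]
            + (1 - ws) * wt * lf[s0, t1, u, v]
            + ws * wt * lf[s1, t1, u, v])

def refocus(lf, shift):
    """Focus a posteriori in software: shift each (s, t) sub-image by a
    disparity proportional to its offset from the center, then average."""
    S, T, U, V = lf.shape
    acc = np.zeros((U, V))
    for s in range(S):
        for t in range(T):
            acc += np.roll(np.roll(lf[s, t], shift * (s - S // 2), axis=0),
                           shift * (t - T // 2), axis=1)
    return acc / (S * T)
```

Choosing a different `shift` selects a different focal plane, which is how light field cameras refocus after capture.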
Multi-Cameras: Discrete or In-a-Box
Discrete cameras vs. microlens array (RayTrix)
https://www.youtube.com/watch?v=p2w1DNkITI8
Dynamically Reparameterized Light Fields
(A) From all light rays that come from the object, select the ones (in red) corresponding to the camera view
(B) Discretization of the light rays over two parallel planes requires depth-based interpolation/correction: M′ lies closer to R than to L, hence it cannot be the midpoint M = 0.5 L + 0.5 R; instead M′ = 0.2 L + 0.8 R
Lumigraph: Depth needed in Sparse Light Field
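The depth-based correction above can be reproduced in a toy 2D (x, z) geometry; the coordinates below are made-up values chosen to land on the slide's 0.2 L + 0.8 R example:

```python
def ray_camera_plane_x(eye, point, cam_plane_z=0.0):
    """x-coordinate where the ray from `eye` through the scene `point` crosses
    the camera plane z = cam_plane_z. Where the crossing lands -- and thus
    which cameras to blend, with which weights -- depends on the scene depth,
    which is why a sparse light field (Lumigraph) needs depth information."""
    (xe, ze), (xp, zp) = eye, point
    lam = (cam_plane_z - ze) / (zp - ze)
    return xe + lam * (xp - xe)

def blend_weights(x_ray, x_left, x_right):
    """Linear interpolation weights for the two nearest cameras."""
    w_right = (x_ray - x_left) / (x_right - x_left)
    return 1.0 - w_right, w_right

# Cameras L at x=0 and R at x=1 on the plane z=0; virtual eye behind the
# plane, object in front. The desired ray crosses the camera plane at
# x=0.8, so the correct blend is 0.2*L + 0.8*R, not the midpoint.
x = ray_camera_plane_x(eye=(0.5, -1.0), point=(1.1, 1.0))
w_left, w_right = blend_weights(x, x_left=0.0, x_right=1.0)
```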
6-DoF Free Navigation VR
3D model VR
6-DoF
3D point cloud VR
6-DoF
30 LIDAR positions © LISA-ULB
Laser Time-of-Flight acquisition: each 3D point (x, y, z) carries omni-directional colors, i.e. a BRDF over the viewing angles (θ, φ)
Image-based acquisition: many camera views sample the light field over the two planes (s, t) and (u, v)
Point Clouds and Image-Based Light Fields are theoretically equivalent
http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w16352_2016-06-03_Report_JAhG_light-sound_fields.pdf
Laser Time-of-Flight Point Cloud acquisition: points (x, y, z) with omni-directional BRDF colors over angles (θ, φ); cameras with (θ, φ) extrinsics capture the rays (s, t)-(u, v) onto pixels (px, py), and the renderer colors the pixels with this texture information
Category A: scene geometry is given and light transport is simulated for rendering
Category B: scene geometry is estimated from the captured light
Simulate Point Cloud light transport vs. use directly the Light Field emanating from the points
Which format? Multi-View, Point Clouds, …?
Multi-Cam Input
Point cloud
3D Mesh
3D Object
© Microsoft
https://www.youtube.com/watch?v=kZ-XZIV-o8s
3D Graphics Artefacts
© 8i.com © Microsoft
Image-Based vs. 3D Graphics
Image-Based vs. 3D graphics Point Cloud
http://nozon.com/presenz © Nozon https://vimeo.com/49921117 © TimeSlice
Start from 3D gfx and create images vs. start from images and create 3D gfx objects
http://replay-technologies.com/ © Replay Technologies https://www.youtube.com/watch?v=sw_LI8J-AlU © Holografika
Start from images to create point clouds vs. start from a couple of image views and interpolate
Light Field displays = VR without goggles
© Holografika
© NICT
3D Light Field Interpolation for all-around viewing
Input views and interpolated views
https://vimeo.com/128641902 © ACM Siggraph
Challenges
• Image-Based vs. 3D gfx (Point Clouds, Meshes, etc)
• Acquisition/pre-processing cost
• Rendering cost
• Transmission cost
• Performance metric = Quality vs. Bitrate
• Quality = low latency, no visual artifacts, …
• Quality ≠ PSNR
MOS vs. bitrate (kbps) for the "BBB flowers" sequence, anchor vs. NICT proposal: 20.5% average BD-rate bitrate reduction for the same quality
MPEG-Light Fields & JPEG-PLENO
• Light Fields = multi-camera acquisition (discrete and/or in-a-box)
• Point Clouds
• JPEG and MPEG will soon have Calls for Proposals (CfP)
Lytro Immerge
RayTrix
Existing MPEG technology for 6-DoF DIBR/LightField-VR
MPEG-FTV (=Free Viewpoint TV)
SMV = Super-MultiView = Light Field displays
FN = Free Navigation
Bitrate reductions for the same quality

Bjøntegaard delta vs. anchors (decoded-view PSNR / decoded video bitrate):
Sequence                  UHasselt*   Poznan    Zhejiang
Big Buck Bunny flowers    -           -5.21%    -4.62%
Poznan Blocks             -           -16.25%   -9.77%
Soccer Arc                -           -1.14%    -11.93%
Soccer Linear             -31.6%      +2.20%    -0.18%
Average (nonlinear)       -           -7.53%    -8.77%
Average (all)             -           -5.10%    -6.62%

Bjøntegaard delta vs. anchors (MOS / decoded video bitrate):
Sequence                  UHasselt    Poznan    Zhejiang
Big Buck Bunny flowers    -           -43.3%    -19.2%
Poznan Blocks             -           -36.9%    -4.9%
Soccer Arc                -           -70.1%    -69.6%
Soccer Linear 2           -46.51%     -28.5%    +27.0%
Average (nonlinear)       -           -50.1%    -31.3%
Average (all)             -           -44.7%    -16.7%

Vittorio Baroncini, Masayuki Tanimoto, Olgierd Stankiewicz, "Summary of the results of the Call for Evidence on Free-Viewpoint Television: Super-Multiview and Free Navigation", ISO/IEC JTC1/SC29/WG11 MPEG2016/N16318, June 2016, Geneva, Switzerland
* Depth map generated after decoding
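The percentages in these tables are Bjøntegaard delta rates; a compact sketch of the standard cubic-fit computation (the sample rate-distortion points in the example are made up):

```python
import numpy as np

def bd_rate(rates_anchor, q_anchor, rates_test, q_test):
    """Bjoentegaard delta rate: average bitrate difference (%) between two
    rate-distortion curves over their overlapping quality range. Fit log-rate
    as a cubic function of quality (PSNR or MOS), integrate the gap."""
    la, lt = np.log(rates_anchor), np.log(rates_test)
    pa = np.polyfit(q_anchor, la, 3)   # log-rate as cubic in quality
    pt = np.polyfit(q_test, lt, 3)
    lo = max(min(q_anchor), min(q_test))
    hi = min(max(q_anchor), max(q_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_diff = ((np.polyval(it, hi) - np.polyval(it, lo))
                - (np.polyval(ia, hi) - np.polyval(ia, lo))) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100  # negative = bitrate saving

# Example: a codec that needs 10% less bitrate at every quality point.
rates = [1000.0, 2000.0, 3000.0, 4000.0]
quality = [30.0, 33.0, 35.0, 36.0]
bd = bd_rate(rates, quality, [r * 0.9 for r in rates], quality)
```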
3D display View Synthesis (MPEG-FTV, DERS/VSRS)
[Nagoya University 80-cams input]
Y-PSNR (dB) vs. view number (59-119) for VSRS-Extended with 3, 5, 9, 17 and 33 transmitted views [m35079]: interpolated views drop up to 13 dB below transmitted views
3D graphics synthetic sequences
6-DoF Image-Based Transmission
80 cams
Oculus Rift stereo images
Unity WebGL tool
3D content
© Universidad Politécnica de Madrid
View Synthesis of missing camera views: Depth Image Based Rendering (DIBR)
Disparity D is inversely proportional to depth: a far-away object has a small disparity between the left and right cameras, a close-by object a large one
The mid-way view is synthesized by shifting each pixel over D/2
Disoccluded regions have to be inpainted from other available views
Display one out of 7 camera inputs vs. perform View Synthesis (VSRS) on the 7 camera inputs
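A minimal forward-warp to the mid-way view, as described above (toy grayscale sketch; a real DIBR pipeline adds z-buffering so that close-by pixels win, plus blending of several views and inpainting):

```python
import numpy as np

def warp_midway(left, disparity):
    """Forward-warp the left image to the mid-way viewpoint by shifting each
    pixel over half its disparity (disparity ~ 1/depth). Pixels that nothing
    maps onto stay at -1: the disoccluded regions that must be inpainted
    from the other available views."""
    h, w = left.shape
    mid = -np.ones_like(left)
    for y in range(h):
        for x in range(w):
            xm = x - int(round(disparity[y, x] / 2.0))  # shift by D/2
            if 0 <= xm < w:
                mid[y, xm] = left[y, x]  # no z-buffer: last writer wins
    return mid
```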
View Synthesis Reference Software (VSRS)
Sweep around the display
13 views transmitted. For each, 5 additional synthesized views.
Per pixel: 13 rays transmitted (16%), 67 rays synthesized (84%)
© NICT
Sweep around the display
View 59 = input, View 63 = synthesized, View 66 = input
Y-PSNR (dB) vs. view number (0-80), anchor vs. proposed (rho = 0.001, rho = 0.01): Δ = 4 dB © NICT
PSNR is computed between the original image and the decoded & view-synthesized image
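For reference, the standard PSNR definition between an original and a decoded & synthesized view (a generic sketch, not code from any proposal; as the slides stress, it correlates poorly with perceived quality):

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10*log10(peak^2 / MSE).
    Infinite for identical images; higher is better, but Quality != PSNR."""
    mse = np.mean((np.asarray(original, dtype=np.float64)
                   - np.asarray(distorted, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```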
Coding & View Synthesis artefacts
View 59 = input, View 63 = synthesized, View 66 = input
© NICT
Real natural sequences
Estimate the depth map
UHasselt Soccer © University Hasselt
Depth Image-Based Rendering from 7 RGB cameras
Depth Image-Based Rendering
Camera 0 view: left re-projection and right re-projection
Camera 0 left re-projection, with hole filling from the Camera 1 re-projection
Blended result: camera view vs. virtual view
Point Cloud
Gaussian Splat rendering
Reprojection of a view in VSRS = a point cloud
1 view re-projected vs. 3 views re-projected
VSRS = View Synthesis Reference Software (in MPEG)
Point Clouds (left) vs. VSRS 4.1 (right)
7 views re-projected + Gaussian splats in reverse warping vs. 3 views re-projected + VSRS
Media Framework: https://github.com/timlenertz/mf_view_syn
VSRS 4.1 modified (m38480), plugged into the Media Framework
Modular parallelization framework for multi-stream video processing
submitted to ACMMM 2016
Depth Image-Based Rendering from 10 RGB cameras
Poznan Fencing © Poznan University
Better Depth Map yields better View Synthesis
Fencing camera setup: 15°, 2°, 1 m
Everything depends on the quality of the depth map [ref9]!
Light Field Depth Estimation
Epipolar Plane Image (EPI)
INPUT: PBRT photo-realistic rendering
OUTPUT: view synthesis with 10 skipped cameras
PSNR (dB) vs. camera number (views 59-119) [Univ. Hasselt]
Light Fields vs. DERS/VSRS
DERS = MPEG Depth Estimation Reference SoftwareVSRS = MPEG View Synthesis Reference SoftwareLF = Light Fields
Conclusion: Cinematic 6-DoF VR
Standardization CfP and Exploration:
• Single panoramic stitched texture
• Stereoscopic stitched textures
• ODS to Light Fields approach
Existing technology:
• 3D-HEVC with VSRS view synthesis/interpolation
Novel technology:
• Light Fields multi-cameras: discrete and/or in-a-box
References
[ref1] Paul Bourke, “Synthetic stereoscopic panoramic images”, http://paulbourke.net/papers/vsmm2006/vsmm2006.pdf
[ref2] Fan Zhang and Feng Liu, “Parallax-tolerant Image Stitching”, http://web.cecs.pdx.edu/~fliu/papers/cvpr2014-stitching.pdf
[ref3] O. Tugrul Turan and Christopher Higgins, "Enhancements for Digital Imaging of Gusset Plate Connections: Fisheye and Image Stitching", report SPR 304-581, http://www.oregon.gov/ODOT/TD/TP_RES/docs/Reports/2011/FishStitch_SPR304_581.pdf
[ref4] Sing Bing Kang, Richard Szeliski, and Matthew Uyttendaele, “Seamless Stitching using Multi-Perspective Plane Sweep”, technical report MSR-TR-2004-48, June 2004 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2004-48.pdf
[ref5] https://opticalflow.wordpress.com/2016/05/11/correcting-360-degree-stereo-video-capture-part-1-of-3/
[ref6] https://support.google.com/jump/answer/6399843?hl=en
[ref7] http://www.roadtovr.com/google-announces-jump-an-open-vr-camera-design-with-stitching-solution-and-youtube-playback/
[ref8] S. Heymann, A. Smolic, K. Mueller, Y. Guo, J. Rurainsky, P. Eisert, T. Wiegand, "Representation, Coding and Interactive Rendering of High Resolution Panoramic Images and Video Using MPEG-4", Proc. Panoramic Photogrammetry Workshop (PPW), http://www.isprs.org/proceedings/xxxvi/5-w8/paper/PanoWS_Berlin2005_Heymann.pdf