point cloud compression & communicationl.web.umkc.edu/lizhu/docs/2019.pcc.pdfpoint cloud...
TRANSCRIPT
Z. Li, 2019
Point Cloud Compression & Communication
Zhu LiDirector, UMKC NSF Center for Big Learning
Dept of Computer Science & Electrical EngieeringUniversity of Missiouri, Kansas City
Email: [email protected], [email protected]: http://l.web.umkc.edu/lizhu
1
Z. Li, 2019
University of Missouri, Kansas City
Short Bio:
Research Interests:• Immersive visual communicaiton: light field, point cloud
and 360 video coding and low latency streaming• Low Light, Res and Quality Image Understanding• What DL can do for compression (intra, ibc, sr, inter,
end2end)• What compression can do for DL (compression,
acceleration)
signal processing and learning
image understanding visual communication mobile edge computing & communication
p.2
Multimedia Computing & Communication LabUniv of Missouri, Kansas City
Z. Li, 2019
Outline
• Short Self Intro & Research Overview • Point Cloud Capture and Applications• Geometry Compression• Graph Signal Processing and Attributes Compression• UMKC PCC work
– Static Geometry Compression: Plane Projection Approximation – Dynamic Geometry Compression: Kd-tree decomposition and
residual coding• Summary
3
Z. Li, 2019
Research Motivation & Highlights
2019
p.4
Z. Li, 2019
Devices Networks Applications
mmWave 5G
D2D
FD-MIMO4k/8k UHD Video Free Viewpoint TV
SDN/MEC
BIGDATA visual intelligence
Samsung VR/AR
Media Computing & Communication Horizon
p.5
Z. Li, 2019
Embedded Deep Learning for Image Understanding
• Automatic Image Tagging:• Mapping image pixel values to tags• Device based recognition:
• Distilling the knowledge into a compact form• Compact CNN model for knowledge distilling
– BIGDATA training set: >10k images per tag– training set augmentation:
» affine transforms, simulated lighting changes– Pruning filters from the CNN structure
p.6
Z. Li, 2019
Immersive Visual Communication
• Light Field Compression (6-DoF)– Sub-Aperture Image Based – Tensorial Display Decomposition Based
• Point Cloud Compression (6-DoF)– Video Based: Views + Depth– Binary Tree/Octree Geometry Coding + Graph Signal Compression
• 360 Video Compression (3-DoF)– Advanced Motion Model: Affine motion, Spherical motion models– Padding
p.7
Z. Li, 2019
Deep Learning in Video Compression
• Deep Learning Chroma prediction (from Luma)
• Results:– BD rate:
p.8
C1:128@16×16
ConvolutionKernel: 5×5 Convolution
Kernel: 3×3,5×5
C2: (96+32)@16×16
C3: (96+32)@16×16C4: 2@16×16
ConvolutionKernel: 1×1,3×3
ConvolutionKernel: 3×3
F1: 256@1×1F2: 196@1×1
F3: 128@1×1
Tile: 128@16×16
Fusion: 128@16×16
Full Connection Full Connection
Full Connection Tiling
Element wise Product
Input: 1@16×16
Input: 99@1×1
Down-sampled Reconstructed
Luma
Neighboring Reconstructed
Luma & Chroma
Predicted Chroma
deep chroma predicted blocks
Z. Li, 2019
Deep Learning in IBC prediction
• IBC with deep learning– IBC matching block fuse with intra ref
column/row pixels
– Performance :
p.9
Overall, 1.3%, 1.4%, 1.6% bits saving for Y, U, V component, respectively.
Peak performance on Luma component on BQTerrace achieves 3.6% BD-Rate reduction.
Z. Li, 2019
Low Light Image Denoising & Understanding
• Use cases– Low light imaging– Low light image understanding
• Deep Scaling & Residual learning
p.10
Z. Li, 2019
Results
• Low light image denoising
p.11
SID Unet
Ours
Z. Li, 2019
Outline
• Short Self Intro & Research Overview • Point Cloud Capture and Applications• Geometry Compression• Graph Signal Processing and Attributes Compression• UMKC PCC work• Summary
12
Z. Li, 2019
What is Point Cloud
• A collection of Un-ordered points with – Geometry: expressed as [x, y, z]– Color Attributes: [r g b], or [y u v]– Additional info: normal, timestamp, …etc.
• Key difference from mesh: no order or local topology info
p.13
Z. Li, 2019
Point Cloud Capture
• Passive: Camera array stereo depth sensor
• Active: LiDAR, mmWave, TOF sensors
p.14
Z. Li, 2019
Point Cloud Inter-Operability with Other Formats
p.15
• Provide true 6-DoF Content capacity
Z. Li, 2019
PCC in MPEG
• Part of the MPEG-Immersive grand vision
p.16
Z. Li, 2019
Point Cloud Compression - Geometry
• Can be lossy in both quantization and samples
p.17
Z. Li, 2019
Octree Based Point Cloud Compression
• Octree is a space partition solution– Iteratively partition the space into sub-blocks. – Encoding: 0 if empty, 1 if contains data points– Level of the tree controls the quantization error
p.18
Credit: Phil Chou, PacketVideo 2016
Z. Li, 2019
Lossless Compression of the Octree with Neural Network driven CABAC
• Tree Structure: – DFS scanning of the Octree node byte to have a byte stream– Compression of the byte stream via Arithmetic Coding, or
shallow neural network PAQ coding• Residual Coding:
– Range coding: coding the residual against a ref point (eg., centroids of octree leaf node centroids) – Plane/Surface approximation coding: compute the projection distances to a surface, surface can be polynomial or planar.
p.19
0 50 100 150 200 250 300-20
0
20
40
60
80
100
120
1402)uvw residual quantized
Z. Li, 2019
Scalable Point Cloud Geometry Coding
p.20
• Binary Tree embedded Quadtree (BTQT) coding: – Binary tree partition to have lossy geometry approximation– Refine each leaf node with Quadtree/Octree to offer scalable
details upto near lossless
Z. Li, 2019
Scalable Geometry Coding with BTQT
p.21
• Construct Binary Tree of Point Cloud • R1 = (2L-1)*(2+K) + 6*K, cost of signalling for resolution K bits
and binary tree depth L• Intra-Coding i.e. either Quadtree (flat surface) or
Octree(not flat)• R2 = 3*p + 3*q bits, for singalling norma at p bits and point at q
bits. q < K proportional to the leaf node size.
Z. Li, 2019
Quadtree/Octree
p.22
1 0 10
0 0 01 1 1 1 1
OctreeQuadtree
Flatness Criterian:
Z. Li, 2019
Scalable Coding with Quadtree/Octree
p.23
OctreeQuadtree
L = L1
L = L2 < L1
L = L3 < L2
Z. Li, 2019
Point cloud Visualization
• citytunnel dataset (MERL) – 1.5 km long section of a city, 21 millions point
p.24
Z. Li, 2019
Reconstructed-Zoomed
p.25
Z. Li, 2019
Result: Category 1 Geometry Coding Efficiency
p.26
Z. Li, 2019
Color Attributes Compression - DFS
• A Simple Solution: DFS scan the Octree, and compress the color attributes by reshaping it into some rectangular form
p.27
n
3
010000100011110100100
The order of color attributes scan matters !
Z. Li, 2019
Graph Signal Processing
• For signals sampled on an non-uniform grid, expressed as a graph, what are the tools for signal processing ?
p.28
Z. Li, 2019
Graph Signal
• For signals sampled on a graph– F={f(1), f(2), …, f(n)}
• Graph:
p.29
Z. Li, 2019
Laplace Operator
• What is Laplace Operator ? - 2nd order differentiation
p.30
Z. Li, 2019
Laplacian Matrix
• Consider1-D discrete signal
p.31
Z. Li, 2019
Laplacian Matrix on uniform grid (h=1)• 1-D discrete signal on uniform grid
• 2-D discrete signal
p.32
Z. Li, 2019
Graph Laplacian
• 2nd order operator on signals on graph
p.33
Z. Li, 2019
Graph Fourier Transform
• Accounting for non-uniform grid, expressed by graph Laplacian, can we have Fourier like signal analysis on graph ?
p.34
Z. Li, 2019
Normalized Graph Laplacian
• normalized by the vertex degree
p.35
Z. Li, 2019
Graph Laplacian Eigen Basis
p.36
Z. Li, 2019
Graph Lapalcian Eigen Vectors
• Graph Cut/Partition Interpretation
p.37
Z. Li, 2019
Fourier Analysis
• Recall that for uniformly sampled signal, we have Fourier analysis:
p.38
N: number of samplesxn: signal at time nXk: signal at freq k
Z. Li, 2019
Graph Fourier Transform
• Eigen vector of Laplacian provides GFT basis
p.39
Credit: Prof. Pascal Frossard, EPFL
Z. Li, 2019
Graph Spectral Freq
• Zero crossing count
p.40
Z. Li, 2019
FT is special case of GFT
• Consider 1-D 8 point discrete signal on a simple graph
– First AC Eigen Vector: u2 as a function of w1,2,
– Graph Laplacian Eigen vectors default to FFT if all graph weights are uniform, and DFT if connected into a circle.
p.41
Z. Li, 2019
Optimal Energy Compaction - KLT
• For signals on uniformly sampled grid, KLT is optimal energy compaction transform:
The direction that maximizes the variance is the eigenvector associated with the largest eigenvalue of Σ
p.42
a unitary transform with the basis vectors in A being the “orthonormalized” eigenvectors of Rx
• However, not cheap, need to signal this nxn unitary transform with good precision.
Z. Li, 2019
Graph Transform
• Trying to achieve compaction of energy by grouping signal points closer together into the same group
• Free lunch: geometry is reconstructed (vs KLT)• Design Parameters:
p.43
Z. Li, 2019
Point Cloud Color Attributes Coding
• Photometrical info are not sampled on uniform grid
– Ref: Cha Zhang, Dinei A. F. Florêncio, Charles T. Loop:Point cloud attribute compression with graph transform. ICIP 2014: 2066-2070
p.44
4x4x4 octree node: Affinity reflecting voxel distance
Z. Li, 2019
Graph Transform
• For the color attributes in the 4x4x4 node:– Graph Transform U from Laplacian
– U is invertable– Graph signal representation of photometric info I
– Quantization & Coding
p.45
Z. Li, 2019
Optimized Graph Transform
p.46
Z. Li, 2019
Nested Transform Attributes Coding
• Nested Transform of attributes on graph– 1st Transform: kd-tree partition to create super pixel
representation– L1=16, 2L1=65536 downsampled representation from local
GFT/DCT on leaf node pixels.
p.47
700k input points 65k superpixels
Z. Li, 2019
Nested DCT transform
• We then first apply DCT to the leaf nodes with 13 or 14 points– Each leaf node is then represented by the DC value are all the
leaf nodes are organized into a new kd-tree with depth 12– Each leaf node are with 16 points are organized into 4x4 blocks
p.48
Z-scan Raster-scan
Z. Li, 2019
Superpixel AC Prediction
• The coding of the AC coefficients– The AC coefficients are with less correlations– The AC coefficients are then quantized and entropy coded– According to the final results, the AC coefficients only take 12%
of the total energy but take 88% of the total bits
• Potential improvements– Decorrelate the AC coefficients– Increase the Kd depth and reduce the AC coefficients
p.49
Z. Li, 2019
Further enhance the Kd-tree depth
• Kd-tree depth equals to 17– Each leaf node will be with only 6 or 7 points– The DC image resolution is 512x256
p.50
Z-scan
Z. Li, 2019
Current results
• The R-D curve comparison of the Longdress– The proposed scheme can achieve some gains in low bitrate
case while suffers some losses in high bitrate case– AC prediction seems to be the problem.
p.51
Z. Li, 2019
Summary
• Point Cloud capture and compression can enable many applications in VR/AR, navigation and auto-driving
• Core enablers are computational geometry and graph signal processing tools.
• Geometry compression – space/data partition and prediction
• Attributes compression – signals defined on a graph, needs many new tools on graph signal sampling, filtering and smoothing.
• Related papers, check out: – http://l.web.umkc.edu/lizhu
p.52
Z. Li, 2019
• Q & A
53