![Page 1: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/1.jpg)
Probabilistic Approaches for RGB-D Video Enhancement and Applications
Speaker: Lu Sheng
Supervisor: Prof. King Ngi Ngan
Lu Sheng, Thesis Oral Defense
![Page 2: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/2.jpg)
Why Are RGB-D Data Essential?
RGB: 2D visual pattern; Depth: 3D geometry
An RGB image cannot explicitly tell the computer the 3D structure of each object
Depth cannot tell us the overlaid texture patterns
RGB + Depth helps us comprehensively understand the 3D visual world
![Page 3: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/3.jpg)
Why Are RGB-D Data Essential?
Explosive growth of 3D applications
3D reconstruction, novel view synthesis
Virtual reality / augmented reality, 3DTV & FTV, refocus
Motion sensing / gesture recognition
![Page 4: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/4.jpg)
Why Are RGB-D Data Essential?
Explosive growth of 3D applications
Autonomous navigation & safety
Personal & industrial robots
Scene understanding, pedestrian detection, action recognition
![Page 5: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/5.jpg)
Recent Depth Acquisition Methods
Passive methods: stereo vision (left and right views), shape-from-shading, structure-from-motion
Drawbacks
Usually computationally intensive
Mediocre quality
Require simple or artificial shooting conditions
![Page 6: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/6.jpg)
Recent Depth Acquisition Methods
Kinect Time-of-flight camera Laser scanner
Compared to passive methods
Standard-resolution depth frames at video frame rate
More robust to difficult shooting conditions
Drawbacks
Poor depth quality prevents depth-based tasks from reaching their full potential
![Page 7: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/7.jpg)
High Quality Depth Data are Important
A lot of applications require high quality depth data
Spatiotemporal depth video enhancement is necessary
Depth data cannot perform structural regularization on their own
If accompanied by synchronized RGB data, multi-modal structural features shared by texture and geometry allow the texture features to guide the regularization of the depth maps
![Page 8: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/8.jpg)
Depth is NOT Texture
Depth links to the 3D geometry of the captured scene
Learn effective methods to encode these observations
Spatial relationships between objects
Depth ordering
Occlusion reasoning
Object segmentation
Geometric structures inside each object
Piecewise smoothness
Distinctive discontinuities
![Page 9: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/9.jpg)
Goals
Explore effective ways to achieve robust spatiotemporal RGB-D video enhancement
Learn specific treatments compatible with 3D geometry for enhancement and depth-based applications
Employ probabilistic approaches to model these tasks
![Page 10: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/10.jpg)
Hybrid Geometric Hole Filling Strategy for Spatial Enhancement
Spatial RGB-D Enhancement
![Page 11: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/11.jpg)
Introduction
RGB-D images with an upsampled raw depth image: low resolution, noise & outliers, missing-depth holes, structure distortions
Enhanced depth image: high definition, structure optimized, complete
![Page 12: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/12.jpg)
Introduction
Observations
Co-occurrences between depth discontinuities and image edges
Homogeneous texture patterns have similar 3D geometries
![Page 13: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/13.jpg)
Hybrid Geometric Hole Filling Strategy
Input RGB-D pair
Hole Filling: Filtering-based Depth Interpolation, Segment-based Depth Propagation
Depth Map Refinement
Output RGB-D pair
![Page 14: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/14.jpg)
Hole Partitioning
Up-sample low-resolution depth map into sparse grid
Pixels are divided into two parts
in hole region:
with depth values:
Further partition the holes into two parts, based on the valid depth pixels in their neighborhoods
![Page 15: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/15.jpg)
Filtering-based Depth Interpolation
Filtering-based Depth Interpolation for region
Require enough depth info. in the neighbors to infer a reliable depth value
Joint Bilateral Filtering
Figure panels: fill the hole region; fill the whole image
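The filtering-based interpolation above can be illustrated with a minimal joint bilateral hole-filling sketch. It is not the thesis implementation: the function name `jbf_fill`, the Gaussian spatial and range kernels, the `None` hole marker, and the `min_support` threshold (standing in for "enough depth info in the neighbors") are all assumptions made for illustration.

```python
import math

def jbf_fill(depth, guide, radius=2, sigma_s=2.0, sigma_r=10.0, min_support=1):
    """Fill missing depth values (None) by joint bilateral filtering.

    depth: 2D list of floats or None (holes); guide: 2D list of intensities.
    Holes with fewer than `min_support` valid neighbors stay None.
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if depth[y][x] is not None:
                continue  # keep measured depth untouched
            num = den = 0.0
            support = 0
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = depth[j][i]
                    if d is None:
                        continue
                    # Spatial nearness weight.
                    ws = math.exp(-((i - x) ** 2 + (j - y) ** 2) / (2 * sigma_s ** 2))
                    # Range affinity weight, measured on the RGB guidance image.
                    wr = math.exp(-((guide[j][i] - guide[y][x]) ** 2) / (2 * sigma_r ** 2))
                    num += ws * wr * d
                    den += ws * wr
                    support += 1
            if support >= min_support and den > 0.0:
                out[y][x] = num / den
    return out
```

Because the range weight is computed on the guidance image, a hole next to a color edge is filled from the side of the edge it belongs to, which matches the observation that depth discontinuities co-occur with image edges.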
![Page 16: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/16.jpg)
Depth Propagation under Segment Constraint
Depth Propagation for region
Segment constraint
Depth variation is smooth in an over-segmented RGB patch
One parametric surface model in one patch
Generate segments
Superpixel – simple linear iterative clustering (SLIC)
Hole patch
Patch with known depth
Partially filled patch
After filling
![Page 17: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/17.jpg)
Depth Propagation under Segment Constraint
Fill the partially filled patches by surface fitting with RANSAC
Surface propagation for hole patches
Assign each hole patch the surface model of its most similar neighboring RGB patch whose surface model is known
The cost function models the statistical texture similarity and the spatial distance
A greedy algorithm is used
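As a rough illustration of surface fitting with RANSAC inside one patch, the sketch below fits a planar model d = a·x + b·y + c to the valid depth pixels of a patch. The planar form, the function names, and the threshold and iteration values are assumptions; the slides only state that one parametric surface model is fitted per patch.

```python
import random

def plane_from_points(p1, p2, p3):
    """Return (a, b, c) of d = a*x + b*y + c through three (x, y, d) points, or None."""
    (x1, y1, d1), (x2, y2, d2), (x3, y3, d3) = p1, p2, p3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:
        return None  # the three pixels are collinear, no unique plane
    a = ((d2 - d1) * (y3 - y1) - (d3 - d1) * (y2 - y1)) / det
    b = ((x2 - x1) * (d3 - d1) - (x3 - x1) * (d2 - d1)) / det
    c = d1 - a * x1 - b * y1
    return a, b, c

def ransac_plane(points, threshold=0.5, iterations=200, seed=0):
    """Fit a plane to (x, y, d) samples, ignoring outliers.

    Returns (model, inlier_count); model is None if no valid sample was drawn.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        a, b, c = model
        inliers = sum(1 for x, y, d in points
                      if abs(a * x + b * y + c - d) <= threshold)
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Once a model `(a, b, c)` is available, the missing pixels of the same patch can be filled by evaluating `a * x + b * y + c` at their coordinates.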
![Page 18: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/18.jpg)
Depth Propagation under Segment Constraint
Generate segments Fill in partially filled patches Fill in hole patches
Depth map refinement
Various filtering methods can be used here
A standard joint bilateral filter is used for simplicity
![Page 19: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/19.jpg)
Experimental Results
Middlebury dataset
Error metric: Bad Pixel Ratio (Δ𝑑 ≥ 1 as bad pixel)
Figure columns: RGB images, depth images, ground truth, Multi-res JBU [1], Wang et al. [2], proposed method
Bad pixel ratios (BP) for the two examples shown:

| Example | Multi-res JBU [1] | Wang et al. [2] | Proposed method |
| --- | --- | --- | --- |
| 1 | 8.35% | 3.65% | 3.33% |
| 2 | 14.10% | 3.10% | 2.51% |

[1] C. Richardt et al., Coherent spatiotemporal filtering, upsampling and rendering of RGBZ videos, CGF, 2012.
[2] L. Wang et al., Stereoscopic inpainting: Joint color and depth completion from stereo images, CVPR, 2008.
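The bad pixel ratio used above can be computed as in the following sketch, which counts a pixel as bad when its absolute depth (or disparity) error is at least 1. The function name and the convention of skipping pixels without ground truth (`None`) are assumptions.

```python
def bad_pixel_ratio(estimate, ground_truth, threshold=1.0):
    """Percentage of pixels whose absolute error is >= threshold.

    Pixels whose ground truth is None are excluded from the count.
    """
    bad = valid = 0
    for est_row, gt_row in zip(estimate, ground_truth):
        for est, gt in zip(est_row, gt_row):
            if gt is None:
                continue
            valid += 1
            if abs(est - gt) >= threshold:
                bad += 1
    return 100.0 * bad / valid if valid else 0.0
```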
![Page 20: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/20.jpg)
Weighted Structure Filters Based on Parametric Structural Decomposition
Spatial RGB-D Enhancement
![Page 21: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/21.jpg)
Introduction
A variety of popular image filters are related to the local statistics of the input image
Median filter: takes the half point of the cumulative local distribution
Mode filter: seeks the global mode of the local distribution
Average filter: estimates the expectation of the local distribution
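To make the link between these filters and the local distribution concrete, here is a small sketch that builds a (weighted) local histogram and reads the three filter outputs from it. The function names are hypothetical, and the samples are assumed to be already quantized to discrete levels.

```python
from collections import defaultdict

def weighted_histogram(values, weights):
    """Normalized local distribution over the observed levels, sorted by level."""
    hist = defaultdict(float)
    for v, w in zip(values, weights):
        hist[v] += w
    total = sum(hist.values())
    return {v: w / total for v, w in sorted(hist.items())}

def distribution_mean(hist):
    """Average filter: expectation of the local distribution."""
    return sum(v * p for v, p in hist.items())

def distribution_median(hist):
    """Median filter: first level where the cumulative distribution reaches 0.5."""
    cumulative = 0.0
    level = None
    for level, p in hist.items():
        cumulative += p
        if cumulative >= 0.5:
            return level
    return level  # numerical fallback: the largest level

def distribution_mode(hist):
    """Mode filter: level with the highest probability."""
    return max(hist, key=hist.get)
```

With uniform weights this reproduces the ordinary mean, median, and mode of the window; with weights taken from a guidance image it gives the joint weighted variants.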
![Page 22: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/22.jpg)
Introduction
Provided with a guidance feature map
Image intensity, patches, edge maps, …
These filters can be extended to joint weighted filters
Propagate local feature statistics into the target image
Various applications
Enhancement / de-noising / style manipulation / structure decomposition, …
![Page 23: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/23.jpg)
Introduction
Disparity enhancement
Image denoising
JPEG artifact removal
Contrast enhancement
Image stylization
Joint depth upsampling
![Page 24: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/24.jpg)
Weighted Distribution Estimation
The weighted distribution is
encodes both the spatial nearness and range affinity
measures the data compatibility
Brute-force implementation has a high computational cost
Computational cost depends on the number of samples 𝑔𝑖
✓ Hundreds of filtering operations are required to output a satisfactory distribution
✓ How can the cost be reduced without distorting the distribution?
![Page 25: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/25.jpg)
![Page 26: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/26.jpg)
Structures in a Local Patch
Example structures in the patch: cloud, object, tower, sky
A patch of a natural image does not contain a large number of structures
Nearby patches share similar structures
Two pixels are similar if they both have high likelihoods under the same local structures
It is possible to construct the distribution of a local patch with a mixture model
![Page 27: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/27.jpg)
A Probabilistic Kernel
Conventional kernel for data compatibility
Assume the image is composed of several (e.g. 𝐿) structures throughout the image domain
Measure the difference between 𝑓𝑥 and 𝑓𝑦
![Page 28: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/28.jpg)
A Probabilistic Kernel
Each structure is a probabilistic model
Two pixels are similar if they both have high responses to the 𝑙-th model
Assemble all models
Gaussian distribution with noise std
![Page 29: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/29.jpg)
Weighted Distribution Estimation
Conventional distribution
Kernel: Gaussian, Kronecker delta, etc.
Distribution estimation needs hundreds of filtering operations
The proposed distribution
Kernel: local structure similarity
Distribution estimation needs only 𝐿 filtering operations to get 𝜓𝐱(𝑙), 𝑙 ∈ ℒ
A mixture model!
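The idea of replacing hundreds of per-sample filtering passes with one pass per structure model can be sketched on a 1D signal as follows. This is an illustration under stated assumptions, not the thesis formulation: each structure is a Gaussian with a given mean and a shared standard deviation, responsibilities use equal priors, the filter uses only a spatial Gaussian weight (no guidance image), and all function names are hypothetical.

```python
import math

def gaussian(v, mu, sigma):
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def responsibilities(value, means, sigma):
    """Response of one pixel to each of the L structure models (sums to 1)."""
    likes = [gaussian(value, mu, sigma) for mu in means]
    total = sum(likes)
    if total == 0.0:
        return [1.0 / len(means)] * len(means)
    return [p / total for p in likes]

def mixture_weights(signal, means, sigma, radius=2, sigma_s=2.0):
    """psi[x][l]: spatially filtered responsibility maps.

    Conceptually this is one filtering operation per model l, so L in total,
    independent of how finely the output range is sampled.
    """
    n = len(signal)
    resp = [responsibilities(v, means, sigma) for v in signal]
    psi = []
    for x in range(n):
        acc = [0.0] * len(means)
        norm = 0.0
        for y in range(max(0, x - radius), min(n, x + radius + 1)):
            w = math.exp(-((x - y) ** 2) / (2 * sigma_s ** 2))
            norm += w
            for l in range(len(means)):
                acc[l] += w * resp[y][l]
        psi.append([a / norm for a in acc])
    return psi

def local_density(psi_x, means, sigma, g):
    """Evaluate the local distribution at value g as a mixture of the L models."""
    return sum(p * gaussian(g, mu, sigma) for p, mu in zip(psi_x, means))
```

After the L filtering passes, the density at any value `g` is a cheap weighted sum, so the median, mode, or mean of the local distribution can be read off without further filtering.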
![Page 30: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/30.jpg)
Gaussian Models for the Local Structures
Gaussian distribution to define the models for the local structures
Uniformly Quantized Models (UQM)
Locally Adaptive Models (LAM)
![Page 31: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/31.jpg)
Gaussian Models for the Local Structures
Estimation of the Locally Adaptive Models
Hierarchical Clustering by Binary Space Partition Tree
Figure: binary space partition tree (nodes 1 to 7; subsets 𝑆1, 𝑆2, 𝑆3; branches labeled + and -)
![Page 32: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/32.jpg)
Experimental Results & Discussions
The speedup of the proposed method
The proposed method is generally
2~4x faster for grayscale images
6~12x faster for color images
Even faster for disparity maps or cartoon-style images due to their high structural homogeneity
A manual threshold to stop model generation
Runtime comparison
Estimate the necessary LAM models on the BSD3000 dataset
![Page 33: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/33.jpg)
Experimental Results & Discussions
Application-I: Disparity Enhancement (error metric: RMSE)
Runtimes shown in the figure: ~16 s, ~4 s, <1 s
![Page 34: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/34.jpg)
Experimental Results & Discussions
Application-I: Disparity Enhancement
Covers more details and avoids staircase artifacts
Although a small number of LAM models cannot cover all the details, it is still superior to the UQM models
![Page 35: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/35.jpg)
Figure panels: Raw, Color frame, Spatial filter, Spatiotemporal filter
![Page 36: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/36.jpg)
Experimental Results & Discussions
Application-II: JPEG Block Artifact Removal
Produces piecewise smooth results and reduces staircase artifacts without distorting necessary structures
![Page 37: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/37.jpg)
Experimental Results & Discussions
Application-III: Contrast Enhancement
source image
after structure-preserving filtering
after detail enhancement
![Page 38: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/38.jpg)
Experimental Results & Discussions
Application-IV: Joint Depth Map Upsampling
![Page 39: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/39.jpg)
Spatiotemporal Enhancement Based on Static Structure
Spatiotemporal RGB-D Enhancement
![Page 40: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/40.jpg)
Introduction
A raw depth video of a natural scene
Contains various complex and even unpredictable dynamic contents
Suffers from spatial and temporal artifacts
Raw Kinect video
Color-coded raw TOF video
![Page 41: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/41.jpg)
Introduction
A raw depth video of a natural scene
Contains various complex and even unpredictable dynamic contents
Suffers from spatial and temporal artifacts
After the spatial enhancement
Reduces artifacts in the spatial domain
But introduces temporal flickering
No temporal consistency
Aggravates flickering artifacts
Raw Kinect video
Spatial JBF
![Page 42: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/42.jpg)
Introduction
After a conventional spatiotemporal enhancement
Still contains temporal flickering
Distorts depth variation on dynamic objects
Coherent spatiotemporal JBF
Spatial JBF
How can we eliminate the temporal flickering without distorting the necessary depth variation of dynamic objects?
![Page 43: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/43.jpg)
![Page 44: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/44.jpg)
Static Structure
Figure labels: a moving object, a static object, the static background, Kinect or another depth camera, captured depth map
![Page 45: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/45.jpg)
Static Structure
The intrinsic structure underneath the captured scene
Lies on or behind the surface of the input depth frame
A probabilistic medium to indicate whether a region is static
Figure labels: a moving object, a static object, the static background, Kinect or another depth camera, static structure
![Page 46: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/46.jpg)
Static Structure
Simple observations
Moving objects stay in front of it
Static regions or visible background areas are fused into it
![Page 47: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/47.jpg)
Static Structure Spatiotemporal Enhancement
Robust static/dynamic region detection by the static structure
Spatiotemporally enhance the static region with the static structure
Spatially optimize the dynamic foreground
Temporally coherent for static regions, with depth variation preserved for dynamic contents
How to estimate the static structure?
![Page 48: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/48.jpg)
Generative Model for Static Structure
Camera center
Line of sight
Current static structure
Behind the structure
Before the structure
A Probabilistic Generative Model
![Page 49: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/49.jpg)
![Page 50: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/50.jpg)
![Page 51: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/51.jpg)
Generative Model for Static Structure
A Probabilistic Generative Model
If the incoming depth belongs to
State-I: the static structure
State-II: outliers in the front or moving objects
State-III: outliers rearward or revealed background
Figure labels: camera center, line of sight, current static structure, State-III
is an indicator function that equals 1 when its argument is true and 0 otherwise
![Page 52: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/52.jpg)
Generative Model for Static Structure
A Probabilistic Generative Model
The likelihood of w.r.t. the given static structure
Gaussian prior over
Dirichlet prior over the frequency of each state
Camera center
Current static structure
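A minimal sketch of a three-state mixture likelihood of this kind is given below. The concrete forms are assumptions chosen for illustration: State-I is a Gaussian around the static depth `z` with noise std `sigma`, while State-II and State-III are uniform densities restricted (through indicator conditions) to the range in front of and behind `z` within a hypothetical sensor range `[d_min, d_max]`. The state frequencies `omega` are taken as given.

```python
import math

def state_likelihoods(d, z, sigma, d_min, d_max):
    """Likelihood of depth sample d under each state, given static depth z.

    Assumes d_min < z < d_max.
    """
    # State-I: the sample agrees with the static structure.
    p_static = math.exp(-0.5 * ((d - z) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    # State-II: front outliers or moving objects, anywhere in front of z.
    p_front = 1.0 / (z - d_min) if d_min <= d < z else 0.0
    # State-III: rearward outliers or revealed background, anywhere behind z.
    p_behind = 1.0 / (d_max - z) if z < d <= d_max else 0.0
    return [p_static, p_front, p_behind]

def state_posteriors(d, z, sigma, omega, d_min, d_max):
    """Posterior probability of each state for sample d, or None if d is unexplained."""
    joint = [w * p for w, p in zip(omega, state_likelihoods(d, z, sigma, d_min, d_max))]
    total = sum(joint)
    if total == 0.0:
        return None
    return [j / total for j in joint]
```

For example, with `z = 2.0` and `sigma = 0.01`, a sample at 1.0 is assigned almost entirely to State-II, and a sample at 3.0 to State-III.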
![Page 53: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/53.jpg)
![Page 54: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/54.jpg)
Online Update Scheme
A Probabilistic Generative Model
The posterior
is the set of previous depth samples
is the set of current samples
If the input frame only contains the static scene and outliers, the updated static structure will be governed by the posterior, and we have
Its probable depth is
The reliability of the model is
Variational approximation for efficiency
Figure labels: camera center, updated static structure
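One way to picture an online update of this kind is the simplified per-pixel sketch below. It is not the variational approximation used in the thesis: it keeps a Gaussian belief `(mu, var)` over the static depth and Dirichlet pseudo-counts `alpha` over the three states, and updates them by moment matching. The parameterization, the uniform outlier densities, and the function name are all assumptions.

```python
import math

def online_update(mu, var, alpha, d, sigma, d_min, d_max):
    """One simplified update of the static-structure belief for a single pixel.

    mu, var : Gaussian belief over the static depth (assumes d_min < mu < d_max)
    alpha   : Dirichlet pseudo-counts for states I, II, III
    d       : incoming depth sample; sigma: sensor noise std (assumed known)
    """
    total_alpha = sum(alpha)
    omega = [a / total_alpha for a in alpha]  # expected state frequencies

    # Predictive likelihood of d under each state.
    pred_var = var + sigma ** 2
    p_static = math.exp(-0.5 * (d - mu) ** 2 / pred_var) / math.sqrt(2 * math.pi * pred_var)
    p_front = 1.0 / (mu - d_min) if d_min <= d < mu else 0.0
    p_behind = 1.0 / (d_max - mu) if mu < d <= d_max else 0.0
    joint = [omega[0] * p_static, omega[1] * p_front, omega[2] * p_behind]
    total = sum(joint)
    if total == 0.0:
        return mu, var, alpha  # sample outside the modeled range: ignore it
    r = [j / total for j in joint]

    # Gaussian posterior if the sample is static (standard conjugate update).
    post_var = 1.0 / (1.0 / var + 1.0 / sigma ** 2)
    post_mu = post_var * (mu / var + d / sigma ** 2)

    # Moment matching of the mixture "updated belief" vs. "unchanged belief".
    new_mu = r[0] * post_mu + (1.0 - r[0]) * mu
    second = r[0] * (post_var + post_mu ** 2) + (1.0 - r[0]) * (var + mu ** 2)
    new_var = max(second - new_mu ** 2, 1e-12)
    new_alpha = [a + ri for a, ri in zip(alpha, r)]
    return new_mu, new_var, new_alpha
```

In this sketch `mu` plays the role of the probable depth, and the expected frequency of State-I, `alpha[0] / sum(alpha)`, could serve as a reliability score; both readings are assumptions rather than the slide's own expressions.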
![Page 55: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/55.jpg)
Layer Assignment
Label the input depth frame into three layers
𝑙𝑖𝑠𝑠: agrees with the estimated static structure
𝑙𝑑𝑦𝑛: belongs to dynamic objects
𝑙𝑜𝑐𝑐: refers to previously occluded structure
𝑙𝑖𝑠𝑠 and 𝑙𝑜𝑐𝑐 define the current static regions
Fully connected conditional random fields with effective inference based on real-time high-dimensional filters
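The three layers can be illustrated with a per-pixel labeling rule that compares the input depth with the estimated static structure. This is only a unary decision, without the fully connected CRF and its pairwise terms; the `k * sigma` tolerance, the label strings, and the function name are assumptions.

```python
def assign_layers(depth, static_depth, sigma, k=3.0):
    """Label each pixel as 'iss', 'dyn', or 'occ' from a simple depth comparison.

    depth, static_depth: 2D lists of the same shape; sigma: noise std.
    """
    labels = []
    for d_row, z_row in zip(depth, static_depth):
        row = []
        for d, z in zip(d_row, z_row):
            if abs(d - z) <= k * sigma:
                row.append("iss")   # agrees with the static structure
            elif d < z:
                row.append("dyn")   # in front of it: dynamic object
            else:
                row.append("occ")   # behind it: previously occluded structure
        labels.append(row)
    return labels
```

A CRF would then smooth these per-pixel decisions using color and position affinities between pixels.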
![Page 56: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/56.jpg)
Layer Assignment & Online Update of the Static Structure
Figure rows: (a) raw depth, (b) raw color, (c) layer assignment, (d) depth static structure, (e) color static structure; columns: frames #1 to #5
![Page 57: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/57.jpg)
Layer Assignment & Online Update of the Static Structure
Figure rows: (a) raw depth, (b) layer assignment, (c) depth static structure; columns: frames #1 to #5
![Page 58: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/58.jpg)
Spatiotemporal Depth Video Enhancement
Block diagram components:
Input data (𝑡)
Layer Assignment
Variational Approximation
Spatial Enhancement
Online Static Structure Updating Scheme: Static Structure (𝑡 − 1), Static Structure (𝑡)
Output: Enhanced depth frame (𝑡)
![Page 59: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/59.jpg)
Result Comparisons
(a) Raw RGB-D videos
(b) Proposed method (c) Lang et al. [3]
[1] C. Richardt et al., “Coherent spatiotemporal filtering, upsampling and rendering of RGBZ videos,” Computer Graphics Forum, 2012.
[2] D. Min et al., “Depth video enhancement based on weighted mode filtering,” TIP, 2012.
[3] M. Lang et al., “Practical temporal consistency for image-based graphics applications,” TOG, 2012.
The proposed method is superior in static scene reconstruction and dynamic object enhancement
![Page 60: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/60.jpg)
Result Comparisons
(a) Raw RGB-D videos (b) Proposed method
(c) CSTF [1] (d) WMF [2] (e) Lang et al. [3]
![Page 61: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/61.jpg)
Color frames
Depth frames
CSTF [1]
WMF [2]
Lang et al. [3]
Ours
Close-ups
![Page 62: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/62.jpg)
Result Comparisons
Sequences dyn_kinect_1 and dyn_kinect_2, each shown as: (a) Raw RGB-D videos (b) Proposed method (c) Lang et al. [3]
![Page 63: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/63.jpg)
Result Comparisons
(a) dyn_kinect_2 (b) dyn_kinect_3
Color
Depth
Lang et al. [3]
Ours
dyn_kinect_1 dyn_kinect_2
![Page 64: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/64.jpg)
Applications
Application-I: Background Subtraction
Panels: color image; result by raw depth image; result by the proposed method; result by Lang et al. [3]; result by CSTF [1]; result by WMF [2]
![Page 65: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/65.jpg)
Applications
Application-II: Novel View Synthesis
(a) color image (b) raw depth image (c) enhanced depth image
(d) by raw depth image (e) by static structure (f) by enhanced depth image
![Page 66: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/66.jpg)
A Generative Model for Robust 3D Facial Pose Tracking
Depth-based Application
![Page 67: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/67.jpg)
Introduction
Why is facial pose tracking interesting?
Immersive Video Communication
3DTV & Free-viewpoint TV
VR / AR, etc.
With expression added?
Image/Video Editing
Performance Capturing, etc.
![Page 68: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/68.jpg)
Introduction
How to make it
Markerless
No explicit or manual markers
Realtime
Cannot afford sophisticated correspondence estimation & face shape representation
Robustness and Smoothness
Robust to illumination variations, occlusions & outliers
Robust to varying facial expressions
Temporally coherent tracking
Adaptive to any user on-the-fly without manual calibration
![Page 69: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/69.jpg)
Introduction
RGB-based facial pose tracking has been performed successfully in well-constrained scenes
It is fragile under unconstrained capturing conditions
Illumination variations
Shadows
Large and severe occlusions
These conditions are common in many consumer-level applications
![Page 70: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/70.jpg)
Introduction
Commodity real-time range sensors
Explicitly capture spatial relationships
Invariant to illumination variations & shading
Easier inference for occlusions
BUT new challenges arise
Noise, missing values & outliers
Complex occlusions
Varying expressions
Online user adaptation
![Page 71: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/71.jpg)
The Proposed Method
A framework that
unifies pose tracking and face model adaptation on-the-fly
offers accurate, occlusion-aware and uninterrupted 3D facial pose tracking
A visibility constrained criterion for
correspondence-free and occlusion-aware rigid facial pose estimation
A generative multilinear face model
models both identity and expression
facilitates online face model personalization without interference from expression variations
![Page 72: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/72.jpg)
Probabilistic 3D Face Parameterization
Multilinear Face Model
Unifies the representations of identity and expression
Models the face dataset as a 3D tensor
Decomposes it by higher-order singular value decomposition (HOSVD)
Any face can be reconstructed as
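As a rough sketch of how such a reconstruction is commonly computed, assume the standard multilinear form in which a core tensor is contracted with an identity weight vector and an expression weight vector. The names `core`, `w_id` and `w_exp`, and the tensor layout, are illustrative assumptions; the thesis's exact formulation (for example, whether a mean face is added) may differ.

```python
def reconstruct_face(core, w_id, w_exp):
    """Contract a core tensor core[v][i][e] with identity and expression weights.

    Assumed layout (hypothetical): v indexes stacked vertex coordinates,
    i the identity mode, e the expression mode.
    """
    return [
        sum(core[v][i][e] * w_id[i] * w_exp[e]
            for i in range(len(w_id))
            for e in range(len(w_exp)))
        for v in range(len(core))
    ]


# Toy core tensor: 2 coordinates, 2 identity modes, 2 expression modes.
core = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
# Blend the two identities equally, using the first expression mode only.
face = reconstruct_face(core, w_id=[0.5, 0.5], w_exp=[1.0, 0.0])
```

Because the output is linear in `w_id` for a fixed `w_exp` (and vice versa), identity and expression can be adjusted independently, which is what makes this representation convenient for personalization.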
![Page 73: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/73.jpg)
Probabilistic 3D Face Parameterization
Generative models for face modeling
Model the uncertainties of the shape, identity, and expression
Make it feasible to simulate and predict the face identity and expression
Enable group-wise rigid facial pose estimation suitable for any face
The generative face model can be learned from a training dataset
FaceWarehouse Dataset
150 identities, 47 expressions
Different ages, genders, races …
Its diversity lets the learned face model cover most common identities and expressions
![Page 74: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/74.jpg)
Probabilistic 3D Face Parameterization
Identity and Expression Priors
Multilinear Gaussian Face Model
Learned from the FaceWarehouse dataset together with the core tensor
Figure: (a) mean face; (b), (c), (d) variance maps (scale in mm)
![Page 75: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/75.jpg)
Probabilistic Facial Pose Tracking
Rigid Pose Tracking
Identity Adaptation
Input
Output
Identity distribution
Pose Parameters Face Model
![Page 76: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/76.jpg)
Robust Facial Pose Estimation
Transform a canonical face model (in canonical coordinates) to match the input point cloud
Transformation parameters: scale, rotation, translation
The warped face model has the distribution
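A minimal sketch of how a similarity transform acts on one probabilistic face point, assuming (as an illustration, not the thesis's stated model) that each canonical point is Gaussian with mean `mean` and covariance `cov`. For y = s·R·x + t, the warped point is again Gaussian with mean s·R·μ + t and covariance s²·R·Σ·Rᵀ. All function names here are hypothetical helpers.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rotation_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def warp_gaussian(mean, cov, scale, rot, trans):
    """Push a Gaussian point through y = scale * rot @ x + trans."""
    # Mean: s * R * mu + t
    rotated = matmul(rot, [[m] for m in mean])
    new_mean = [scale * r[0] + t for r, t in zip(rotated, trans)]
    # Covariance: s^2 * R * Sigma * R^T
    rot_t = [list(col) for col in zip(*rot)]
    rcr = matmul(matmul(rot, cov), rot_t)
    new_cov = [[scale ** 2 * v for v in row] for row in rcr]
    return new_mean, new_cov


cov = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]
warped_mean, warped_cov = warp_gaussian(
    [1.0, 0.0, 0.0], cov, scale=2.0, rot=rotation_z(math.pi / 2), trans=[0.0, 0.0, 1.0]
)
```

The translation shifts only the mean, while rotation and scale reshape the uncertainty ellipsoid.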
![Page 77: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/77.jpg)
Ray Visibility Constraint
Occlusions are inevitable in uncontrolled scenarios
Occluded human faces are always behind the occluding objects, such as hair, fingers/gestures, glasses and accessories
Examples: self-occlusion; occluded by hair; occluded by hand/gesture; occluded by accessories
![Page 78: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/78.jpg)
Ray Visibility Constraint
If correctly aligned
the visible face model points are those that overlap with the input point cloud
the remaining face model points should always be occluded by (lie behind) the input point cloud
(a) Case-I: face point is visible (b) Case-II: face point is occluded (c) Case-III: should be prevented
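The three cases can be sketched as a per-ray depth test. This is an illustrative simplification, not the thesis's criterion: it assumes both depths are measured along the same camera ray (larger means farther from the camera), and the tolerance `tol` is a hypothetical parameter.

```python
def classify_ray(model_depth, data_depth, tol=5.0):
    """Classify one face model point against the input point on the same ray.

    Depths are distances from the camera along the ray; `tol` is an assumed
    tolerance in the same unit (for example millimetres).
    """
    diff = model_depth - data_depth
    if abs(diff) <= tol:
        return "visible"      # Case-I: model point overlaps the input surface
    if diff > tol:
        return "occluded"     # Case-II: model point lies behind the input surface
    return "violation"        # Case-III: model point in front of observed data


labels = [classify_ray(m, d) for m, d in [(800.0, 802.0), (830.0, 800.0), (770.0, 800.0)]]
```

A pose that places model points in front of the observed surface contradicts the measurement, since the sensor would have seen the face there, which is why the third case is the one to penalize.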
![Page 79: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/79.jpg)
Ray Visibility Constraint
Connect point pair along a ray
their distance along the surface of the input data
The distribution of one face model point is mapped along the surface normal direction
The face model point is visible
The face model point is occluded
Figure labels: visible, occluded, face distribution, line-of-sight, camera
![Page 80: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/80.jpg)
Ray Visibility Constraint
Ray Visibility Score
Measures the compatibility between the distributions of the face model and the input point cloud
Applies the Kullback-Leibler Divergence
data distribution
projected model distribution
Minimizing the ray visibility score yields the optimal compatibility between these two distributions
Occlusions receive constant penalties
Visible points penalize misalignment & model uncertainties
More robust than an ICP-based cost function
Solver: quasi-Newton method, further refined by particle swarm optimization
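To make the KL-based compatibility measure concrete, here is the closed-form Kullback-Leibler divergence between two univariate Gaussians. It is only a building block under the assumption that both distributions are Gaussian along the ray; the full ray visibility score, including how occluded points are penalized, is not reproduced here.

```python
import math


def kl_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for univariate Gaussians p = N(mean_p, var_p), q = N(mean_q, var_q)."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


# Misalignment between the two means raises the divergence ...
misaligned = kl_gauss(0.0, 1.0, 1.0, 1.0)
# ... and so does a mismatch in uncertainty, even when the means agree.
mismatched_var = kl_gauss(0.0, 1.0, 0.0, 4.0)
```

The divergence grows with both the mean offset and the variance mismatch, which matches the slide's remark that visible points penalize misalignment and model uncertainties.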
![Page 81: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/81.jpg)
Robust Facial Pose Estimation
Result comparison with the generic face model
(a) Color image (b) Point cloud (c) Initial alignment
(d) ICP (e) RVC + ML (f) RVS (g) RVS + PSO
![Page 82: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/82.jpg)
Robust Facial Pose Estimation
More results with the generic face model
(a) Color image (b) Point cloud (c) Initial alignment (d) Ours
no explicit correspondences
handles occlusions even with a poor initial pose
less vulnerable to bad local minima
PSO increases the robustness
![Page 83: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/83.jpg)
Online Identity Adaptation
Variational Approximation
The face model is identified by the identity distribution
It can be estimated online through assumed density filtering (ADF)
The data likelihood
A mixture distribution encoding the model and outliers
The model fitting function is robust to quantization with a modified projection distance
The variance of identity is enlarged per frame to prevent overfitting
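A scalar sketch of one ADF step under simplifying assumptions: the belief is a 1-D Gaussian, the likelihood mixes a Gaussian inlier term with a constant-density outlier term, and the exact two-component posterior is collapsed back to a Gaussian by moment matching. The parameter names and defaults (`noise_var`, `outlier_prob`, `outlier_density`, `inflate`) are hypothetical; the thesis operates on the full identity vector with its own likelihood.

```python
import math


def adf_update(mean, var, obs, noise_var=1.0, outlier_prob=0.2,
               outlier_density=0.01, inflate=0.0):
    """One assumed-density-filtering step for a scalar Gaussian belief."""
    var = var + inflate  # enlarge the variance before absorbing new data
    # Evidence of the inlier component: N(obs | mean, var + noise_var).
    s = var + noise_var
    inlier = (1.0 - outlier_prob) * math.exp(-0.5 * (obs - mean) ** 2 / s) \
        / math.sqrt(2.0 * math.pi * s)
    outlier = outlier_prob * outlier_density
    r = inlier / (inlier + outlier)  # posterior probability that obs is an inlier
    # Inlier branch: standard Gaussian (Kalman-style) update.
    gain = var / s
    m_in = mean + gain * (obs - mean)
    v_in = var * (1.0 - gain)
    # Outlier branch leaves the prior unchanged; match mean and variance
    # of the resulting two-component mixture.
    new_mean = r * m_in + (1.0 - r) * mean
    second_moment = r * (v_in + m_in ** 2) + (1.0 - r) * (var + mean ** 2)
    return new_mean, second_moment - new_mean ** 2


near_mean, near_var = adf_update(0.0, 1.0, 0.5)    # plausible observation
far_mean, far_var = adf_update(0.0, 1.0, 100.0)    # gross outlier
```

A plausible observation pulls the mean and shrinks the variance, while a gross outlier is effectively ignored. A positive `inflate` keeps the belief from collapsing over many frames, in the spirit of the per-frame variance enlargement mentioned above.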
![Page 84: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/84.jpg)
Online Identity Adaptation
Figure: results of online model adaptation, panels (a) to (c)
![Page 85: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/85.jpg)
Experimental Results & Discussions
Experiments on public depth-based facial pose datasets
Datasets: Biwi, ICT-3DHP

| Dataset | 𝑵𝒔𝒆𝒒 | 𝑵𝒇𝒓𝒎 | 𝑵𝒔𝒖𝒃𝒋 | Occlusions | Expressions | 𝝎𝒎𝒂𝒙 |
|---|---|---|---|---|---|---|
| Biwi | 24 | ~15K | 25 | accessories, hair | neutral ~ slight | ±75° yaw, ±60° pitch |
| ICT-3DHP | 10 | ~14K | 10 | accessories, hair | slight ~ exaggerated | ±75° yaw, ±45° pitch |
![Page 86: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/86.jpg)
Experimental Results & Discussions
Robust to profile faces caused by large rotations, and to occlusions from hair and accessories.
Examples shown: profile faces, occlusions, expressions
![Page 87: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/87.jpg)
Experimental Results & Discussions
The proposed system is also robust to expression variations
Ray visibility constraint
efficiently infers the occlusions against the face model
optimizes the visible face area against the occlusions
Personalized face model
enables compact fitting
robust to changes in the personalized expressions
![Page 88: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/88.jpg)
Experimental Results & Discussions
Adaptation between different users
Three different identities are presented in three adjacent frames
![Page 89: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/89.jpg)
Experimental Results & Discussions
Comparison with the state of the art

Biwi dataset (errors):

| Method | Yaw (deg) | Pitch (deg) | Roll (deg) | Trans (mm) |
|---|---|---|---|---|
| Ours | 2.3 | 2.0 | 1.9 | 6.9 |
| RF | 8.9 | 8.5 | 7.9 | 14.0 |
| Martin | 3.6 | 2.5 | 2.6 | 5.8 |
| CLM-Z | 14.8 | 12.0 | 23.3 | 16.7 |
| TSP | 3.9 | 3.0 | 2.5 | 8.4 |
| PSO | 11.1 | 6.6 | 6.7 | 13.8 |
| Meyer | 2.1 | 2.1 | 2.4 | 5.9 |
| Li* | 2.2 | 1.7 | 3.2 | - |

ICT-3DHP dataset (errors):

| Method | Yaw (deg) | Pitch (deg) | Roll (deg) |
|---|---|---|---|
| Ours | 3.4 | 3.2 | 3.3 |
| RF | 7.2 | 9.4 | 7.5 |
| CLM-Z | 6.9 | 7.1 | 10.5 |
| Li* | 3.3 | 3.1 | 2.9 |

*This method is based on RGB-D data
Discriminative: RF
Model fitting: CLM-Z, PSO, Martin et al., Meyer et al.
Feature-based: TSP
RGB-D: Li*
![Page 90: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/90.jpg)
Conclusion
![Page 91: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/91.jpg)
Conclusions
Hybrid Geometric Hole Filling Strategy for Spatial Enhancement
• Hybrid hole filling merging the interpolation and parametric structure propagation
• A novel texture-constrained patch matching method for a robust structure inference
Weighted Structure Filters Based on Parametric Structural Decomposition
• An efficient distribution estimation that is adaptive to local image structure
• Accelerating joint weighted filters without structural distortions
![Page 92: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/92.jpg)
Conclusions
Spatiotemporal Enhancement based on Static Structure
• Robust temporally consistent depth enhancement based on a probabilistic static structure of the captured scene
• The dynamic content is enhanced spatially while the static region favors a long-range spatiotemporal optimization
A Generative Model for Robust 3D Facial Pose Tracking
• A robust depth-based facial pose tracking system with an adaptive face model personalization
• The multilinear generative face model and the visibility-constrained rigid pose estimation improve the robustness
![Page 93: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/93.jpg)
Publications
Lu Sheng, King Ngi Ngan, Chern-Loon Lim and Songnan Li, Online Temporally Consistent Indoor Depth Video Enhancement via Static Structure, TIP, 2015.
Songnan Li, King Ngi Ngan, Raveendran Paramesran and Lu Sheng, Real-time Head Pose Tracking with Online Face Template Reconstruction, TPAMI, 2016.
Lu Sheng, Tak-Wai Hui and King Ngi Ngan, Accelerating the Distribution Estimation for the Weighted Median/Mode Filters, ACCV, 2014.
Lu Sheng, Songnan Li and King Ngi Ngan, Temporal Depth Video Enhancement Based On Intrinsic Static Structure, ICIP, 2014.
Lu Sheng, King Ngi Ngan and Songnan Li, Depth Enhancement Based On Hybrid Geometric Hole Filling Strategy, ICIP, 2013.
Chi Ho Cheung, Lu Sheng and King Ngi Ngan, A disocclusion filling method using multiple sprites with depth for virtual view synthesis, ICMEW, 2015.
Songnan Li, King Ngi Ngan and Lu Sheng, Screen-camera Calibration Using a Thread, ICIP, 2014.
Songnan Li, King Ngi Ngan and Lu Sheng, A Head Pose Tracking System Using RGB-D Camera, ICVS, 2013.
Lu Sheng, Jianfei Cai and King Ngi Ngan, A Generative Model for Robust 3D Facial Pose Tracking, TIP, in preparation.
Lu Sheng and King Ngi Ngan, Weighted Structural Prior for Structure-preserving Image and Video Applications, TIP, in preparation.
![Page 94: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/94.jpg)
Thanks to
My supervisor Prof. King Ngi Ngan, and Prof. Jianfei Cai
Committee members Prof. Wai Kuen Cham, Prof. Thierry Blu, and Prof. Kwanghoon Sohn
My lovely IVP labmates
& my sweet family!
![Page 95: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/95.jpg)
Depth Propagation under Segment Constraint
Cost function construction
Randomly select 𝑘 sub-patches in each patch
Estimate similarity between two sub-patches
Calculate the cost of 𝑗𝑡ℎ sub-patch of 𝑢 with 𝑣, and find the 𝑣∗ patch with the minimum cost
Form a histogram indicating the number of sub-patches in 𝑢 that matches with 𝑣
Add spatial constraint, the cost is
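The voting steps above can be sketched as follows. This is a simplified illustration under stated assumptions: sub-patches are compared at the same offset in both patches, similarity is the sum of squared differences, and the spatial-constraint term is left out. The function names are hypothetical.

```python
import random


def ssd(a, b):
    """Sum of squared differences between two equally sized sub-patches."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(patch, top, left, size):
    """Extract a size x size sub-patch from a patch stored as a list of rows."""
    return [row[left:left + size] for row in patch[top:top + size]]


def subpatch_votes(patch_u, candidates, k=8, size=2, seed=0):
    """Histogram of how many random sub-patches of u best match each candidate v."""
    rng = random.Random(seed)
    h, w = len(patch_u), len(patch_u[0])
    votes = [0] * len(candidates)
    for _ in range(k):
        top = rng.randrange(h - size + 1)
        left = rng.randrange(w - size + 1)
        sub_u = crop(patch_u, top, left, size)
        # Cost of this sub-patch against every candidate; the minimum gets the vote.
        costs = [ssd(sub_u, crop(c, top, left, size)) for c in candidates]
        votes[costs.index(min(costs))] += 1
    return votes


patch_u = [[r * 4 + c for c in range(4)] for r in range(4)]
candidates = [
    [[0] * 4 for _ in range(4)],        # flat patch
    [row[:] for row in patch_u],        # exact copy of u
    [[100] * 4 for _ in range(4)],      # bright flat patch
]
votes = subpatch_votes(patch_u, candidates)
```

Voting over several small sub-patches makes the match less sensitive to a few corrupted pixels than a single whole-patch comparison would be.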
![Page 96: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/96.jpg)
Gaussian Models for the Local Structures
Kernel Specification
Distribution is a mixture of Gaussian models
Constant-time filters: guided image filter [1], domain transform filter [2]
[1] K. He et al., ECCV 2010
[2] E. Gastal and M. Oliveira, ACM ToG 2011
Noise variance
![Page 97: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/97.jpg)
Gaussian Models for the Local Structures
Noise std
![Page 98: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/98.jpg)
Online Update Scheme
Variational Parameter Estimation
Factorize the posterior into independent Gaussian and Dirichlet distributions
The reliability of the model
The probable depth is
The posterior can be approximated by
Recursive estimation is possible!
![Page 99: RGB D Video Enhancement and Applications · Lu Sheng, Thesis Oral Defense. Structures in a Local Patch cloud object tower sky Lu Sheng, Thesis Oral Defense. Structures in a Local](https://reader030.vdocuments.net/reader030/viewer/2022040907/5e7df04f4f8fa64e901e879d/html5/thumbnails/99.jpg)
Online Update Scheme
Variational Parameter Estimation
Factorize the posterior into independent Gaussian and Dirichlet distributions
The posterior can be approximated by
Moment matching to estimate the hyperparameters
Closed-form solutions!