
Automatic Portrait Segmentation and Matting

Xiaoyong Shen

The Chinese University of Hong Kong

goodshenxy@gmail.com

Research on CV

• Pixel based (low-level / early vision)
  • Filtering, restoration, denoising, enhancement, deblurring, editing, dehazing, etc.
• Region / patch based (middle-level vision)
  • Matching, optical flow, stereo matching, tracking, segmentation, etc.
• Object / semantic based (high-level vision)
  • Semantic segmentation, object detection, image classification, recognition, etc.

My Research on CV

• Pixel based (low-level vision)
  • Filtering, restoration, denoising, enhancement, deblurring, editing, dehazing, etc.
• Region / patch based (middle-level vision)
  • Matching, optical flow, stereo matching, tracking, segmentation, etc.
• Object based (high-level vision)
  • Semantic segmentation, object detection, image classification, recognition, etc.

Multi-Spectral Image Restoration

• Input
  • Noisy RGB image I0 (e.g. captured at night)
  • Clean guidance image G (e.g. dark-flashed NIR or flashed RGB images)
• Output
  • Denoised image I
  • Structures are as clear as in the guidance G.
  • Appearance is the same as in image I0.
  • Shadows and highlights do not affect the result.

[TPAMI 2015]

Scale Map

• Given $I^*$, the expected ground-truth noise-free image, our scale map $s$ is defined by the following condition:

$$\min \|\nabla I^* - s \cdot \nabla G\|$$

• It adapts the structures of $G$ to those of $I^*$.
• It is an ideal ratio map between $\nabla G$ and $\nabla I^*$.

Result

[Figure: input noisy image, input NIR image, our result, ground truth]

[Figure: RGB input I, NIR input G, our result]

Mutual-Structure Filter

[ICCV 2015 Oral Presentation]

Depth/RGB Restoration

Noisy Depth

Noisy RGB Image

Ground truth

Ours (PSNR = 37.19)

Rolling Guidance Filter

Only one line of code: $I_{t+1} = \mathrm{JF}(I_0, I_t)$

[ECCV 2014 Oral Presentation]
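The one-line update of the Rolling Guidance Filter can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes that JF denotes a joint bilateral filter (the slide does not expand the abbreviation), that the first guidance image is constant so the first pass reduces to a Gaussian blur, and that the image is a single-channel float array. The function names and the parameters `sigma_s`, `sigma_r`, and `radius` are hypothetical.

```python
import numpy as np

def joint_bilateral(src, guide, sigma_s=2.0, sigma_r=0.1, radius=4):
    """Smooth `src` with spatial Gaussian weights and range weights taken from `guide`."""
    src = np.asarray(src, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    h, w = src.shape
    src_p = np.pad(src, radius, mode="edge")
    guide_p = np.pad(guide, radius, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            ys, xs = radius + dy, radius + dx
            s_shift = src_p[ys:ys + h, xs:xs + w]
            g_shift = guide_p[ys:ys + h, xs:xs + w]
            # Range weight compares guidance values, not source values
            weight = spatial * np.exp(-((guide - g_shift) ** 2) / (2.0 * sigma_r ** 2))
            num += weight * s_shift
            den += weight
    return num / den  # den >= 1 because the centre tap always has weight 1

def rolling_guidance(i0, iterations=4, **filter_args):
    """Iterate I_{t+1} = JF(I_0, I_t), starting from a constant guidance image (assumption)."""
    i0 = np.asarray(i0, dtype=np.float64)
    it = np.zeros_like(i0)
    for _ in range(iterations):
        it = joint_bilateral(i0, it, **filter_args)
    return it
```

Each pass always filters the original input `i0`; only the guidance image changes between iterations.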

Texture Removal

Halftone Image

De-Filter

Only one line of code: $I_{t+1} = I_t + (I_0 - F(I_t))$
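The De-Filter update can be tried on a toy problem. The sketch below is an illustration under stated assumptions: `smooth` is a hypothetical 3-tap circular smoothing filter standing in for F, the iteration starts from the filtered input I0, and the signal is one-dimensional. The iteration only converges for filters that are mild enough; this stand-in was chosen so that it does.

```python
import numpy as np

def smooth(x):
    """Hypothetical stand-in for F: a mild 3-tap circular smoothing filter."""
    return 0.6 * x + 0.2 * (np.roll(x, 1) + np.roll(x, -1))

def defilter(i0, f, iterations=100):
    """Iterate I_{t+1} = I_t + (I_0 - F(I_t)), starting from I_t = I_0."""
    it = np.array(i0, dtype=np.float64)
    for _ in range(iterations):
        it = it + (i0 - f(it))
    return it

rng = np.random.default_rng(0)
original = rng.random(32)
filtered = smooth(original)             # I_0: the filtered observation
recovered = defilter(filtered, smooth)  # estimate of the unfiltered signal
```

The filter is used as a black box: the loop only needs to be able to apply F, not to invert it analytically.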

Reverse Skin Retouch

Retouched input

Reversed

Before retouch

Multi-Spectral Matching

• Match general multi-spectral images with significant displacement and obvious structure inconsistency

Different Exposures RGB/Depth RGB/NIR Flash/No-flash

Result

• Match RGB/NIR image pair

[Figure: inputs, our result, blended]

Applications

• HDR construction

[Figure: without alignment, with alignment, constructed HDR]

Internet Image Matching

Reference Input

Dense Correspondences?

Correspondence Exists

No Correspondence

[SIGGRAPH ASIA 2016]

Our Motivation

Reference Input

Dense Correspondences?

Foremost Region Matching

Time-lapse Generation

Automatic Morphing

Object-based Matching

Achieves higher accuracy with the help of the object (person)

Object-based Matching

State-of-the-art Ours

Classification and Segmentation

• Fine-grained classification
  • DeepLAC (CVPR 2015)
• Text detection and recognition
• Semantic object segmentation
  • Portrait segmentation and matting
  • VOC challenge

Automatic Portrait Segmentation

Motivation

• Abundant portraits in smartphone photos

  • Samsung UK: portraits 30%, others 70%
  • Symon Whitehorn from HTC: portraits 90%, others 10%

Portrait Post-processing

Foreground Selection

Quick Selection

Automatic Segmentation

Automatic?

Challenges

• Similar color
• Complex background
• Various accessories
• Low contrast
• Diverse poses
• Complicated edges

Possible Solutions

• Graph-cut with face tracker

Possible Solutions

• CNNs for semantic segmentation

Most Related Work

• Interactive Image Selection
  • Lazy snapping [Li et al. 2004]
  • Grabcut [Rother et al. 2004]
  • Paint Selection [Li et al. 2009]
• CNNs for Semantic Object Segmentation
  • FCN [Long et al. 2014]
  • DeepLab [Chen et al. 2014]
  • CRFasRNN [Zheng et al. 2015]
• Image Matting
  • Bayesian matting [Chuang et al. 2001]
  • Closed-form matting [Levin et al. 2008]
  • KNN matting [Chen et al. 2013]

Our Approach

PortraitFCN and PortraitFCN+

Our System

[Diagram: Detector → PortraitFCN model (Conv, ReLU, Pooling, and DeConv layers, after [Long et al. 2015]) → Mask. Input: RGB channels. 2 outputs.]

PortraitFCN

• Fine-tuned from the original FCN-8s model

Portrait Knowledge

PortraitFCN+

[Diagram: Detector → PortraitFCN+ model (Conv, ReLU, Pooling, and DeConv layers, after [Long et al. 2015]) → Mask. Input: RGB + shape + position channels. 2 outputs.]

Shape Channel

……

Labeled Masks

Canonical Pose

Shape Channel

$$M = \frac{\sum_i w_i \circ T_i(M_i)}{\sum_i w_i}$$
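The mean-mask formula for the shape channel can be sketched in NumPy as below. This is an illustration under assumptions: the masks are taken to be already warped to the canonical pose (the transforms T_i are not implemented here), each w_i is taken to be a per-pixel weight map of the same size as its mask, and pixels with zero total weight are set to 0. The function name `mean_mask` is hypothetical.

```python
import numpy as np

def mean_mask(aligned_masks, weights, eps=1e-8):
    """Per-pixel weighted average of masks that are already in the canonical pose."""
    masks = np.asarray(aligned_masks, dtype=np.float64)  # shape (n, H, W): each T_i(M_i)
    w = np.asarray(weights, dtype=np.float64)            # shape (n, H, W): each w_i
    numerator = (w * masks).sum(axis=0)                  # sum_i w_i ∘ T_i(M_i)
    denominator = w.sum(axis=0)                          # sum_i w_i
    # Pixels covered by no mask get 0 instead of a division by zero (assumption)
    return np.where(denominator > eps, numerator / np.maximum(denominator, eps), 0.0)

m1 = np.array([[1, 0], [1, 1]])
m2 = np.array([[1, 1], [0, 0]])
w1 = np.ones((2, 2))
w2 = np.array([[1, 1], [1, 0]])  # last pixel of the second mask is treated as invalid
shape_channel = mean_mask([m1, m2], [w1, w2])
```

In this toy case the bottom-right pixel is decided by the first mask alone, because the second mask carries zero weight there.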

Test Image

Position Channel

Canonical Pose

x- Coordinate y- Coordinate

Position Test Image

Effectiveness

PortraitFCN

Effectiveness

PortraitFCN+

Experiments and Applications

Our Dataset

• 1,800 portraits from Flickr with labeled masks
  • 1,500 portraits as the training data
  • 300 for testing
• Large variations in portrait types
  • Age, color, background, clothing, accessories, head position, hair style, lighting, etc.

Training

• Fine-tune the model starting from FCN-8s
  • Synthesize more data with different transforms
  • Use the person-class and background weights
• Find the best learning rate
  • Loss
  • Accuracy

Find the Best LR

Evaluation

Methods               Mean IoU (%)
Graph-cut             80.02
FCN (Person Class)    73.09

$$\mathrm{IoU} = \frac{\mathrm{area}(\text{output} \cap \text{ground truth})}{\mathrm{area}(\text{output} \cup \text{ground truth})}$$
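The IoU measure can be computed for binary masks with a few lines of NumPy. This is a minimal sketch; the function name `iou` is hypothetical, and returning 1.0 when both masks are empty is an assumption, since the slide does not cover that case.

```python
import numpy as np

def iou(output, ground_truth):
    """Intersection over union of two binary masks of the same shape."""
    out = np.asarray(output).astype(bool)
    gt = np.asarray(ground_truth).astype(bool)
    union = np.logical_or(out, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match (assumption)
    return np.logical_and(out, gt).sum() / union

predicted = np.array([[1, 1], [0, 0]])
labeled = np.array([[1, 0], [0, 0]])
score = iou(predicted, labeled)  # 1 shared pixel out of 2 in the union
```

The mean IoU reported in the table would then be the average of this score over the test images, expressed as a percentage.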


Evaluation

Methods                                        Mean IoU (%)
Graph-cut                                      80.02
PortraitFCN                                    94.20
PortraitFCN+ (Only with Mean Mask)             94.89
PortraitFCN+ (Only with Normalized x and y)    94.61
PortraitFCN+                                   95.91

Comparisons

[Figure: ground truth, Graph-cut, FCN-8s (Person), PortraitFCN, PortraitFCN+]

Comparisons

[Figure: input, ground truth, FCN-8s, Graph-cut; panel IoU labels: 0.83, 0.42, 0.91, 0.85, 0.99, 0.98]

Comparisons

[Figure: input, ground truth, FCN-8s, Graph-cut; panel IoU labels: 0.77, 0.95, 0.38, 0.84, 0.98]

Comparisons

[Figure: input, ground truth, FCN-8s, Graph-cut; panel IoU labels: 0.83, 0.53, 0.81, 0.89, 0.99, 0.98]

Robustness

Color Scale Rotation Occlusion

User Study

• Our result provides very good initialization for further refinement

Segmentation Is Not Enough: Automatic Portrait Matting

Portrait Matting

Input Image Alpha Matte

Color transform Depth-of-field Portrait

Stylization Cartoon

Background Edit

Problem Definition

$$I = \alpha F + (1 - \alpha) B$$

where $I$ is the image, $\alpha$ is the alpha (foreground opacity), $F$ is the foreground, and $B$ is the background.
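The compositing equation can be applied directly with NumPy broadcasting. A minimal sketch, assuming a per-pixel alpha matte of shape (H, W) with values in [0, 1] and color images of shape (H, W, 3); the function name `composite` is hypothetical.

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend foreground over background per pixel: I = alpha * F + (1 - alpha) * B."""
    a = np.asarray(alpha, dtype=np.float64)[..., None]  # (H, W) -> (H, W, 1) to cover all channels
    f = np.asarray(foreground, dtype=np.float64)
    b = np.asarray(background, dtype=np.float64)
    return a * f + (1.0 - a) * b

alpha = np.array([[1.0, 0.0], [0.25, 0.5]])
white = np.ones((2, 2, 3))   # toy foreground
black = np.zeros((2, 2, 3))  # toy background
image = composite(alpha, white, black)
```

Matting is the inverse problem: given only the image, recover alpha (and possibly F and B), which is why additional input such as a trimap or strokes is normally needed.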

Natural Image Matting

• Color sampling methods
  • Given a manually labeled trimap
  • Bayesian Matting [Y-Y Chuang, 2001], etc.

Image Trimap Alpha matte

Natural Image Matting

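• Propagation approaches
  • Given manually labeled strokes and trimap
  • Closed-form Matting [Levin, 2008], etc.

$$\alpha = \arg\min_{\alpha} \; \alpha^T L \alpha + \lambda (\alpha - b_s)^T D (\alpha - b_s)$$

where $L$ is the matting Laplacian, $b_s$ holds the user-provided strokes, and $D$ is the diagonal stroke mask.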
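Because the closed-form matting objective is quadratic in α, setting its gradient to zero gives the linear system (L + λD) α = λ D b_s. The sketch below solves that system on a toy problem. It is an illustration only: a 5-node path-graph Laplacian stands in for the real matting Laplacian (an assumption, the true one is built from local color statistics), and the function name `solve_alpha` and the value of `lam` are hypothetical.

```python
import numpy as np

def solve_alpha(laplacian, stroke_mask, stroke_values, lam=100.0):
    """Solve (L + lam * D) alpha = lam * D b_s for alpha."""
    D = np.diag(np.asarray(stroke_mask, dtype=np.float64))
    b_s = np.asarray(stroke_values, dtype=np.float64)
    return np.linalg.solve(laplacian + lam * D, lam * (D @ b_s))

n = 5
# Path-graph Laplacian: a toy stand-in for the matting Laplacian (assumption)
L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i] += 1.0
    L[i + 1, i + 1] += 1.0
    L[i, i + 1] -= 1.0
    L[i + 1, i] -= 1.0

stroke_mask = np.array([1, 0, 0, 0, 1])              # only the two end pixels are marked
stroke_values = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # background stroke, then foreground stroke
alpha = solve_alpha(L, stroke_mask, stroke_values)
```

The unmarked pixels receive values propagated from the strokes through the Laplacian, which is the sense in which these are called propagation approaches.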

Motivation

• It is very hard to specify trimap or strokes

Input Labeled Strokes Closed-form Matting

Motivation

• It is very hard to specify trimap or strokes

Input Labeled Trimap Closed-form Matting

Motivation

Usually we need to refine the trimap many times to get a good alpha matte.

Segmentation to Matting

Learning for Automatic Matting

• Challenges
  • Data preparation
  • Learning framework
• We propose end-to-end Convolutional Neural Networks (CNNs) for portrait matting

Learning Data Collection

• 2,000 portraits from Flickr with large variation
  • Keywords…
  • Different age, gender, pose, hairstyle, background…
  • Different camera type…
• Data example

Data Labeling

• Apply closed-form matting and robust matting
  • Gradually refine the input trimap
  • Choose the better result from closed-form or robust matting
• User interface
• Ground truth example

Learn Automatic Matting

Our Method

Trimap labeling
• Input: RGB image
• Output: trimap
• Network: fine-tuned from FCN

Our Method

Image Matting Layer
• Input: trimap
• Output: alpha matte
• Newly designed structure

Our Method

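Image Matting Layer
• Feed-forward:

$$\min \; \lambda A^T B A + \lambda (A - \mathbf{1})^T F (A - \mathbf{1}) + A^T L A$$

• Back-propagation:

$$\frac{\partial f}{\partial B} = -\lambda D^{-1} \mathrm{diag}(D^{-1} F)$$

$$\frac{\partial f}{\partial F} = \frac{\partial f}{\partial B} + D^{-1}$$

$$\frac{\partial f}{\partial \lambda} = -\lambda D^{-1} \mathrm{diag}(F + B) D^{-1} F$$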
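The symbol D in the gradients above is not defined on the slide. A short derivation sketch shows where a matrix of that form would come from. It assumes that B and F stand for diagonal matrices built from the predicted background and foreground maps, that L is the symmetric matting Laplacian, and that 1 is the all-ones vector; all of these readings are assumptions. Setting the gradient of the feed-forward objective to zero gives:

$$
\begin{aligned}
\frac{\partial}{\partial A}\Big[\lambda A^T B A + \lambda (A - \mathbf{1})^T F (A - \mathbf{1}) + A^T L A\Big]
  &= 2\lambda B A + 2\lambda F (A - \mathbf{1}) + 2 L A = 0 \\
(\lambda B + \lambda F + L)\, A &= \lambda F \mathbf{1} \\
A &= \lambda D^{-1} F \mathbf{1}, \qquad D = \lambda B + \lambda F + L
\end{aligned}
$$

Under these assumptions the forward pass is a single linear solve, and the listed gradients differentiate that solution with respect to B, F, and λ.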

Our Method

Image Matting Layer
• Loss function:

$$L(A, A^{gt}) = \sum_i w(A_i^{gt}) \, \|A_i - A_i^{gt}\|, \qquad w(A_i^{gt}) = -\log\big(p(A = A_i^{gt})\big)$$

Model Training

• Data augmentation
  • 4 scales: {0.6, 0.8, 1.2, 1.5}
  • 4 rotations: {-45, -22, 22, 45} degrees
  • Gamma values: {0.5, 0.8, 1.2, 1.5}
• Network initialization
  • Fine-tuned from the FCN-8s model [J. Long, 2015]

Experiments

• Running time
  • Training time: 20k iterations, one day on a Titan X GPU
  • Testing time: 0.6 s for a 600×800 color image
• Comparisons
  • Graph-cut
  • FCN baseline: direct FCN segmentation followed by closed-form matting

Results

Input Graph-cut FCN Ours

Results

Failure Cases

Input Alpha Matte Input Alpha Matte

Applications

Input Stylization PS GS Stick PS Fresco Stylization

Input Stylization Depth-of-Field PS Fresco Stylization

Applications

Input Stylization PS Palette Knife PS GS Stick PS Sketch

Input PS Oil Paint Depth-of-Field PS GS Stick Stylization

Applications

Input Stylization PS Palette Knife Depth-of-Field Stylization

Input Stylization PS Palette Knife PS Dark Stroke PS Paint Daubs

Conclusions

• High-accuracy automatic portrait segmentation and matting approach
  • A novel CNN framework
  • Training and testing dataset
  • Benefits many applications
• Future work
  • Video segmentation
  • Human segmentation
  • Single portrait image depth estimation
  • Weakly supervised version

Thanks
