Post on 16-Aug-2020

Automatic Portrait Segmentation and Matting

Xiaoyong Shen

The Chinese University of Hong Kong

goodshenxy@gmail.com

Research on CV

• Pixel based (low level / early vision)
  • Filtering, restoration, denoise, enhancement, deblur, editing, dehaze, etc.

• Region / patch based (middle level vision)
  • Matching, optical flow, stereo matching, tracking, segmentation, etc.

• Object / semantic based (high level vision)
  • Semantic segmentation, object detection, image classification, recognition, etc.

My Research on CV

• Pixel based (low level vision)
  • Filtering, restoration, denoise, enhancement, deblur, editing, dehaze, etc.

• Region / patch based (middle level vision)
  • Matching, optical flow, stereo matching, tracking, segmentation, etc.

• Object based (high level vision)
  • Semantic segmentation, object detection, image classification, recognition, etc.

Multi-Spectral Image Restoration

• Input
  • Noisy RGB image I0 (e.g. captured at night)
  • Clean guidance image G (e.g. dark-flashed NIR or flashed RGB images)

• Output
  • Denoised image I
    • Structures are as clear as in the guidance G.
    • Appearance is the same as in image I0.
    • Shadows and highlights do not affect the result.

[TPAMI 2015]

Scale Map

• Given $I^*$, the expected ground-truth noise-free image, our scale map $s$ is defined under the following condition:

$$\min_s \left\| \nabla I^* - s \cdot \nabla G \right\|$$

• It adapts the structures of $G$ to those of $I^*$.

• It is an ideal ratio map between $\nabla G$ and $\nabla I^*$.
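To make the "ideal ratio map" reading concrete, here is a minimal 1D sketch in Python. It assumes the noise-free signal is known, which is only true in this idealized definition, and it sets the scale to zero wherever the guidance gradient vanishes. The function name `ideal_scale_map` and the `eps` threshold are illustrative assumptions, not part of the slides.

```python
import numpy as np

def ideal_scale_map(target, guide, eps=1e-6):
    """Per-sample ratio s with s * grad(guide) == grad(target) where grad(guide) != 0."""
    grad_target = np.diff(np.asarray(target, dtype=np.float64))
    grad_guide = np.diff(np.asarray(guide, dtype=np.float64))
    s = np.zeros_like(grad_target)
    safe = np.abs(grad_guide) > eps  # avoid dividing by a vanishing guidance gradient
    s[safe] = grad_target[safe] / grad_guide[safe]
    return s

# Example: the guidance has the same edges as the target but different contrast.
s = ideal_scale_map([0.0, 1.0, 3.0], [0.0, 2.0, 4.0])
```

Multiplying `s` by the guidance gradient reproduces the target gradient wherever the guidance has structure, which is the sense in which the map adapts the structures of G to those of I*.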

Result

[Figure panels: Input Noisy Image; Input NIR Image; Our Result; Ground Truth]

[Figure slides: RGB Input I; NIR Input G; BM3D; Our Result]

Mutual-Structure Filter

[ICCV 2015 Oral Presentation]

Depth/RGB Restoration

[Figure slides: Noisy Depth; Noisy RGB Image; Ground truth; Ours (PSNR = 37.19)]

Rolling Guidance Filter

One line of code only: $I_{t+1} = JF(I_0, I_t)$

[ECCV 2014 Oral Presentation]
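The one-line update can be sketched in Python as follows. This is a minimal grayscale sketch, assuming that JF denotes a joint bilateral filter that smooths the input I_0 using range weights taken from the current estimate I_t, and that the iteration starts from a constant image so the first pass reduces to a Gaussian blur. The parameter names and default values are illustrative assumptions.

```python
import numpy as np

def joint_bilateral(src, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Average `src` with weights from spatial distance and differences in `guide`."""
    h, w = src.shape
    src_p = np.pad(src, radius, mode="edge")
    guide_p = np.pad(guide, radius, mode="edge")
    num = np.zeros((h, w), dtype=np.float64)
    den = np.zeros((h, w), dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys = slice(radius + dy, radius + dy + h)
            xs = slice(radius + dx, radius + dx + w)
            w_spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            w_range = np.exp(-((guide_p[ys, xs] - guide) ** 2) / (2.0 * sigma_r ** 2))
            weight = w_spatial * w_range
            num += weight * src_p[ys, xs]
            den += weight
    return num / den  # den >= 1 because the center tap always has weight 1

def rolling_guidance(I0, iterations=4, **filter_args):
    """Iterate I_{t+1} = JF(I_0, I_t), starting from a constant guidance image."""
    I0 = np.asarray(I0, dtype=np.float64)
    It = np.zeros_like(I0)  # constant guidance: the first pass is a plain Gaussian blur
    for _ in range(iterations):
        It = joint_bilateral(I0, It, **filter_args)
    return It
```

Each pass re-filters the original input, so small structures removed by the first blur stay removed while large edges are gradually restored through the guidance.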

Texture Removal

Halftone Image

De-Filter

One line of code only: $I_{t+1} = I_t + (I_0 - F(I_t))$
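A minimal Python sketch of this fixed-point update, assuming the filter F is available as a callable and that the iteration starts from the filtered observation I_0 itself (the starting point is an assumption here). Convergence depends on the filter and is not guaranteed for arbitrary F.

```python
import numpy as np

def defilter(filtered, filter_fn, iterations=20):
    """Estimate the pre-filter signal by iterating I_{t+1} = I_t + (I_0 - F(I_t))."""
    I0 = np.asarray(filtered, dtype=np.float64)
    It = I0.copy()  # start from the filtered observation
    for _ in range(iterations):
        It = It + (I0 - filter_fn(It))  # add back the part the filter removed
    return It

# Example with a circular 3-tap blur standing in for the unknown filter F.
blur = lambda v: 0.25 * np.roll(v, 1) + 0.5 * v + 0.25 * np.roll(v, -1)
observed = blur(np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]))
estimate = defilter(observed, blur, iterations=50)
```

After the iterations, re-applying the filter to `estimate` should land closer to `observed` than re-applying it to `observed` does.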

Reverse Skin Retouch

[Figure slides: Retouched input; Reversed; Before retouch]

Multi-Spectral Matching

• Match general multi-spectral images with significant displacement and obvious structure inconsistency

[Example image pairs: Different Exposures; RGB/Depth; RGB/NIR; Flash/No-flash]

Result

• Match RGB/NIR image pair

[Figure panels: Inputs; Our Result; Blended]

Applications

• HDR construction

[Figure panels: Without Alignment; With Alignment; Constructed HDR]

Internet Image Matching

[Figure panels: Reference; Input]

Dense correspondences?

[Figure labels: Correspondence exists; No correspondence]

[SIGGRAPH ASIA 2016]

Our Motivation

[Figure panels: Reference; Input]

Dense correspondences?

Foremost Region Matching

Time-lapse Generation

Automatic Morphing

Object-based Matching

• Achieves higher accuracy with the help of the object (person)

[Figure panels: State-of-the-art; Ours]

Classification and Segmentation

• Fine-grained classification
  • DeepLAC (CVPR 2015)

• Text detection and recognition

• Semantic object segmentation
  • Portrait segmentation and matting
  • VOC challenge

Automatic Portrait Segmentation

Motivation

• Abundant portraits in smartphone photos

[Pie chart, Samsung UK: Portrait 30%, Others 70%]

[Pie chart, Symon Whitehorn from HTC: Portrait 90%, Others 10%]

Portrait Post-processing

Foreground Selection

Quick Selection

Automatic Segmentation

Automatic?

Challenges

[Figure panels: Similar Color; Complex Background; Various Accessories; Low Contrast; Diverse Pose; Complicated Edges]

Possible Solutions

• Graph-cut with face tracker

• CNNs for semantic segmentation

Most Related Work

• Interactive Image Selection
  • Lazy snapping [Li et al. 2004]
  • Grabcut [Rother et al. 2004]
  • Paint Selection [Liu et al. 2009]

• CNNs for Semantic Object Segmentation
  • FCN [Long et al. 2014]
  • DeepLab [Chen et al. 2014]
  • CRFasRNN [Zheng et al. 2015]

• Image Matting
  • Bayesian matting [Chuang et al. 2001]
  • Closed-form matting [Levin et al. 2008]
  • KNN matting [Chen et al. 2013]

Our Approach

PortraitFCN and PortraitFCN+

Our System

[Diagram: Detector → PortraitFCN model (Conv, ReLU, and Pooling layers followed by DeConv) [Long et al. 2015] → Mask. Input: RGB channels; 2 outputs.]

PortraitFCN

• Fine-tuned from the original FCN-8s model

Portrait Knowledge

PortraitFCN+

[Diagram: Detector → PortraitFCN+ model (Conv, ReLU, and Pooling layers followed by DeConv) [Long et al. 2015] → Mask. Input: RGB + Shape + Position channels; 2 outputs.]

Shape Channel

[Diagram: Labeled Masks → Align to Canonical Pose → Mean → Shape Channel → Align to Test Image]

$$M = \frac{\sum_i w_i \circ T_i(M_i)}{\sum_i w_i}$$
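The weighted mean mask in the formula above can be sketched in Python. This sketch assumes the masks have already been warped into the canonical pose (the transforms T_i are not implemented here) and that each w_i is a per-pixel weight map of the same size, for example marking which pixels remain valid after warping. The function name and the `eps` guard are illustrative assumptions.

```python
import numpy as np

def mean_shape_mask(aligned_masks, weights, eps=1e-8):
    """Per-pixel weighted mean: sum_i(w_i * T_i(M_i)) / sum_i(w_i)."""
    masks = np.asarray(aligned_masks, dtype=np.float64)  # shape (N, H, W), already aligned
    w = np.asarray(weights, dtype=np.float64)            # shape (N, H, W)
    return (w * masks).sum(axis=0) / np.maximum(w.sum(axis=0), eps)

# Two 2x2 aligned masks; one pixel of the second mask is marked invalid (weight 0).
masks = [[[1, 0], [1, 1]], [[1, 1], [0, 0]]]
weights = [[[1, 1], [1, 1]], [[1, 1], [0, 1]]]
shape_channel = mean_shape_mask(masks, weights)
```

The result is a soft mask in [0, 1] that encodes where portrait pixels tend to be in the canonical pose.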

Position Channel

[Diagram: Canonical Pose (x-coordinate and y-coordinate maps) → Align to Test Image → Position channels]
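As a rough illustration of how the extra channels could be assembled, the Python sketch below builds normalized x and y coordinate maps and stacks them with the RGB image and a shape mask. It normalizes coordinates by the image size, which is a simplifying assumption: the slides describe coordinates defined in the canonical pose and aligned to the test image, and that alignment is not implemented here. The function names are hypothetical.

```python
import numpy as np

def position_channels(height, width):
    """Return x and y coordinate maps, each normalized to [0, 1]."""
    ys, xs = np.meshgrid(
        np.linspace(0.0, 1.0, height),
        np.linspace(0.0, 1.0, width),
        indexing="ij",
    )
    return xs, ys

def stack_inputs(rgb, shape_mask):
    """Concatenate RGB (H, W, 3), shape (H, W), and x/y position maps into (H, W, 6)."""
    h, w, _ = rgb.shape
    xs, ys = position_channels(h, w)
    return np.dstack([rgb, shape_mask, xs, ys])

network_input = stack_inputs(np.zeros((4, 5, 3)), np.zeros((4, 5)))
```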

Effectiveness

[Figure slides: Input; PortraitFCN; PortraitFCN+]

Experiments and Applications

Our Dataset

• 1,800 portraits from Flickr with labeled masks
  • 1,500 portraits as the training data
  • 300 for testing

• Large variations in portrait types
  • Age, color, background, clothing, accessories, head position, hair style, lighting, etc.

Training

• Fine-tune the model starting from FCN-8s
  • Synthesize more data with different transforms
  • Use the person class and background weights

• Find the best learning rate
  • Loss
  • Accuracy

Find the Best LR

Evaluation

Methods                                        Mean IoU (%)
Graph-cut                                      80.02
FCN (Person Class)                             73.09
PortraitFCN                                    94.20
PortraitFCN+ (Only with Mean Mask)             94.89
PortraitFCN+ (Only with Normalized x and y)    94.61
PortraitFCN+                                   95.91

$$\mathrm{IoU} = \frac{\mathrm{area}(\text{output} \cap \text{ground truth})}{\mathrm{area}(\text{output} \cup \text{ground truth})}$$
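The IoU metric can be computed on binary masks with a few lines of Python. This is a minimal sketch: treating two empty masks as perfect agreement is a convention assumed here, and the helper `mean_iou_percent` simply averages per-image IoU, which is one plausible reading of "Mean IoU (%)".

```python
import numpy as np

def iou(output, ground_truth):
    """Intersection over union of two binary masks."""
    out = np.asarray(output, dtype=bool)
    gt = np.asarray(ground_truth, dtype=bool)
    union = np.logical_or(out, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as perfect agreement (assumed convention)
    return float(np.logical_and(out, gt).sum() / union)

def mean_iou_percent(outputs, ground_truths):
    """Average per-image IoU over a test set, as a percentage."""
    return 100.0 * float(np.mean([iou(o, g) for o, g in zip(outputs, ground_truths)]))

# One pixel overlaps out of two in the union.
score = iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```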

Comparisons

[Figure slides: Input; Ground Truth; Graph-cut; FCN-8s (Person); PortraitFCN; PortraitFCN+]

[Figure slides: three examples, each with panels Input, Ground Truth, FCN-8s, Graph-cut, and Ours. IoU values are listed in the order printed on each slide.]

• Example 1: FCN-8s and Graph-cut panels: 0.83, 0.42, 0.91, 0.85; Ours: 0.99, 0.98
• Example 2: FCN-8s and Graph-cut panels: 0.77, 0.95, 0.38, 0.84; Ours: 0.98, 0.98
• Example 3: FCN-8s and Graph-cut panels: 0.83, 0.53, 0.81, 0.89; Ours: 0.99, 0.98

Robustness

[Figure panels: Color; Scale; Rotation; Occlusion]

User Study

• Our result provides a very good initialization for further refinement

Segmentation Is Not Enough: Automatic Portrait Matting

Portrait Matting

[Figure panels: Input Image; Alpha Matte]

[Applications: Color transform; Depth-of-field; Portrait Stylization; Cartoon; Background Edit]

Problem Definition

$$I = \alpha F + (1 - \alpha) B$$

where $I$ is the image, $\alpha$ is the alpha (foreground opacity), $F$ is the foreground, and $B$ is the background.
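The compositing equation translates directly into Python. The sketch below assumes a per-pixel alpha matte of shape (H, W) and color images of shape (H, W, 3); the function name `composite` is illustrative.

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend per pixel: I = alpha * F + (1 - alpha) * B."""
    a = np.asarray(alpha, dtype=np.float64)[..., None]  # broadcast alpha over color channels
    fg = np.asarray(foreground, dtype=np.float64)
    bg = np.asarray(background, dtype=np.float64)
    return a * fg + (1.0 - a) * bg

# White foreground over a black background: the result equals alpha in every channel.
alpha = np.array([[1.0, 0.0], [0.5, 0.25]])
image = composite(alpha, np.ones((2, 2, 3)), np.zeros((2, 2, 3)))
```

Given an estimated alpha matte and foreground, the same function with a different `background` gives a simple background edit.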

Natural Image Matting

• Color sampling methods
  • Given a manually labeled trimap
  • Bayesian Matting [Y-Y Chuang, 2001], etc.

[Figure panels: Image; Trimap; Alpha matte]

• Propagation approaches
  • Given manually labeled strokes and trimap
  • Closed-form Matting [Levin, 2008], etc.

$$\alpha = \arg\min_{\alpha} \; \alpha^T L \alpha + \lambda (\alpha - b_s)^T D (\alpha - b_s)$$

where $L$ is the matting Laplacian, $b_s$ holds the user-provided strokes, and $D$ is the diagonal stroke mask.
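Setting the gradient of this quadratic objective to zero gives the linear system (L + λD) α = λ D b_s, which the Python sketch below solves on a tiny example. The true matting Laplacian is built from local color statistics of the image; here a plain chain-graph Laplacian is used as a hypothetical stand-in so the example stays self-contained. Function names and the value of `lam` are assumptions.

```python
import numpy as np

def chain_laplacian(n):
    """Graph Laplacian of an n-pixel chain (a stand-in for the matting Laplacian)."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def solve_alpha(laplacian, stroke_values, stroke_mask, lam=1e6):
    """Solve (L + lam * D) alpha = lam * D * b_s for alpha."""
    D = np.diag(np.asarray(stroke_mask, dtype=np.float64))
    b = np.asarray(stroke_values, dtype=np.float64)
    return np.linalg.solve(laplacian + lam * D, lam * (D @ b))

# Strokes mark the first pixel as background (0) and the last as foreground (1).
alpha = solve_alpha(chain_laplacian(5), [0, 0, 0, 0, 1], [1, 0, 0, 0, 1])
```

With this stand-in Laplacian the unconstrained pixels are interpolated smoothly between the two strokes, which shows how stroke constraints propagate through the smoothness term.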

Motivation

• It is very hard to specify a trimap or strokes

[Figure panels: Input; Labeled Strokes; Closed-form Matting (with error)]

[Figure panels: Input; Labeled Trimap; Closed-form Matting (with error)]

Usually we need to refine the trimap many times to get a good alpha matte…

Segmentation to Matting

Learning for Automatic Matting

• Challenges
  • Data preparation
  • Learning framework

• We propose end-to-end Convolutional Neural Networks (CNNs) for Portrait Matting

Learning Data Collection

• 2,000 portraits from Flickr with large variation
  • Keywords…
  • Different age, gender, pose, hairstyle, background…
  • Different camera type…

• Data example

Data Labeling

• Apply closed-form matting and robust matting
  • Gradually refine the input trimap
  • Choose the best result from closed-form or robust matting

• User interface

• Ground truth example

Learn Automatic Matting

Our Method

Trimap labeling
• Input: RGB image
• Output: trimap
• Network: fine-tuned from FCN

Image Matting Layer
• Input: trimap
• Output: alpha matte
• Newly designed structure

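Image Matting Layer
• Feed-forward:

$$\min_A \; \lambda A^T B A + \lambda (A - 1)^T F (A - 1) + A^T L A$$

• Backward:

$$\frac{\partial f}{\partial B} = -\lambda D^{-1} \mathrm{diag}(D^{-1} F)$$

$$\frac{\partial f}{\partial F} = \frac{\partial f}{\partial B} + D^{-1}$$

$$\frac{\partial f}{\partial \lambda} = -\lambda D^{-1} \mathrm{diag}(F + B) D^{-1} F$$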
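The feed-forward objective is quadratic in A, so setting its gradient to zero gives (λB + λF + L) A = λF, with B and F used as diagonal matrices on the left and F as a vector on the right. The Python sketch below solves that system. It assumes D = λB + λF + L, which is consistent with the backward formulas but is not spelled out on the slide, and it uses a chain-graph Laplacian as a hypothetical stand-in for the matting Laplacian. Function names and the default `lam` are illustrative.

```python
import numpy as np

def chain_laplacian(n):
    """Graph Laplacian of an n-pixel chain (a stand-in for the matting Laplacian)."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def matting_layer_forward(fg, bg, laplacian, lam=100.0):
    """Solve (lam*diag(bg) + lam*diag(fg) + L) A = lam * fg for the alpha vector A."""
    fg = np.asarray(fg, dtype=np.float64)  # per-pixel foreground scores
    bg = np.asarray(bg, dtype=np.float64)  # per-pixel background scores
    D = lam * np.diag(bg) + lam * np.diag(fg) + laplacian  # assumed definition of D
    return np.linalg.solve(D, lam * fg)

# Three pixels: confident foreground, uncertain, confident background.
alpha = matting_layer_forward([1.0, 0.5, 0.0], [0.0, 0.5, 1.0], chain_laplacian(3))
```

Without the Laplacian term the solution reduces to F / (B + F) per pixel; the Laplacian then smooths this estimate according to the image structure.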

Image Matting Layer
• Loss function:

$$L(A, A^{gt}) = \sum_i w(A_i^{gt}) \left| A_i - A_i^{gt} \right|, \qquad w(A_i^{gt}) = -\log\big(p(A = A_i^{gt})\big)$$
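A minimal Python sketch of this weighted loss. The slide does not say how the probability p is obtained; the sketch assumes it is estimated from a histogram of the ground-truth alpha values, so that rare values (typically the fractional ones along hair and edges) receive larger weights. The bin count and the `eps` guard are illustrative assumptions.

```python
import numpy as np

def alpha_weights(alpha_gt, bins=10, eps=1e-6):
    """w = -log p(A = A_gt), with p estimated from a histogram of ground-truth alphas."""
    alpha_gt = np.asarray(alpha_gt, dtype=np.float64)
    hist, edges = np.histogram(alpha_gt, bins=bins, range=(0.0, 1.0))
    prob = hist / hist.sum()
    # Map each alpha value to its histogram bin using the interior bin edges.
    idx = np.clip(np.digitize(alpha_gt, edges[1:-1]), 0, bins - 1)
    return -np.log(prob[idx] + eps)

def matting_loss(alpha, alpha_gt, **weight_args):
    """Sum over pixels of w(A_gt) * |A - A_gt|."""
    alpha = np.asarray(alpha, dtype=np.float64)
    alpha_gt = np.asarray(alpha_gt, dtype=np.float64)
    return float(np.sum(alpha_weights(alpha_gt, **weight_args) * np.abs(alpha - alpha_gt)))

gt = np.array([0.0, 0.0, 0.0, 1.0])
loss = matting_loss(np.array([0.1, 0.0, 0.0, 0.8]), gt, bins=2)
```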

Model Training

• Data augmentation
  • 4 scales {0.6, 0.8, 1.2, 1.5}
  • 4 rotations {-45, -22, 22, 45} degrees
  • Gamma values {0.5, 0.8, 1.2, 1.5}

• Network initialization
  • Fine-tuned from the FCN-8s model [J. Long, 2015]
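The augmentation settings listed above can be written down as a small Python configuration. This sketch assumes each transform is applied on its own rather than in combination, which the slide does not specify, and it only implements the gamma adjustment (for images scaled to [0, 1]); scaling and rotation would need an image library. All names are illustrative.

```python
import numpy as np

SCALES = (0.6, 0.8, 1.2, 1.5)
ROTATIONS_DEG = (-45, -22, 22, 45)
GAMMAS = (0.5, 0.8, 1.2, 1.5)

def apply_gamma(image, gamma):
    """Gamma-adjust an image whose values lie in [0, 1]."""
    return np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0) ** gamma

def augmentation_plan():
    """List every (transform, parameter) pair, one augmented copy per entry."""
    plan = [("scale", s) for s in SCALES]
    plan += [("rotate_deg", r) for r in ROTATIONS_DEG]
    plan += [("gamma", g) for g in GAMMAS]
    return plan

brightened = apply_gamma(np.array([0.25, 1.0]), 0.5)
```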

Experiments

• Running time
  • Training time: 20k iterations, one day on a Titan X GPU
  • Testing time: 0.6 s for a 600×800 color image

• Comparisons
  • Graph-cut
  • FCN baseline: direct FCN segmentation followed by closed-form matting

Results

[Figure slides: four examples, each with panels Input; Graph-cut; FCN; Ours]

Failure Cases

[Figure panels: two examples, each with Input and Alpha Matte]

Applications

[Figure panel labels, one row per line as printed on the slides:]

Input Stylization PS GS Stick PS Fresco Stylization

Input Stylization Depth-of-Field PS Fresco Stylization

Input Stylization PS Palette Knife PS GS Stick PS Sketch

Input PS Oil Paint Depth-of-Field PS GS Stick Stylization

Input Stylization PS Palette Knife Depth-of-Field Stylization

Input Stylization PS Palette Knife PS Dark Stroke PS Paint Daubs

Conclusions

• High-accuracy automatic portrait segmentation and matting approach
  • A novel CNN framework
  • Training and testing dataset
  • Benefits many applications

• Future work
  • Video segmentation
  • Human segmentation
  • Single portrait image depth estimation
  • Weakly supervised version

Q & A

Thanks
