it takes two to tango: cascading off-the-shelf face...

1
It Takes Two to Tango: Cascading Off-the-shelf Face Detectors Siqi Yang, Arnold Wiliem and Brian C. Lovell The University of Queensland, Australia The False Positive Problem The existing face detectors, including deep learning based methods, still generate false alarms (false positives) Face detection is an initial step of many facial analysis tasks, including facial landmark localisation and face recognition. The unexpected false positives will affect the accuracy and speed of the subsequent tasks Any method aimed to reduce false positives has the potential to improve all existing methods’ performance False positives True Positive Rate Detector 1 Cascade detector Figure 1: By cascading a second detector, a large number of false positives can be removed while the recall is well maintained. As a result, at a low number of false positives, the true positive rate can be increased significantly Cascading Off-the-shelf Face Detectors The effort to train a new face detection model is enormous Solution: We propose to cascade two pre-trained face detectors in a two-stage framework: The computational complexity of fusion-based detectors (placed in parallel) increase linearly according to the number of detectors and the overall running time is constrained by the slowest detector Two questions arise: 1 which two detectors should be cascaded? 2 which order should they be cascaded? Correlation and Diversity Metrics We define the correlation of overlapping true positives and diversity of overlapping false positives: c T 21 = |T o | |T 1 | , d F 21 =1 - |F o | |F 1 | . Detections from four different face detectors on the FDDB dataset [1]: (a) NPD [2], (b) HeadHunter [3], (c) MTCNN [4] and (d) HR [5]. Green: true positives, red: false positives The distribution of the overlapping true positives (left) and false positives (right) between four face detectors. Only a small number of false positives are detected by both detectors, whereas a majority of true positives overlap. Proposed Cascade Properties 1 A high correlation of TPs, c T 21 1, 2 A high diversity of FPs, d F 21 1, 3 Faster detector in the first stage to achieve an overall fast speed Conclusions We propose three essential cascade properties that guide us in determining the efficacy of the cascaded detector Experimental results show our framework is able to remove a large number of false positives with an insignificant loss of true positive rate We found a pair of face detectors that achieves significantly lower false positive rate with competitive detection rate, which is five times faster than the state-of-the-art detector described in [5] Experiments Runtime analysis on the FDDB dataset [1]: Method CPU time (SPF * ) TPR (FPPI # =0.1) 1st stage 2nd stage total time VJ [7] 0.271 - 0.271 0.462 NPD [2] 0.678 - 0.678 0.801 NPD-HeadHunter 0.678 988 988.678 0.810 NPD-MTCNN 0.678 0.073 0.751 0.841 NPD-HR 0.678 2.678 3.356 0.841 HeadHunter [3] 1961 - 1961 0.834 HeadHunter-NPD 1961 0.404 1961.404 0.819 HeadHunter-MTCNN 1961 0.116 1961.116 0.889 HeadHunter-HR 1961 3.648 1964.648 0.889 MTCNN [4] 0.355 - 0.355 0.919 MTCNN-NPD 0.355 0.220 0.575 0.843 MTCNN-HeadHunter 0.355 456 456.355 0.882 MTCNN-HR 0.355 3.496 3.851 0.930 HR [5] 17.687 - 17.687 0.943 HR-NPD 17.687 0.170 17.857 0.839 HR-HeadHunter 17.687 794 811.687 0.886 HR-MTCNN 17.687 0.076 17.763 0.930 * SPF–Seconds Per Frame # FPPI–False Positives Per Image, TPR–True Positive Rate Comparison of our proposed framework and the state-of-the-art face detector: Method CPU time (SPF * ) TPR (FPPI # =0.1) HR [5] 17.687 0.943 MTCNN-HR (ours) 3.851 0.930 * SPF–Seconds Per Frame # FPPI–False Positives Per Image, TPR–True Positive Rate Comparisons on FDDB dataset [1] (left) and WIDER FACE validation set [6] (right): 0 200 400 600 800 1000 Total False Positive 0.4 0.5 0.6 0.7 0.8 0.9 1 Detection Rate HR (0.94276) MTCNN-HR (0.92806) HR-MTCNN (0.9298) MTCNN (0.91936) HeadHunter-HR (0.88938) HR-HeadHunter (0.8861) HeadHunter-MTCNN (0.889) MTCNN-HeadHunter (0.88223) NPD-HR (0.84142) HR-NPD (0.8393) NPD-MTCNN (0.84142) MTCNN-NPD (0.84316) HeadHunter (0.83388) NPD-HeadHunter (0.81048) HeadHunter-NPD (0.8188) NPD (0.80101) VJ (0.46161) 0 0.2 0.4 0.6 0.8 1 Recall 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Precision HR-0.936 MTCNN-HR-0.884 MTCNN-0.832 HeadHunter-HR-0.807 NPD-HR-0.730 HeadHunter-0.712 HeadHunter-MTCNN-0.708 NPD-MTCNN-0.643 NPD-HeadHunter-0.614 NPD-0.608 0 0.2 0.4 0.6 0.8 1 Recall 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Precision HR-0.918 MTCNN-HR-0.850 MTCNN-0.808 HeadHunter-HR-0.696 HeadHunter-MTCNN-0.633 NPD-HR-0.621 HeadHunter-0.584 NPD-MTCNN-0.563 NPD-HeadHunter-0.508 NPD-0.501 0 0.2 0.4 0.6 0.8 1 Recall 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Precision HR-0.812 MTCNN-0.622 MTCNN-HR-0.593 HeadHunter-HR-0.337 HeadHunter-MTCNN-0.315 NPD-HR-0.304 NPD-MTCNN-0.282 HeadHunter-0.249 NPD-HeadHunter-0.232 NPD-0.228 [1] Vidit Jain and Erik G Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. UMass Amherst Technical Report, 2010. [2] Shengcai Liao, Anil K Jain, and Stan Z Li. A fast and accurate unconstrained face detector. PAMI, 38(2):211–223, 2016. [3] Markus Mathias, Rodrigo Benenson, Marco Pedersoli, and Luc Van Gool. Face detection without bells and whistles. In ECCV, 2014. [4] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, 2016. [5] Peiyun Hu and Deva Ramanan. Finding tiny faces. In CVPR, 2017. [6] Shuo Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. Wider face: A face detection benchmark. In CVPR, 2015. [7] Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001. Acknowledgement: This work has been partly funded by Sullivan Nicolaides Pathology, Australia and the Australian Research Council (ARC) Linkage Projects Grant LP160101797. Arnold Wiliem is funded by the Advance Queensland Early-Career Research Fellowship. We thank Dr Shaokang Chen for the useful discussions.

Upload: others

Post on 09-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: It Takes Two to Tango: Cascading Off-the-shelf Face Detectorssiqi.vision/publications/cvprw18_poster.pdf · 2019-09-19 · Title: It Takes Two to Tango: Cascading Off-the-shelf Face

It Takes Two to Tango: Cascading Off-the-shelf Face DetectorsSiqi Yang, Arnold Wiliem and Brian C. Lovell

The University of Queensland, Australia

The False Positive Problem

•The existing face detectors, including deep learning basedmethods, still generate false alarms (false positives)

•Face detection is an initial step of many facial analysis tasks,including facial landmark localisation and face recognition. Theunexpected false positives will affect the accuracy and speed of thesubsequent tasks

•Any method aimed to reduce false positives has the potential toimprove all existing methods’ performance

False positives

True

Po

sitive R

ate

Detector 1Cascade detector

Figure 1: By cascading a second detector, a large number of false positives can beremoved while the recall is well maintained. As a result, at a low number of falsepositives, the true positive rate can be increased significantly

Cascading Off-the-shelf Face Detectors

•The effort to train a new face detection model is enormous•Solution: We propose to cascade two pre-trained face detectors ina two-stage framework:

•The computational complexity of fusion-based detectors (placed inparallel) increase linearly according to the number of detectors andthe overall running time is constrained by the slowest detector

•Two questions arise:1 which two detectors should be cascaded?2 which order should they be cascaded?

Correlation and Diversity Metrics

•We define the correlation of overlapping true positives anddiversity of overlapping false positives:

cT2→1 = |To|

|T1|,

dF2→1 = 1− |Fo|

|F1|.

•Detections from four different face detectors on the FDDBdataset [1]: (a) NPD [2], (b) HeadHunter [3], (c) MTCNN [4] and(d) HR [5]. Green: true positives, red: false positives

•The distribution of the overlapping true positives (left) and falsepositives (right) between four face detectors. Only a small numberof false positives are detected by both detectors, whereas amajority of true positives overlap.

Proposed Cascade Properties

1 A high correlation of TPs, cT2→1 ≈ 1,

2 A high diversity of FPs, dF2→1 ≈ 1,

3 Faster detector in the first stage to achieve an overall fast speed

Conclusions

•We propose three essential cascade properties that guide us in determining the efficacy of the cascaded detector•Experimental results show our framework is able to remove a large number of false positives with an insignificant loss of true positive rate•We found a pair of face detectors that achieves significantly lower false positive rate with competitive detection rate, which is five times faster thanthe state-of-the-art detector described in [5]

Experiments

•Runtime analysis on the FDDB dataset [1]:Method CPU time (SPF∗) TPR (FPPI#=0.1)

1st stage 2nd stage total timeVJ [7] 0.271 - 0.271 0.462NPD [2] 0.678 - 0.678 0.801NPD-HeadHunter 0.678 988 988.678 0.810NPD-MTCNN 0.678 0.073 0.751 0.841NPD-HR 0.678 2.678 3.356 0.841HeadHunter [3] 1961 - 1961 0.834HeadHunter-NPD 1961 0.404 1961.404 0.819HeadHunter-MTCNN 1961 0.116 1961.116 0.889HeadHunter-HR 1961 3.648 1964.648 0.889MTCNN [4] 0.355 - 0.355 0.919MTCNN-NPD 0.355 0.220 0.575 0.843MTCNN-HeadHunter 0.355 456 456.355 0.882MTCNN-HR 0.355 3.496 3.851 0.930HR [5] 17.687 - 17.687 0.943HR-NPD 17.687 0.170 17.857 0.839HR-HeadHunter 17.687 794 811.687 0.886HR-MTCNN 17.687 0.076 17.763 0.930∗SPF–Seconds Per Frame # FPPI–False Positives Per Image, TPR–True Positive Rate

•Comparison of our proposed framework and the state-of-the-artface detector:

Method CPU time (SPF∗) TPR (FPPI#=0.1)HR [5] 17.687 0.943MTCNN-HR (ours) 3.851 0.930∗SPF–Seconds Per Frame # FPPI–False Positives Per Image, TPR–True Positive Rate

•Comparisons on FDDB dataset [1] (left) and WIDER FACEvalidation set [6] (right):

0 200 400 600 800 1000

Total False Positive

0.4

0.5

0.6

0.7

0.8

0.9

1

De

tec

tio

n R

ate

HR (0.94276)

MTCNN-HR (0.92806)

HR-MTCNN (0.9298)

MTCNN (0.91936)

HeadHunter-HR (0.88938)

HR-HeadHunter (0.8861)

HeadHunter-MTCNN (0.889)

MTCNN-HeadHunter (0.88223)

NPD-HR (0.84142)

HR-NPD (0.8393)

NPD-MTCNN (0.84142)

MTCNN-NPD (0.84316)

HeadHunter (0.83388)

NPD-HeadHunter (0.81048)

HeadHunter-NPD (0.8188)

NPD (0.80101)

VJ (0.46161)

0 0.2 0.4 0.6 0.8 1

Recall

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Prec

isio

n

HR-0.936MTCNN-HR-0.884MTCNN-0.832HeadHunter-HR-0.807NPD-HR-0.730HeadHunter-0.712HeadHunter-MTCNN-0.708NPD-MTCNN-0.643NPD-HeadHunter-0.614NPD-0.608

0 0.2 0.4 0.6 0.8 1

Recall

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Prec

isio

n

HR-0.918MTCNN-HR-0.850MTCNN-0.808HeadHunter-HR-0.696HeadHunter-MTCNN-0.633NPD-HR-0.621HeadHunter-0.584NPD-MTCNN-0.563NPD-HeadHunter-0.508NPD-0.501

0 0.2 0.4 0.6 0.8 1

Recall

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Prec

isio

n

HR-0.812MTCNN-0.622MTCNN-HR-0.593HeadHunter-HR-0.337HeadHunter-MTCNN-0.315NPD-HR-0.304NPD-MTCNN-0.282HeadHunter-0.249NPD-HeadHunter-0.232NPD-0.228

[1] Vidit Jain and Erik G Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. UMass Amherst Technical Report, 2010.[2] Shengcai Liao, Anil K Jain, and Stan Z Li. A fast and accurate unconstrained face detector. PAMI, 38(2):211–223, 2016.[3] Markus Mathias, Rodrigo Benenson, Marco Pedersoli, and Luc Van Gool. Face detection without bells and whistles. In ECCV, 2014.[4] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. Joint face detection and alignment using multitask cascaded convolutional networks.

IEEE Signal Processing Letters, 23(10):1499–1503, 2016.[5] Peiyun Hu and Deva Ramanan. Finding tiny faces. In CVPR, 2017.[6] Shuo Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. Wider face: A face detection benchmark. In CVPR, 2015.[7] Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001.

Acknowledgement: This work has been partly funded by Sullivan Nicolaides Pathology, Australia and the Australian Research Council (ARC) Linkage Projects Grant LP160101797. Arnold Wiliem is funded by the Advance Queensland Early-Career Research Fellowship. We thank Dr Shaokang Chen for the useful discussions.