it takes two to tango: cascading off-the-shelf face...
TRANSCRIPT
It Takes Two to Tango: Cascading Off-the-shelf Face DetectorsSiqi Yang, Arnold Wiliem and Brian C. Lovell
The University of Queensland, Australia
The False Positive Problem
•The existing face detectors, including deep learning basedmethods, still generate false alarms (false positives)
•Face detection is an initial step of many facial analysis tasks,including facial landmark localisation and face recognition. Theunexpected false positives will affect the accuracy and speed of thesubsequent tasks
•Any method aimed to reduce false positives has the potential toimprove all existing methods’ performance
False positives
True
Po
sitive R
ate
Detector 1Cascade detector
Figure 1: By cascading a second detector, a large number of false positives can beremoved while the recall is well maintained. As a result, at a low number of falsepositives, the true positive rate can be increased significantly
Cascading Off-the-shelf Face Detectors
•The effort to train a new face detection model is enormous•Solution: We propose to cascade two pre-trained face detectors ina two-stage framework:
•The computational complexity of fusion-based detectors (placed inparallel) increase linearly according to the number of detectors andthe overall running time is constrained by the slowest detector
•Two questions arise:1 which two detectors should be cascaded?2 which order should they be cascaded?
Correlation and Diversity Metrics
•We define the correlation of overlapping true positives anddiversity of overlapping false positives:
cT2→1 = |To|
|T1|,
dF2→1 = 1− |Fo|
|F1|.
•Detections from four different face detectors on the FDDBdataset [1]: (a) NPD [2], (b) HeadHunter [3], (c) MTCNN [4] and(d) HR [5]. Green: true positives, red: false positives
•The distribution of the overlapping true positives (left) and falsepositives (right) between four face detectors. Only a small numberof false positives are detected by both detectors, whereas amajority of true positives overlap.
Proposed Cascade Properties
1 A high correlation of TPs, cT2→1 ≈ 1,
2 A high diversity of FPs, dF2→1 ≈ 1,
3 Faster detector in the first stage to achieve an overall fast speed
Conclusions
•We propose three essential cascade properties that guide us in determining the efficacy of the cascaded detector•Experimental results show our framework is able to remove a large number of false positives with an insignificant loss of true positive rate•We found a pair of face detectors that achieves significantly lower false positive rate with competitive detection rate, which is five times faster thanthe state-of-the-art detector described in [5]
Experiments
•Runtime analysis on the FDDB dataset [1]:Method CPU time (SPF∗) TPR (FPPI#=0.1)
1st stage 2nd stage total timeVJ [7] 0.271 - 0.271 0.462NPD [2] 0.678 - 0.678 0.801NPD-HeadHunter 0.678 988 988.678 0.810NPD-MTCNN 0.678 0.073 0.751 0.841NPD-HR 0.678 2.678 3.356 0.841HeadHunter [3] 1961 - 1961 0.834HeadHunter-NPD 1961 0.404 1961.404 0.819HeadHunter-MTCNN 1961 0.116 1961.116 0.889HeadHunter-HR 1961 3.648 1964.648 0.889MTCNN [4] 0.355 - 0.355 0.919MTCNN-NPD 0.355 0.220 0.575 0.843MTCNN-HeadHunter 0.355 456 456.355 0.882MTCNN-HR 0.355 3.496 3.851 0.930HR [5] 17.687 - 17.687 0.943HR-NPD 17.687 0.170 17.857 0.839HR-HeadHunter 17.687 794 811.687 0.886HR-MTCNN 17.687 0.076 17.763 0.930∗SPF–Seconds Per Frame # FPPI–False Positives Per Image, TPR–True Positive Rate
•Comparison of our proposed framework and the state-of-the-artface detector:
Method CPU time (SPF∗) TPR (FPPI#=0.1)HR [5] 17.687 0.943MTCNN-HR (ours) 3.851 0.930∗SPF–Seconds Per Frame # FPPI–False Positives Per Image, TPR–True Positive Rate
•Comparisons on FDDB dataset [1] (left) and WIDER FACEvalidation set [6] (right):
0 200 400 600 800 1000
Total False Positive
0.4
0.5
0.6
0.7
0.8
0.9
1
De
tec
tio
n R
ate
HR (0.94276)
MTCNN-HR (0.92806)
HR-MTCNN (0.9298)
MTCNN (0.91936)
HeadHunter-HR (0.88938)
HR-HeadHunter (0.8861)
HeadHunter-MTCNN (0.889)
MTCNN-HeadHunter (0.88223)
NPD-HR (0.84142)
HR-NPD (0.8393)
NPD-MTCNN (0.84142)
MTCNN-NPD (0.84316)
HeadHunter (0.83388)
NPD-HeadHunter (0.81048)
HeadHunter-NPD (0.8188)
NPD (0.80101)
VJ (0.46161)
0 0.2 0.4 0.6 0.8 1
Recall
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Prec
isio
n
HR-0.936MTCNN-HR-0.884MTCNN-0.832HeadHunter-HR-0.807NPD-HR-0.730HeadHunter-0.712HeadHunter-MTCNN-0.708NPD-MTCNN-0.643NPD-HeadHunter-0.614NPD-0.608
0 0.2 0.4 0.6 0.8 1
Recall
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Prec
isio
n
HR-0.918MTCNN-HR-0.850MTCNN-0.808HeadHunter-HR-0.696HeadHunter-MTCNN-0.633NPD-HR-0.621HeadHunter-0.584NPD-MTCNN-0.563NPD-HeadHunter-0.508NPD-0.501
0 0.2 0.4 0.6 0.8 1
Recall
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Prec
isio
n
HR-0.812MTCNN-0.622MTCNN-HR-0.593HeadHunter-HR-0.337HeadHunter-MTCNN-0.315NPD-HR-0.304NPD-MTCNN-0.282HeadHunter-0.249NPD-HeadHunter-0.232NPD-0.228
[1] Vidit Jain and Erik G Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. UMass Amherst Technical Report, 2010.[2] Shengcai Liao, Anil K Jain, and Stan Z Li. A fast and accurate unconstrained face detector. PAMI, 38(2):211–223, 2016.[3] Markus Mathias, Rodrigo Benenson, Marco Pedersoli, and Luc Van Gool. Face detection without bells and whistles. In ECCV, 2014.[4] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. Joint face detection and alignment using multitask cascaded convolutional networks.
IEEE Signal Processing Letters, 23(10):1499–1503, 2016.[5] Peiyun Hu and Deva Ramanan. Finding tiny faces. In CVPR, 2017.[6] Shuo Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. Wider face: A face detection benchmark. In CVPR, 2015.[7] Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001.
Acknowledgement: This work has been partly funded by Sullivan Nicolaides Pathology, Australia and the Australian Research Council (ARC) Linkage Projects Grant LP160101797. Arnold Wiliem is funded by the Advance Queensland Early-Career Research Fellowship. We thank Dr Shaokang Chen for the useful discussions.