This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright.

Deep convolutional neural networks for face and iris presentation attack detection: Survey and case study

Yomna Safaa El-Din 1*, Mohamed N. Moustafa 2, Hani Mahdi 1
1 Department of Computer and Systems Engineering, Ain Shams University, Cairo, Egypt
2 Department of Computer Science and Engineering, The American University in Cairo, New Cairo, Egypt
* E-mail: [email protected]

Abstract: Biometric presentation attack detection is gaining increasing attention. Users of mobile devices find it more convenient to unlock their smart applications with finger, face or iris recognition instead of passwords. In this paper, we survey the approaches presented in the recent literature to detect face and iris presentation attacks. Specifically, we investigate the effectiveness of fine-tuning very deep convolutional neural networks for the task of face and iris anti-spoofing. We compare two different fine-tuning approaches on six publicly available benchmark datasets. Results show the effectiveness of these deep models in learning discriminative features that can tell apart real from fake biometric images with very low error rates. Cross-dataset evaluation on face PAD showed better generalization than the state of the art. We also performed cross-dataset testing on iris PAD datasets in terms of equal error rate, which had not been reported in the literature before. Additionally, we propose the use of a single deep network trained to detect both face and iris attacks. We have not noticed accuracy degradation compared to networks trained for only one biometric separately. Finally, we analyzed the features learned by the network, in correlation with the image frequency components, to justify its prediction decisions.

1 Introduction

Biometric recognition has been increasingly used in recent applications that need authentication and verification.
Instead of the usual username-and-password or token-based authentication, the use of biometric traits like face, iris or fingerprints is more convenient for users, and so has gained a lot of popularity, especially after the wide spread of smartphones and their use in many areas, such as payment. However, as technology advances, spoofing these biometric traits has become easier. For example, an attacker can use a photograph or a video replay of the face or iris of a person to gain access to his/her smartphone or any application that needs authorization. Such an act is referred to by the ISO/IEC 30107-1:2016 standard as a presentation attack (PA), and the biometric or object used in a PA is called an artifact or presentation attack instrument (PAI). This has led to increased interest in designing algorithms that guard against these attacks, called presentation attack detection (PAD) algorithms.

Much research has been done in this area, starting with methods that depend on designing and extracting hand-crafted features from the acquired images and then using conventional classifiers to separate bona-fide (real) from attack presentations. The literature shows that methods relying on manually engineered features are suitable for solving the PAD problem for face and iris recognition systems. However, the design and selection of handcrafted feature extractors is mainly based on the researchers' expert knowledge of the problem. Consequently, these features often reflect limited aspects of the problem and are often sensitive to varying acquisition conditions, such as camera devices, lighting conditions and PAIs. This causes their detection accuracy to vary significantly among databases, indicating that handcrafted features have poor generalizability and so do not completely solve the PAD problem.
In recent years, deep learning has evolved, and the use of deep neural networks or convolutional neural networks (CNNs) has proven effective in many computer vision tasks, especially with the availability of advanced hardware and large datasets. CNNs have been successfully used for vision problems like image classification and object detection. This has encouraged many researchers to incorporate deep learning in the PAD problem and rely on a deep network or CNN to learn features that discriminate between bona-fide and attack face [1, 2] or iris [3, 4] samples, or both [5], instead of using hand-crafted features. Several of these studies used a CNN only for feature extraction followed by a classic classifier like an SVM; others designed custom networks or combined information from CNNs and hand-crafted features.

Several surveys of presentation attack detection in face or iris recognition are available in the literature. Czajka and Bowyer provided a thorough assessment of the state of the art for iris PAD in [6] and concluded that PAD for iris recognition is not yet a solved problem. Their main focus was on iris presentation attack categories, their countermeasures, competitions and applications. For PAD in face recognition systems, Raghavendra and Busch provided a comprehensive survey in [7] describing different types of presentation attacks and face artifacts, and showing the vulnerability of commercial face recognition systems to presentation attacks. They survey and evaluate fourteen state-of-the-art face PAD algorithms on the CASIA face-spoofing database and discuss the remaining challenges and open issues for a robust face presentation attack detection system. In [8], Ghaffar and Mohd focused their review on face PAD systems on smartphones, with recent face datasets captured with mobile devices. Later, Li et al. [9] provided a comprehensive review of more recent PAD algorithms in face recognition, discussing the available datasets and reported results.
They proposed a color-LBP based countermeasure as a case study and evaluated it on two benchmark face datasets, highlighting the importance of cross-dataset testing. In this work, we survey most of the recent PAD approaches proposed in the literature for both face and iris recognition applications, along with the competitions and benchmark datasets. Then, as a case study, we assess the performance of recent well-known deep CNN architectures on the task of face and iris presentation attack detection. For assessment we use 3 different face datasets and 3 iris datasets. Using just the RGB images as input, we fine-tune the deep network's weights, then perform intra- and cross-dataset evaluation to discuss the generalization ability of these deep models. The experiments are first done with separate models trained for each of the face and iris biometrics; then we experiment with training a single network to generally tell apart bona-fide from attack samples, whether the sample is a face or an iris image. Such a generic PAD network could perfectly differentiate between the real and attack images. The specific features learned by this network are analyzed, showing that these features are highly related to high-frequency regions in the input images. We finally use gradient-weighted class activation maps [10]



to visualize what the network focuses on when deciding if a sample is bona-fide or not.

Hence, the main contributions of this work include: (1) surveying the recent methods for presentation attack detection in both face and iris modalities, with competitions and datasets; (2) demonstrating the effectiveness of CNNs for the PAD problem by fine-tuning and comparing recent deep CNN architectures on 3 benchmark face datasets and 3 benchmark iris datasets, experimenting with two fine-tuning approaches and applying cross-dataset evaluation; (3) training a single network for presentation attack detection of the two biometric modalities and comparing the results with networks trained on a single biometric; (4) analyzing specific features learned by the commonly-trained network; and (5) finally, visualizing the class activation maps of the trained network on sample bona-fide and attack images.
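The class activation maps in contribution (5) are produced with Grad-CAM [10]. As a rough illustration of the combination step only (not the authors' implementation), the sketch below applies the Grad-CAM formula to placeholder tensors standing in for a real network's last-convolutional-layer activations and their gradients:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine conv feature maps into a class activation map (Grad-CAM).

    activations: (C, H, W) feature maps of the last conv layer
    gradients:   (C, H, W) gradients of the class score w.r.t. those maps
    """
    # Global-average-pool the gradients to get one weight per channel
    weights = gradients.mean(axis=(1, 2))                       # (C,)
    # Weighted sum of the maps, then ReLU to keep positive evidence only
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalise to [0, 1] for visualisation
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy tensors standing in for a real network's outputs
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(acts, grads)
```

In practice the resulting heatmap is upsampled to the input resolution and overlaid on the face or iris image.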

The rest of the paper is organized as follows: we first review the PAD approaches, which can broadly be categorized into active (Section 2) vs. passive (Section 3) PAD techniques. Then, in Section 4, we summarize the face and iris PAD competitions and benchmark databases, followed by Section 5, where we explain the proposed PAD approach. Experiments, results and analysis are presented in Section 6, and finally conclusions and future work in Section 7.

2 Active PAD

Active presentation attack detection can either be hardware-based or rely on a challenge-response; both are detailed in this section.

2.1 Hardware-based

Such methods depend on capturing the automatic response of the biometric trait (face/iris) to a certain stimulus. Examples in iris liveness detection include the use of the eye hippus, which is the permanent oscillation that the eye pupil presents even under uniform lighting conditions, or the dilation of the pupil in reaction to sudden illumination changes [11–15]. Variations of pupil dilation were calculated by Bodade et al. [16] from multiple iris images, while Huang et al. [17] used pupil constriction to detect iris liveness. Other researchers used the iris image brightness variations after light stimuli in some predefined iris regions [18], or the reflections of infrared light on the moist cornea when stimulated with light sources positioned randomly in space. Lee et al. [19] used additional infrared sensors during iris image acquisition to construct the full 3D shape of the iris to aid in the detection of fake iris images or contact lenses. These approaches rely on a specific external device added to the sensor/camera, hence the name hardware-based techniques. However, not all hardware-based methods are active; some such approaches detect the facial thermogram [20, 21], blood pressure, fingerprint sweat, gait, etc. passively, in a non-intrusive manner.

2.2 Challenge-response

In these methods, the user is required to respond to a challenge issued by the system. The system requests the user to perform some specific movement, like blinking the right eye or rotating the head clockwise [22–24], or to gaze towards a predefined stimulus [25].

2.3 Drawbacks of active PAD methods

Although active spoof detection methods may provide higher presentation attack detection rates and robustness against different presentation attack types, they are more expensive and less convenient to users than software-based approaches, which do not require any additional devices beside the standard camera. For the rest of this survey we will focus on software-based methods: they are more generic, less expensive, less intrusive, and can easily be incorporated in real-world applications and smartphones. Table 1 summarizes the different approaches used in the literature for detection of presentation attacks.

3 Passive Software-based PAD

Passive software-based anti-spoofing methods can be categorized into several groups: temporal or motion-analysis based approaches, which utilize features extracted from consecutive frames captured of the face or iris; or texture-based methods, which use a single image of the biometric trait and do not depend on motion features. Such texture-based PAD methods can further be classified into a group that depends on hand-crafted features vs. another group of recent methods that opt to learn these discriminative features.

3.1 Liveness-detection

The earliest category includes approaches that identify whether the presented biometric is live or an artifact; such approaches are known as liveness-detection. An artifact can be a mask or paper in the case of face spoofing, and paper or a glass eye for the iris. These methods depend on detecting signs of life in the captured biometric data. Examples in the case of face PAD are eye blinking [24, 26–29], facial expression changes, and head and/or mouth movements [30–32]. For iris PAD, Komogortsev et al. [33] extracted liveness cues from eye movement in a simulated scenario of an attack by a mechanical replica of the human eye. The authors further investigated this line and published more iris liveness detection methods based on eye movements and gaze estimation in later years [34–36].
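To make the blink-based liveness cue concrete, here is a minimal, self-contained sketch (not taken from any of the surveyed papers) that counts blinks in a per-frame eye-openness signal, such as the eye aspect ratio a landmark detector would produce; the threshold and trace values are illustrative assumptions:

```python
def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count blinks in a per-frame eye-aspect-ratio (EAR) signal.

    A blink is a run of at least `min_frames` consecutive frames where
    the EAR drops below `threshold` (eye closed), bounded by open frames.
    A printed photo yields a flat trace with zero blinks.
    """
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:   # blink still in progress at sequence end
        blinks += 1
    return blinks

# Synthetic EAR trace: eye open (~0.3) with two closure dips
trace = [0.31, 0.30, 0.10, 0.08, 0.29, 0.32, 0.09, 0.07, 0.11, 0.30]
n_blinks = count_blinks(trace)
```

A liveness decision would then require at least one blink within the observation window.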

Liveness-detection methods are considered temporal-based and can be easily fooled by a replay video of the face or iris, or by adding liveness properties to a still face image by cutting out the eye/mouth parts.

3.2 Recapturing-detection and contextual information

Another group of methods belonging to the temporal-based category simply detects whether the biometric data is a replay of a recorded sample; these are referred to as recapturing-detection. Such methods analyze the scene and environment to detect abnormal movements in the video due to a hand holding the presenting 2D mobile device or paper [37, 38]. Such movements are different from the movement of a real face against a static background. Sometimes these techniques are referred to as motion-analysis, depending on motion cues from planar objects. Some use optical flow to capture and track the relative movements between different facial parts to determine spoofing [39, 40], while other works use motion magnification [27, 41], Haralick features [99] or visual rhythm analysis [42]. Other types depend on contextual analysis of the scene [22, 30, 43, 44] or 3D reconstruction [45].
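The "real face moves, background stays still" intuition can be sketched with a simple frame-differencing motion cue; this is an illustrative toy (synthetic frames, hand-picked regions), not a method from the cited works:

```python
import numpy as np

def region_motion(frames, region):
    """Mean absolute inter-frame difference inside a region.

    frames: list of (H, W) grayscale arrays; region: (y0, y1, x0, x1).
    A hand-held replay device makes the *background* region move along
    with the face, unlike a live face in front of a static background.
    """
    y0, y1, x0, x1 = region
    diffs = [np.abs(b[y0:y1, x0:x1] - a[y0:y1, x0:x1]).mean()
             for a, b in zip(frames[:-1], frames[1:])]
    return float(np.mean(diffs))

# Synthetic video: static background, moving bright square as the "face"
frames = []
for t in range(5):
    f = np.zeros((40, 40))
    f[10 + t:20 + t, 15:25] = 1.0   # the square drifts downward
    frames.append(f)

face_motion = region_motion(frames, (5, 30, 10, 30))
background_motion = region_motion(frames, (0, 40, 30, 40))  # right margin
```

A replay attack would show comparable motion in both regions, whereas here only the face region moves.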

The recapture detection methods fail if the attack is carried out with an artifact like a mask, because no replay is present to detect. A better and more generic approach is therefore to use texture-based methods, which utilize features that can tell the difference between real, live biometric data and a replay or an artifact being presented to the sensor.

3.3 Texture-based (Hand-crafted features)

The category of texture-based methods assumes that the texture, color and reflectance characteristics captured in genuine face images are different from those resulting from presentation attacks. For example, blur and other effects caused by printers and display devices lead to detectable color artifacts and texture patterns, which can then be exploited to detect presentation attacks.

Researchers use handcrafted feature extraction methods to extract image features from face or iris images. They then use a classification method, such as support vector machines (SVM), to classify images into the two classes of bona-fide or presentation attack based on the extracted image features. Such techniques are faster than temporal or motion-based analysis, as they can operate with only one image sample or a selection of images, instead of having to analyze a complete video sequence.

3.3.1 Texture-reflectance: These techniques analyze appearance properties of the face or iris, such as the face reflectance or texture.


Table 1 PAD methods in literature

Active (Hardware-based or Challenge-response)
  Methods: Eye hippus or pupil dilation [11–19]; challenge: blink eye or rotate head [22–25]
  Advantages: Robust against different PA types; higher PA detection rates
  Weaknesses: More expensive; less convenient to users; rely on an external device

Passive (Software-based)

  Liveness-detection
    Methods: Eye blinking [24, 26–29]; facial expression changes, head and/or mouth movements [30–32]; eye movements and gaze estimation for iris [33–36]
    Advantages: Utilize features extracted from consecutive frames; detect signs of life in captured biometric data
    Weaknesses: Can be easily fooled by a replay video, or by adding liveness properties to a still face image by cutting the eye/mouth parts

  Recapturing-detection
    Methods: Detect abnormal movements in the video due to a hand holding the presenting media [37, 38]; track the relative movements between different facial parts [39, 40]; motion magnification [27, 41]; visual rhythm analysis [42]; contextual analysis of the scene [22, 30, 43, 44]; 3D reconstruction [45]
    Advantages: Very effective for replay attacks
    Weaknesses: Fail if the attack is done using an artifact like a mask

  Texture-based (Hand-crafted features)
    Methods: Texture-reflectance [46–48]; frequency [38, 49–52]; image quality analysis [12, 18, 53–60]; color analysis [49, 57, 61–66]; local descriptors [38, 43, 63, 67–83]; fusion [3, 4, 68, 71, 78, 83–85]
    Advantages: Detect effects caused by printers and display devices; faster than temporal or motion-based analysis; operate with only one image sample or a selection of images instead of a video sequence
    Weaknesses: Design and selection of feature extractors is based on expert knowledge; sensitive to varying acquisition conditions such as camera devices, lighting conditions and PAIs; have poor generalizability

  Learned-features
    Methods: Extract deep features, classify classically [29, 86, 87]; combine with temporal information [79, 85]; other [1, 2, 88–93]; for iris [3, 4, 94–97]; face and iris [5, 98]
    Advantages: Instead of hand-engineering features, a deep network learns features that discriminate between bona-fide and attack samples; can be used jointly with hand-crafted features or motion-analysis
    Weaknesses: Requires more data; can fail to generalize to unseen sensors

The reflectance and texture of a real face or iris are different from those acquired from a spoofing material, be it printed paper, the screen of a mobile device, a mask [46–48] or a glass eye.

3.3.2 Image quality analysis: Techniques that depend on analysis of image quality try to detect the presence of image distortions usually found in spoofed face or iris images.

(1) Deformation: For the face, in print attacks for example, the face might appear skewed, with a deformed shape, if the imposter bends the paper while holding it.

(2) Frequency: These methods are based on the assumption that reproducing previously captured images or videos affects the frequency content of the image displayed in front of the sensor. Attack images or videos have more blur and less sharpness than genuine ones, so their high-frequency components are attenuated. Hence, frequency-domain analysis can be used as a face PAD method, as in [38, 49, 50], or frequency entropy analysis as by Lee et al. [51]. For the iris, Czajka [52] analyzed printing regularities left in printed irises and explored some peaks in the frequency spectrum. Frequency-based approaches might fail in the case of high-quality spoof images or videos being presented to the camera.
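The core frequency-domain cue can be sketched in a few lines: the share of spectral energy outside a low-frequency band drops when an image is recaptured (blurred). This is an illustrative toy with synthetic data and an assumed cutoff, not any cited method's implementation:

```python
import numpy as np

def high_freq_ratio(img, cutoff=0.25):
    """Share of spectral energy outside the central low-frequency band.

    Recaptured (printed/replayed) images tend to be blurrier, so this
    ratio is typically lower for attacks than for bona-fide captures.
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(h * cutoff / 2), int(w * cutoff / 2)
    low = spectrum[cy - ry:cy + ry, cx - rx:cx + rx].sum()
    total = spectrum.sum()
    return float((total - low) / total)

rng = np.random.default_rng(1)
sharp = rng.random((64, 64))
# Crude blur: average each pixel with its 4 neighbours (simulates recapture)
blurred = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, -1, 0)
           + np.roll(sharp, 1, 1) + np.roll(sharp, -1, 1)) / 5.0
```

Thresholding this ratio, or feeding it to a classifier alongside other features, gives a basic frequency-based detector.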

(3) Quality metrics: Some other methods explore the potential of quality assessment to identify real and fake face or iris samples acquired from a high-quality printed image. This assumes that many image quality metrics are distorted by the display device or paper. Some works used individual quality-based features, while others used a combination of quality metrics to better detect spoofing and differentiate between bona-fide and attack iris [12, 18, 53–55], face [56–58] or both [59, 60]. For example, Galbally et al. [12] investigated 22 iris-specific quality features and used feature selection to choose the best combination of features that discriminate between live and fake iris images. Later, in a follow-up work [100], they

assessed 25 general image-quality metrics, applied not only to iris PAD but also to face and fingerprint PAD. A different approach, by Garcia et al. [101], was to analyze moiré patterns that may be produced during image acquisition and display on mobile devices.
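As a flavour of what such no-reference quality metrics look like, the sketch below computes two simple cues (Laplacian sharpness and global contrast) of the same general kind; the specific metrics and the synthetic blur are illustrative assumptions, not the 22 or 25 features from [12, 100]:

```python
import numpy as np

def quality_features(img):
    """Two simple no-reference quality cues of the kind combined in
    quality-metric PAD: Laplacian sharpness and global contrast."""
    # 4-neighbour Laplacian (circular borders via np.roll)
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)
    return np.array([lap.var(), img.std()])

rng = np.random.default_rng(2)
live = rng.random((64, 64))
# Box-blurred copy standing in for a recaptured print/replay sample
attack = (live + np.roll(live, 1, 0) + np.roll(live, -1, 0)
          + np.roll(live, 1, 1) + np.roll(live, -1, 1)) / 5.0

qf_live = quality_features(live)
qf_attack = quality_features(attack)
```

A real system would stack many such metrics into one vector and train a classifier on it.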

(4) Color analysis: Another type of algorithm utilizes the fact that the distribution of color in images taken from a bona-fide face or iris is different from the distribution found in images taken from printed papers or display devices. Methods for face PAD adopt solutions in input domains other than the RGB space in order to improve robustness to illumination variation, like the HSV and YCbCr color spaces [57, 61–64] or the Fourier spectrum [49]. For iris PAD, color adaptive quantized patterns were used in [65] instead of a gray-textured image. Boulkenafet et al. studied the generalization of color-texture analysis methods in [66].
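A minimal sketch of the YCbCr route, assuming the standard BT.601 conversion and a simple chroma-histogram feature (the feature design here is illustrative, not the exact descriptor of [61–64]):

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Convert an (H, W, 3) RGB image in [0, 255] to YCbCr (ITU-R BT.601)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def chroma_histogram(img, bins=16):
    """Concatenated Cb/Cr histograms: a chroma feature of the kind used
    by color-based PAD (printers and screens distort this distribution)."""
    ycbcr = rgb_to_ycbcr(img)
    h_cb, _ = np.histogram(ycbcr[..., 1], bins=bins, range=(0, 255))
    h_cr, _ = np.histogram(ycbcr[..., 2], bins=bins, range=(0, 255))
    return np.concatenate([h_cb, h_cr]).astype(float)

gray = np.full((32, 32, 3), 100.0)   # achromatic image: Cb = Cr = 128
feat = chroma_histogram(gray)
```

Separating luminance (Y) from chroma (Cb/Cr) is what makes the feature more robust to illumination changes than raw RGB statistics.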

3.3.3 Local descriptors: A different group of methods uses hand-crafted features that extract local texture for PAD [67, 68]. A large variety of local image descriptors have been compared with respect to their ability to identify spoofed iris, fingerprint and face data [69], and some highly successful texture descriptors have been extensively used for this purpose. For example, in face PAD methods, descriptors such as local binary patterns (LBP) [70–76], SIFT [77], SURF [63], HoG [43, 74, 75, 78, 79], DoG [38, 80] and GLCM [78] have been studied, showing good results in discriminating real from attack face images. For PAD in iris recognition, boosted LBP was first used by He et al. [102], weighted LBP by Zhang et al. [103] for contact lens detection, and later binarized statistical image features (BSIF) [81, 82]. Local phase quantization (LPQ) was investigated for biometric PAD in general, including both face and iris, by Gragnaniello et al. [69], and the census transform (CT) was used for mobile PAD in [83]. Such local texture descriptors are then used with a traditional classifier like SVM, LDA, neural networks or Random Forest.
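Since LBP recurs throughout this section, here is a minimal numpy sketch of the basic 8-neighbour variant (the original formulation, without the uniform/multi-scale extensions used in the cited papers):

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP on the interior pixels of a grayscale image.

    Each pixel gets an 8-bit code: bit i is set when the i-th neighbour
    is >= the centre. Histograms of these codes are the classic LBP
    texture feature fed to an SVM or similar classifier in face/iris PAD.
    """
    c = img[1:-1, 1:-1]
    # Neighbours clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= ((n >= c).astype(np.uint8) << bit)
    return codes

def lbp_histogram(img):
    """Normalised 256-bin LBP histogram used as the texture feature."""
    h, _ = np.histogram(lbp_codes(img), bins=256, range=(0, 256))
    return h.astype(float) / h.sum()

flat = np.full((8, 8), 5.0)   # textureless patch: every code is 255
hist = lbp_histogram(flat)
```

On a textureless patch every neighbour equals the centre, so all bits are set and the histogram collapses into the 255 bin; real skin texture and print/screen artifacts spread mass over very different bins.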


3.3.4 Drawbacks of hand-crafted texture-based methods: Results of the above methods show that manually engineered features are suitable for solving the PAD problem for face and iris recognition systems. However, their drawback is that the design and selection of handcrafted feature extractors is mainly based on the researchers' expert knowledge of the problem. Consequently, these features often reflect only limited aspects of the problem and are often sensitive to varying acquisition conditions, such as camera devices, lighting conditions and presentation attack instruments (PAIs). This causes their detection accuracy to vary significantly among databases, indicating that handcrafted features have poor generalizability and so do not completely solve the PAD problem. The available cross-database tests in the literature suggest that the performance of hand-engineered texture-based techniques can degrade dramatically when operating in unknown conditions. This leads to the need to automatically extract meaningful visual features directly from the data, using deep representations, to assist in the task of presentation attack detection.

3.4 Fusion

Presentation attack detection accuracy can be enhanced by using feature-level or score-level fusion methods to obtain a more general countermeasure effective against various types of presentation attacks. For face PAD, Tronci et al. [84] proposed a linear fusion, at the frame and video level, combining static and video analysis. In [78], Schwartz et al. introduced feature-level fusion by using Partial Least Squares (PLS) regression based on a set of low-level feature descriptors. Some other works [68, 71] obtained an effective fusion scheme by measuring the level of independence of two anti-counterfeiting systems, while Feng et al. [85] integrated both quality measures and motion cues, then used neural networks for classification.

In the case of iris anti-spoofing, Akhtar et al. [83] proposed decision-level fusion of several local descriptors in addition to global features. More recently, Nguyen et al. [3, 4] explored both feature-level and score-level fusion of hand-crafted features with CNN-extracted features, and obtained the lowest error using their proposed score-level fusion methodology.
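Score-level fusion itself is simple to sketch: normalise each detector's scores to a common scale, then take a weighted sum. The detectors, scores and weight below are hypothetical placeholders, not values from the cited works:

```python
import numpy as np

def minmax_normalise(scores):
    """Map raw scores onto [0, 1] so heterogeneous detectors are comparable."""
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)

def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum score-level fusion of two PAD detectors.

    `weight` is the trust placed in detector A; a fused score above a
    chosen threshold flags the sample as an attack.
    """
    a = minmax_normalise(scores_a)
    b = minmax_normalise(scores_b)
    return weight * a + (1.0 - weight) * b

# Toy scores (higher = more likely attack) from two hypothetical branches
texture_scores = [0.2, 0.9, 0.4, 0.8]      # e.g. an LBP+SVM branch
quality_scores = [10.0, 55.0, 30.0, 60.0]  # e.g. a quality-metric branch
fused = fuse_scores(texture_scores, quality_scores, weight=0.6)
```

The weight would normally be tuned on a validation set, e.g. to minimise the equal error rate.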

3.5 Smartphones

In the last few years, research on smartphone and mobile-device PAD has been gaining popularity. Many different algorithms have been developed, and datasets were released to test their performance. These databases became publicly available, and we list in this section some of these datasets together with the algorithms proposed by their authors.

One of the first databases dedicated to face PAD on smartphones is MSU-MFSD [57]; its authors, Wen et al., used a PAD method based on image distortion analysis. Later, the same group released MSU-USSA [77] and used a combination of texture analysis (LBP and color) and image analysis techniques. In 2016, the IDIAP Research Institute released the Replay-Mobile [104] database, and again a combination of image quality analysis and texture analysis (Gabor jets) was used as the proposed PAD.

In the field of iris spoofing, the MobBioFake dataset [54, 105], presented in 2014, was the first to use the visible spectrum instead of near-infrared for iris spoofing, with a mobile device as the acquisition sensor. Sequeira et al. [54] used a texture-based iris PAD algorithm: they combined several image quality features at the feature level and used an SVM for classification. Gragnaniello et al. [106] proposed a solution based on local descriptors but with very low complexity, requiring low CPU power and thus suitable for mobile applications. They evaluated it on the MobBioFake and MICHE [107] datasets and achieved good performance.

3.6 Learned features

In recent years, deep learning has evolved, and the use of deep neural networks or convolutional neural networks (CNNs) has proved to be effective in many computer vision tasks, especially with the availability of new advanced hardware and large data. CNNs have been successfully used for vision problems like image classification and object detection. This has encouraged many researchers to incorporate deep learning in the PAD problem: instead of using hand-crafted features, they rely on a deep network or CNN to learn features that discriminate between bona-fide and attack face [1, 2, 29, 85, 86, 91-93] or iris [3, 4, 94, 95] images, or both [5, 98]. Some proposed to use hybrid features that combine information from both hand-crafted and deeply-learnt features.

Several works use a pretrained CNN as a feature extractor, then use a conventional classifier such as an SVM for classification, as in [86]. Li et al. [86] fine-tune a VGG CNN model pretrained on ImageNet [108], use it to extract features, reduce dimensionality by principal component analysis (PCA), then classify with an SVM. Combining deep features with motion cues was proposed in several face video spoof detection works. In [29], Patel et al. fine-tuned a pretrained VGG for texture feature extraction, in addition to using frame difference as a motion cue. Similarly, Nguyen et al. [87] used VGG for deep feature extraction, then fused it with multi-level LBP features and used an SVM for final classification. Feng et al. [85] combine image quality information with motion information from optical flow as input to a neural network for classification of a bona-fide or attack face. Xu et al. propose an LSTM-CNN architecture to utilize temporal information for binary classification in [79].

Atoum et al. [88] propose a two-stream CNN-based face PAD method using texture and depth; they utilized both the full face and patches from the face, with different color spaces instead of RGB. They evaluated on three benchmark databases but did not perform cross-dataset evaluation. Later, Liu et al. [89] extended and refined this work by learning another auxiliary signal, rPPG, from the face video, and evaluated their depth-supervised learning approach on the more recent datasets OULU-NPU [109] and SiW. Wang et al. [90] did not consider depth as an auxiliary supervision as in [89]; instead, they used multiple RGB frames to estimate face depth information, and used two modules to extract short- and long-term motion. Four benchmark datasets were used to evaluate their approach.

The use of a CNN by itself to classify attack vs. bona-fide iris images has also been investigated in recent years. Andrey K. et al. [96] proposed to use a lightweight CNN on binarized statistical image features extracted from the iris image; they evaluated their proposed algorithm on LivDet-Warsaw 2017 [110] and three other benchmark datasets. In [97], Hoffman et al. proposed to use 25 patches of the iris image as input to the CNN instead of the full iris image. A patch-level fusion method is then used to fuse scores from the input patches, based on the percentage of iris and pupil pixels inside each patch. They performed cross-dataset evaluation on the LivDet-Iris 2015 Warsaw, CASIA-Iris-Fake and BERC-Iris-Fake datasets, and used the true detection rate (TDR) at 0.2% false detection rate (FDR) as their evaluation metric.

4 Competitions and Databases

In this section, we summarize the face and iris PAD competitions held, in addition to the benchmark datasets used to evaluate the performance of face or iris presentation attack detection solutions. All these databases are publicly available upon request from their authors. Table 2 shows a summary of information about all six datasets used.

4.1 Competitions

Since 2011, several competitions have been held to assess the status of presentation attack detection for either face or iris. Such competitions were very useful for collecting new benchmark datasets and creating a common framework for evaluating the different PAD methods.

For face presentation attack detection, three competitions have been carried out since 2011. The first was held during 2011 [111] on the IDIAP Print-Attack dataset [67], with six competing teams; the winning methodology used hand-engineered


Table 2: Properties of the used face and iris databases

| Database | Number of samples (attack + bona-fide) | Resolution (W × H) | FPS | Video duration (face videos) |
|---|---|---|---|---|
| Replay-Attack (face) | 1000 + 200 (videos) | 320 × 240 | 25 | 9-15 seconds |
| MSU-MFSD (face) | 210 + 70 (videos) | 720 × 480 | 30 | 9-15 seconds |
| Replay-Mobile (face) | 640 + 390 (videos) | 720 × 1280 | 30 | 10-15 seconds |
| Warsaw 2013 (iris) | 815 + 825 (images) | 640 × 480 | - | - |
| ATVS-FIr (iris) | 800 + 800 (images) | 640 × 480 | - | - |
| MobBioFake (iris) | 800 + 800 (images) | 250 × 200 | - | - |

texture-based approaches, which proved very effective at detecting print attacks. Two years later, the second face PAD competition was carried out in 2013 [112], with 8 participating teams, on the Replay-Attack dataset [73]. Most of the teams relied only on texture-based methods, while 3 teams used hybrid approaches combining both texture- and motion-based countermeasures. Two of these hybrid approaches reached 0% HTER on the test set of the Replay-Attack database.

Later, in 2017, a competition was held on generalized software-based face presentation attack detection in mobile scenarios [64]. Thirteen teams participated, and four protocols were used for evaluation. The winning team, which showed the best generalization to unseen cameras and attack types, used a combination of color, texture and motion-based features.

More recently, after the release of the large-scale multi-modal face anti-spoofing dataset CASIA-SURF [113], the ChaLearn LAP Multi-Modal Face Anti-spoofing Attack Detection Challenge [114] was held in 2019. Thirteen teams qualified for the final round, with all submitted face PAD solutions relying on CNN-based feature extractors; the top 3 teams utilized ensembles of feature extractors and classifiers. For future work, the challenge paper suggested taking full advantage of the multi-modal dataset by defining a set of cross-modal testing protocols, as well as introducing 3D mask attacks in addition to the 2D attacks. These recommendations were fulfilled the following year by the ChaLearn Multi-modal Cross-ethnicity Face Anti-spoofing Recognition Challenge, using the CASIA-SURF CeFA [115] dataset.

For iris presentation attack detection, the first competition was LivDet-Iris 2013 [116]; two sequels of this competition were held in 2015 [117] and 2017 [110]. Images in LivDet-Iris included iris printouts and textured contact lenses, the latter being more difficult to detect than printouts. For LivDet-Iris 2017, the test images were split into known and unknown parts: the known part contained images taken under similar conditions and with the same sensor as the training images, while the unknown partition consisted of images taken in a different environment with different properties. Results on the known partition were much better than those on the unknown part. In 2014, another competition was held specifically for mobile authentication, the Mobile Iris Liveness Detection Competition (MobILive 2014) [118]. A mobile device was used to capture images of iris printouts (the MobBioFake dataset) in visible light, not in infra-red light as in the datasets used in the LivDet-Iris competitions. Six teams participated, and the winning team achieved perfect detection of presentation attacks. For a more thorough assessment of these iris competitions, refer to Czajka and Bowyer [6].

4.2 Face spoofing databases

In Table 3, we give details on the presentation media and presentation attack instruments (PAI) of the used face video datasets.

4.2.1 Replay-Attack: One of the earliest datasets presented in the literature (2012) for the problem of face spoofing is the Replay-Attack dataset [73], a sequel to its Print-Attack version introduced in 2011 [67]. The database contains a total of 1200 short videos of between 9 and 15 seconds, taken at a resolution of 320 × 240 from 50 different subjects. Each subject has 4 bona-fide videos and 20 spoofed ones, so the full dataset has 200 bona-fide videos and 1000 spoofed videos. The dataset is divided into 3 subject-disjoint subsets: one for training, one for development and one for testing.

The training and development subsets each have 15 subjects, while the test subset has videos from the remaining 20 subjects. Results on this dataset are reported as the HTER (Half Total Error Rate) on the test subset, with the detector threshold set at the EER (Equal Error Rate) of the development set. For both the real and spoof videos, each subject was recorded twice with a regular webcam in two different lighting settings: controlled and adverse.
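
This evaluation protocol (threshold fixed at the development-set EER, HTER reported on the test set) can be sketched as below; the function names and the convention that higher scores mean bona-fide are illustrative assumptions, not part of any dataset specification.

```python
import numpy as np

def far_frr(attack_scores, bonafide_scores, thr):
    """FAR: attacks accepted as bona-fide; FRR: bona-fide samples rejected.
    Convention assumed here: a higher score means more likely bona-fide."""
    far = float(np.mean(attack_scores >= thr))
    frr = float(np.mean(bonafide_scores < thr))
    return far, frr

def eer_threshold(dev_attack, dev_bonafide):
    """Threshold on the development set where FAR and FRR are closest (EER)."""
    best_thr, best_gap = 0.0, float("inf")
    for thr in np.unique(np.concatenate([dev_attack, dev_bonafide])):
        far, frr = far_frr(dev_attack, dev_bonafide, thr)
        if abs(far - frr) < best_gap:
            best_gap, best_thr = abs(far - frr), float(thr)
    return best_thr

def hter(test_attack, test_bonafide, thr):
    """Half Total Error Rate on the test set at the dev-set EER threshold."""
    far, frr = far_frr(test_attack, test_bonafide, thr)
    return (far + frr) / 2.0
```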

As for the spoofed videos, high-resolution videos of the subject were first taken with a Canon PowerShot SX150 IS camera; three techniques were then used to generate 5 spoofing scenarios: (1) hard-copy print attack, where high-resolution digital photographs are presented to the camera; the photographs are printed with a Triumph-Adler DCC 2520 color laser printer on A4 paper; (2) mobile attack, using a photo once and a replay of the recorded video on an iPhone 3GS screen (with resolution 480 × 320 pixels) once; (3) high-definition attack, displaying a photo or replaying the video on a high-resolution screen using an iPad (first generation, with a screen resolution of 1024 × 768 pixels). Each of the 5 scenarios was recorded under the two different lighting conditions stated above, then presented to the sensor in 2 different ways: either with a fixed tripod, or by an attacker holding the presenting device (printed paper or replay device) with his/her hand, leading to a total of 20 attack videos per subject. More details of the dataset can be found at the IDIAP Research Institute website*.

4.2.2 MSU Mobile Face Spoofing Database (MSU-MFSD): The MSU-MFSD dataset [57] was introduced in 2014 by Michigan State University (MSU). Its main aim was to tackle the problem of face spoofing on smartphones, where some mobile phone applications use the face for authentication and unlocking the phone. The dataset includes real and spoofed videos from 35 subjects, with durations between 9 and 15 seconds. Unlike the Replay-Attack dataset, where only a webcam was used for authentication, in MSU-MFSD two devices were used: the built-in webcam of a MacBook Air 13 with resolution 640 × 480, and the front-facing camera of the Google Nexus 5 smartphone with 720 × 480 resolution. So for each subject there are 2 bona-fide videos, one from each device, and six attack videos, 3 from each device, leading to a total of 280 videos: 210 attack and 70 bona-fide. The attack scenarios are all presented to the authentication sensor with a fixed support, unlike Replay-Attack, which has videos with a fixed support and others using hand-held devices. Three attack scenarios were used: (1) print attack on A3 paper, (2) video replay attacks on the screen of an iPad Air, and (3) video replay attacks on a Google Nexus 5 smartphone. The dataset is divided into two subject-disjoint subsets: one for training with 15 subjects and the other for testing with 20 subjects. The EER on the test set is used as the evaluation metric on the MSU-MFSD database.

4.2.3 Replay-Mobile: In 2016, the IDIAP research institute that released Replay-Attack released a new dataset, Replay-Mobile [104]. This followed the widespread adoption of face authentication on mobile devices and the increased need for face-PAD evaluation sets that capture the characteristics of mobile devices. The dataset has 1200 short recordings from 40 subjects, captured by two mobile devices at resolution 720 × 1280. Ten bona-fide accesses were recorded for each subject, in addition to 16 attack videos taken under different attack modes. Like Replay-Attack, the dataset was divided into training, development and testing subsets of 12, 16 and 12 subjects respectively, and results are reported as HTER on the test set.

Five lighting conditions were used in recording real accesses (controlled, adverse, direct, lateral and diffuse). For presentation attacks, high-resolution photos and videos of each client were taken under two different lighting conditions (lighton, lightoff).

Attacks were performed in two ways: (1) photo-print attacks, where the printed high-resolution photo was presented to the capturing device using either a fixed or a hand support; (2) matte-screen attacks, displaying a digital photo or video with a fixed support.

* https://www.idiap.ch/dataset/replayattack


Table 3: Face databases details

| Dataset | Sensor used for authentication | PA artifact | PAI (Presentation Attack Instrument) | Lighting conditions (bona-fide, attack) | Subjects (train/dev/test) | Videos per subject (bona-fide + attack) | Videos per subset (bona-fide + attack) |
|---|---|---|---|---|---|---|---|
| Replay-Attack (2012) | 1) W in MacBook laptop (320×240) | 1) PH 2) 720p high-def V, Canon PowerShot SX150 IS | 1) PR (A4) 2) VR on M (iPhone) 3) VR on T (iPad) | 2, 2 | 50 (15/15/20) | 4 + 20 | Train 60+300, Dev 60+300, Test 80+400 |
| CASIA-FASD (2012) | 1) Low-Q long-time-used USB C (680×480) 2) Normal-Q new USB C (680×480) 3) High-Q Sony NEX-5 (1920×1080) | 1) PH 2) 720p high-def V, Sony NEX-5 | 1) PR (copper A4), warped 2) PR (copper A4), cut 3) VR on T (iPad) | 1, 1 | 50 (20/-/30) | 3 + 9 | Train 60+180, Test 90+270 |
| MSU-MFSD (2014) | 1) W in MacBook Air (640×480) 2) FC of Google Nexus 5 M (720×480) | 1) PH (Canon PowerShot 550D SLR) 2) 1080p high-def V (Canon PowerShot 550D SLR) 3) 1080p M V (BC of iPhone S5) | 1) PR (A3) 2) high-def VR on T (iPad) 3) M VR on M (iPhone) | 1, 1 | 35 (15/-/20) | 2 + 6 | Train 30+90, Test 40+120 |
| Replay-Mobile (2016) | 1) FC of iPad Mini 2 T (720×1280) 2) FC of LG G4 M (720×1280) | 1) PH (Nikon Coolpix P520) 2) 1080p M V (BC of LG G4 M) | 1) PR (A4) 2) VR on matte screen (Philips 227ELH) | 5, 2 | 40 (12/16/12) | 10 + 16 | Train 120+192, Dev 160+256, Test 110+192 |
| OULU-NPU (2017) | FC of 6 M (1920×1080) | 1) PH 2) high-res M V, BC of Samsung Galaxy S6 Edge M | 1-2) PR (glossy A3), 2 printers 3) high-res VR on 19-inch Dell display 4) high-res VR on 13-inch MacBook | 3, 3 | 55 (20/15/20) | 36 + 72 | 4 protocols, total 1980+3960 |
| SiW (2018) | 1) High-Q C Canon EOS T6 (1920×1080) 2) High-Q C Logitech C920 W (1920×1080) | 1) PH (5184 × 3456) 2) same live videos | 1) PR 2) PR of frontal-view frame from a live V 3-6) VR on 4 screens: Samsung S8, iPhone 7, iPad Pro, PC | 1, 1 | 165 (90/-/75) | 8 + 20 | 3 protocols, total 1320+3300 |
| CASIA-SURF (2018) | 1) RealSense SR300: RGB (1280×720), Depth/IR (640×480) | 1) PH | 1) PR (A4) | 1, 1 | 1000 (300/100/600) | 1 + 6 | Train 900+5400, Dev 300+1800, Test 1800+10800 |
| CASIA-SURF CeFA (2019) | 1) Intel RealSense: RGB/Depth/IR, all (1280×720) | 1) PH 2) same live videos 3) 3D-print face mask 4) 3D silica-gel face mask | 1) PR 2) VR | 1, 2 (2D); 0, 6; 0, 4 | 1500 (600/300/600); 99 (0/0/99); 8 (0/0/8) | 1 + 3; 0 + 18; 0 + 8 | 4 protocols, total 4500+13500; 0+5346; 0+196 |

C: camera; BC: back-camera; FC: front-camera; W: built-in webcam; T: tablet; M: mobile/smartphone; Q: quality; PH: high-res photo; V: video; PR: hard-copy print of high-res photo; VR: video replay.

4.2.4 Others: Some other datasets for face PAD are also available but were not used in this paper.

Fig. 1: Sample images from the Replay-Attack dataset (adverse lighting, hand support). Left to right, top: bona-fide, print, highdef photo, mobile photo; bottom: highdef photo, mobile video, highdef video. Client 007.

Fig. 2: Sample images from the MSU-MFSD dataset: (a) laptop, (b) Android. Left to right: bona-fide, printed photo, iPhone video, iPad video. Client 001.

CASIA-FASD [50] was released in 2012, with 50 subjects. Three different imaging qualities were used (low, normal and high), and three types of attacks were generated from the high-quality recordings of the genuine faces, namely cut-photo attack, warped-photo attack,


and video attack on an iPad. The dataset is split into a training set of 20 subjects and a testing set including the other 30 subjects.

OULU-NPU [109]: a mobile face PAD dataset released in 2017, with 55 subjects captured under three different lighting conditions and backgrounds with six different mobile devices. Both print and video-replay attacks are included, using two different PAIs each. Four protocols were proposed in the paper to evaluate the generalization of PAD algorithms against different conditions.

Spoof in the Wild (SiW) [89]: released in 2018, it contains 8 real and 20 attack videos for each of 165 subjects, more subjects than any previous dataset. Bona-fide videos are captured with two high-quality cameras and collected with different orientations, facial expressions and lighting conditions. Attack samples are either high-quality print attacks or replay videos on 4 different PA devices.

CASIA-SURF [113]: later in the same year, 2018, the large-scale multi-modal dataset CASIA-SURF was released, containing 21000 videos for 1000 subjects. Each video has three modalities, namely RGB, depth and IR, captured by an Intel RealSense camera. Only 2D print attacks were performed, with 6 different combinations of the photo state used (either full or cut) and 3 different operations, like bending the printed paper or moving it at different distances from the camera.

CASIA-SURF CeFA [115]: although the CASIA-SURF dataset [113] is large-scale, it only includes one ethnicity (Chinese) and one attack type (2D print). So in 2019, CASIA-SURF cross-ethnicity Face Anti-spoofing (CeFA) was released to provide a wider range of ethnicities and attack types, in addition to multiple modalities as in CASIA-SURF. The dataset includes videos for 1607 subjects, covering three ethnicities (African, East Asian and Central Asian), three modalities and 4 attack types. The attack types are 2D print attacks and replay attacks, in addition to 3D mask and silica gel attacks.

4.3 Iris spoofing databases

Many publicly available benchmark datasets have been published for the iris presentation attack problem. Some are captured with commercial

Fig. 3: Sample images from the Replay-Mobile dataset (hand support), captured by (a) tablet and (b) mobile. Left to right: bona-fide, print photo, matte-screen photo, matte-screen video. Client 001.

Fig. 4: Sample images from the iris datasets: (a) bona-fide, (b) print attack. Left to right: ATVS-FIr (id u0001s0001_ir_r_0002), Warsaw 2013 (id 0039_R_1), MobBioFake (id 007_R2).

iris recognition sensors operating with near-infrared illumination, like BioSec ATVS-FIr [12] or the LivDet datasets [52, 110, 117], while more recent datasets were captured under visible-light illumination using smartphones, e.g. MobBioFake [54, 105]. Below are details of the databases used in this paper.

4.3.1 BioSec ATVS-FIr: The ATVS-FIr dataset was introduced in 2012 [12], derived from the BioSec dataset [119]. It contains 800 bona-fide and 800 attack photos of 50 different subjects. The attack is performed by enhancing the quality of the bona-fide iris photo, printing it on high-quality paper using HP printers, then presenting it to the same LG IrisAccess EOU3000 sensor used to capture the bona-fide irises. All images have 640 × 480 resolution and are grayscale. For each user, 16 different bona-fide photos were captured, from which 16 attack photos were generated: four photos were taken of each of the two eyes in two sessions, so each user has a total of 4 images × 2 eyes × 2 sessions = 16 bona-fide images, and the same number of fake ones. Each eye of the 50 subjects was considered a subject on its own; the resulting 100 subjects were then divided into a training set of 25 subjects (200 bona-fide + 200 attack) and a test set of 75 subjects (600 bona-fide + 600 attack).

4.3.2 LivDet Warsaw 2013: Warsaw University of Technology published the first LivDet Warsaw iris dataset in 2013 [52], as part of the first international iris liveness competition in 2013 [116]. Two more versions of this dataset were later published, in 2015 [117] and 2017 [110]. LivDet-Iris-2013-Warsaw contains 852 bona-fide and 815 attack (printout) images from 237 subjects. For the attack images, the real iris was printed using two different printers, then captured by the same IrisGuard AD100 sensor as the bona-fide photos.

4.3.3 MobBioFake: The first competition for iris PAD in mobile scenarios was the MobILive competition in 2014 [118]. The MobBioFake iris spoofing dataset [54] was generated from the iris images of the MobBIO Multimodal Database [105]. It comprises 800 bona-fide and 800 attack images captured by the back camera (8 MP resolution) of an Android mobile smartphone (Asus Transformer Pad TF 300T) in visible light, not in near-infrared like previous iris datasets. The images are color RGB with a resolution of 250 × 200. The MobBioFake database has 100 subjects, each with 8 bona-fide + 8 attack images captured under different illumination and occlusion settings. The attack samples were generated from printed images of the original ones, under similar conditions and with the same device. The images are split evenly between training and test subsets.

5 Proposed PAD Approach

Fig. 5: Overview of the PAD algorithm.

Although several previous studies use CNNs for presentation attack detection, they either used custom-designed networks, e.g. Spoofnet [95, 98], or used a pre-trained CNN to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation attack images. We choose to assess state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet [108]. This PAD approach is a passive, software, texture-based method that requires neither additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches, starting from network weights trained on the ImageNet classification dataset. A decision can be made to either retrain all the network weights given the new datasets of the problem, or to freeze the weights of most of the network layers and fine-tune only a set of selected last layers, including the final classification layer.

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then input to the CNN: a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is altered to output a binary bona-fide/attack classification using a sigmoid activation, instead of the 1000 classes of ImageNet.

5.1 Preprocessing

5.1.1 Face videos: Each of the face benchmark datasets used in this paper provides face coordinates for each frame of each video. These coordinates are used to crop a square region to be used as input to the network. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to be twice the minimum of the width and height of the face ground-truth box. The cropped bounding box containing the face is then resized to each network's default input size and fed to the first convolutional layer of the CNN.
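
As a concrete sketch of this cropping step (the function name and the nearest-neighbour resize are our own illustrative choices; in practice a library resize such as cv2.resize would be used):

```python
import numpy as np

def crop_face_square(frame, x, y, w, h, target=224):
    """Crop a square of side 2*min(w, h) centred on the annotated face box,
    keeping some background context, then resize to the network input size.
    The nearest-neighbour resize keeps this sketch dependency-free."""
    side = 2 * min(w, h)
    cx, cy = x + w // 2, y + h // 2
    x0, y0 = max(0, cx - side // 2), max(0, cy - side // 2)
    x1, y1 = min(frame.shape[1], x0 + side), min(frame.shape[0], y0 + side)
    crop = frame[y0:y1, x0:x1]
    ys = np.linspace(0, crop.shape[0] - 1, target).astype(int)
    xs = np.linspace(0, crop.shape[1] - 1, target).astype(int)
    return crop[np.ix_(ys, xs)]
```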

5.1.2 Iris images: When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. However, in this paper we directly use the full iris image captured by the NIR sensor or smartphone camera, in order to keep the approach as simple as possible. This lets the network learn which regions of the full periocular area most affect the classification problem. The presence of some contextual information, like paper borders, can also be useful for attack detection.

Table 4: Performance of selected deep CNN architectures on ImageNet, in terms of single-model error reported in each model's paper

| Model | top-1 error % (single-crop / multi-crop) | top-5 error % (single-crop / multi-crop) | Parameters | Layers |
|---|---|---|---|---|
| InceptionV3 [120] | 21.2 / 19.47 (12-crop) | 5.6 / 4.48 (12-crop) | 23.9M | 159 |
| MobileNetV2 (alpha=1.4) [122] | 25.3 / - | 7.58* / - | 6.2M | 88 |
| MobileNetV2 (default alpha=1.0) [122] | 28 / - | 9.8* / - | 3.5M | 88 |
| VGG16 [123] | - / 24.4 | 9.9* / 7.1 | 138.4M | 23 |

* Not available in the paper; reported by Keras (https://keras.io)

5.2 Generalization: Data augmentation

It is known that deep networks have many parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. During training, for each mini-batch pulled from the training set, the images in the batch are randomly cropped, rotated slightly by a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially simulated larger training dataset and a more generalized network that is robust to changes in pose and illumination.

5.3 Modern deep CNN architectures

Based on the official results published for recent CNN networks on the ImageNet dataset [108], three of the networks that achieved less than 10% top-5 error are InceptionV3 [120], ResNet50 [121] and MobileNetV2 [122]. Table 4 states the single-model top-1 and top-5 error of each of these networks on the ImageNet validation dataset, together with its depth and number of parameters, ordered by ascending top-5 error. One thing that can be noticed about the higher-accuracy networks is the increased number of layers: these networks are much deeper than, for example, the well-known VGG [123].

For the purpose of this paper, we chose networks that have fewer than 30 million parameters to fine-tune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 [120] (~24M parameters, 5.6% top-5 error), and a more recent, less deep network with far fewer parameters, MobileNetV2 [122] (~4M parameters, 9.8% top-5 error), which makes it very suitable for mobile applications. Even though all three networks have significantly fewer parameters than VGG, they are much deeper and achieve higher accuracy.

In each architecture, the final softmax prediction layer is removed, and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate of 0.5), then a sigmoid activation layer for the final binary prediction.
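
A minimal Keras sketch of this head replacement, shown here on MobileNetV2 (weights=None keeps the sketch self-contained with no weight download; the paper initialises from ImageNet-pretrained weights instead):

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import MobileNetV2

# Backbone without its 1000-class ImageNet head; pooling="avg" exposes the
# global average pooling output directly.
base = MobileNetV2(weights=None, include_top=False, pooling="avg",
                   input_shape=(224, 224, 3))
x = layers.Dropout(0.5)(base.output)            # dropout rate 0.5
out = layers.Dense(1, activation="sigmoid")(x)  # binary bona-fide/attack score
model = Model(base.input, out)
```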

5.4 Fine-tuning approaches

As stated previously, the available datasets are small, while the networks used have millions of parameters to learn. This would probably lead the model parameters to overfit these small sets if the network weights were trained from scratch, i.e. starting from random weights; training would get stuck in a local minimum, leading to poor generalization ability. A better approach is to initialize the weights of the network with weights pre-trained on ImageNet. We then have three training options: (1) use transfer learning: feed the images to the network to extract features from the last layer


Table 5: Number of parameters in each model, and the number of parameters to learn when fine-tuning only the final block

| Model | Total parameters | Last block to train | First trainable layer | Parameters to learn |
|---|---|---|---|---|
| InceptionV3 | 21,804,833 | block after mixed9 (mixed9_1 until mixed10) | conv2d_90 | 6,075,585 |
| MobileNetV2 | 2,259,265 | last inverted_res_block | block_16_expand | 887,361 |

before the prediction, then use these features, instead of the raw image pixels, for classification with an SVM or a neural network; (2) fine-tune the weights of only the last several layers of the network, while fixing the weights of the previous layers to their ImageNet-trained values; or (3) fine-tune all the network weights using the new input dataset.

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the approach of fine-tuning the last convolutional block, Table 5 states, for each architecture, the name of that block's first layer and the number of parameters in the block to be fine-tuned.
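
Last-block fine-tuning can be sketched in Keras as follows, using the MobileNetV2 layer name from Table 5 (weights=None is used only to keep the sketch self-contained, whereas the paper starts from ImageNet weights; full fine-tuning would simply leave all layers trainable):

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import MobileNetV2

base = MobileNetV2(weights=None, include_top=False, pooling="avg",
                   input_shape=(224, 224, 3))
x = layers.Dropout(0.5)(base.output)
out = layers.Dense(1, activation="sigmoid")(x)
model = Model(base.input, out)

# Freeze every layer before the last inverted residual block, whose first
# layer is named "block_16_expand" (Table 5); everything from there on,
# including the new classification head, stays trainable.
trainable = False
for layer in model.layers:
    if layer.name == "block_16_expand":
        trainable = True
    layer.trainable = trainable
```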

5.5 Single model with multiple biometrics

In addition to training/testing with datasets that belong to the same biometric source, i.e. face only or iris only, we experiment with training a single generic presentation attack detection network to detect a presentation attack whether the presented biometric is a face or an iris. We train with different combinations of face + iris sets, then cross-evaluate the trained models on the other face and iris datasets, to see whether this approach is comparable to training on face images or iris images alone.

6 Experiments and Results

In this section, we describe the image and video preprocessing methods used, and explain the training strategy and experimental setup we followed.

6.1 Preprocessing

6.1.1 Preparing images: Among the iris datasets, only the MobBioFake database consists of color images captured using a smartphone. The other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all the networks used in this paper were pretrained on the color images of ImageNet, their input layer only accepts color images with three channels; so for grayscale iris images, the single gray channel is replicated two more times to form a 3-channel image that is passed as the network input. The whole iris image is used, without cropping or iris detection and segmentation.
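
The channel replication amounts to a one-liner (a sketch assuming NumPy arrays; the helper name is ours):

```python
import numpy as np

def gray_to_rgb(gray):
    """Replicate a single-channel NIR iris image into three identical
    channels so it fits a network pretrained on RGB images."""
    if gray.ndim == 2:
        gray = gray[..., np.newaxis]
    return np.repeat(gray, 3, axis=-1)
```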

On the other hand, all the face PAD datasets consist of color videos. Video durations range from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects, so we use frame skipping to sample one frame every 20 frames. This frame dropping leads to sampling around 10 to 19 frames per video, depending on its length. Instead of using the whole sampled frame as the network input, each frame is first cropped to either include only the face region, or a box with approximately twice the face area, to include some background context. Annotated face bounding boxes have a side length ranging from 70 px to 100 px in Replay-Attack videos, and from 170 px to 250 px in MSU-MFSD.
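
The frame skipping amounts to keeping every 20th frame index; e.g., for a 9-second Replay-Attack video at 25 fps (225 frames) this yields 12 frames, and 19 for a 15-second one, consistent with the 10-19 frames mentioned above (the helper name is ours):

```python
def sampled_frame_indices(num_frames, step=20):
    """Indices of the frames kept when sampling one frame every `step` frames."""
    return list(range(0, num_frames, step))
```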

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments (after frame dropping in videos), together with the total number of images available for training the CNN from each database.

The images are then resized according to the default input size of each network, i.e. 224 × 224 for MobileNetV2 and 299 × 299 for InceptionV3. The RGB values of the input images to the InceptionV3 and MobileNetV2 networks are first scaled to values between -1 and 1.
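
The [-1, 1] scaling is simply a division by 127.5 followed by subtracting 1 (a sketch; this matches the "tf"-mode preprocessing convention used by the Keras applications for these two networks):

```python
import numpy as np

def scale_to_unit_range(img_uint8):
    """Map 8-bit values in [0, 255] to the [-1, 1] range expected by the
    InceptionV3/MobileNetV2 input convention."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0
```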

Table 6: Number of images used in training and testing. For face datasets, the number of videos is shown, followed by the number of frames in parentheses (after skipping, one frame is kept every 20 frames); for iris datasets, only the number of images is given. The "= total" values are the training totals.

| Database | Subset | Attack + bona-fide: number of videos (frames) |
|---|---|---|
| Replay-Attack | train | 300 (3414) + 60 (1139) = 4553 |
| | devel | 300 (3411) + 60 (1140) |
| | test | 400 (4516) + 80 (1507) |
| MSU-MFSD | train | 90 (1257) + 30 (419) = 1676 |
| | test | 120 (1655) + 84 (551) |
| Replay-Mobile | train | 192 (2851) + 120 (1762) = 4613 |
| | devel | 256 (3804) + 160 (2356) |
| | test | 192 (2842) + 110 (1615) |
| ATVS-FIr | train | 200 + 200 = 400 |
| | test | 600 + 600 |
| Warsaw 2013 | train | 203 + 228 = 431 |
| | test | 612 + 624 |
| MobBioFake | train | 400 + 400 = 800 |
| | test | 400 + 400 |

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize this small set of samples instead of learning useful features, and thus fail to generalize. In order to reduce overfitting, data augmentation is used during training: images are randomly transformed before being fed to the network with one or more of the following transformations: horizontal flip; rotation between 0 and 10 degrees; horizontal translation by 0 to 20 percent of the image width; vertical translation by 0 to 20 percent of the image height; or zooming by 0 to 20 percent.
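
These transformation ranges map directly onto Keras's ImageDataGenerator (a sketch of one possible implementation; the paper does not name a specific library):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random transformations matching the ranges listed above.
augmenter = ImageDataGenerator(
    horizontal_flip=True,
    rotation_range=10,       # rotation between 0 and 10 degrees
    width_shift_range=0.2,   # horizontal translation up to 20% of width
    height_shift_range=0.2,  # vertical translation up to 20% of height
    zoom_range=0.2,          # zoom up to 20%
)

# Demo on random data: each call to the iterator yields a freshly
# transformed mini-batch, so the exact same image is never seen twice.
x = np.random.rand(8, 224, 224, 3).astype("float32")
batch = next(augmenter.flow(x, batch_size=4, seed=0))
```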

6.2 Training strategy and hyperparameters

For training the CNN networks, we used the Adam optimizer [124] with β1 = 0.9, β2 = 0.999 and learning rate 1 × 10^-4. We trained the network for a maximum of 50 epochs, but added a stopping criterion if the training or validation accuracy reached 99.95%, or if the validation accuracy did not improve (with delta = 0.0005) for 40 epochs.
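The stopping rule can be restated compactly as follows (our own condensed logic; a Keras `EarlyStopping` callback with `patience=40` and `min_delta=0.0005`, combined with an accuracy-threshold check, would behave equivalently):

```python
def should_stop(history, target_acc=0.9995, patience=40, min_delta=0.0005):
    """Stopping criterion: stop once validation accuracy reaches
    `target_acc`, or when it has not improved by at least `min_delta`
    for `patience` consecutive epochs. `history` is the list of
    per-epoch validation accuracies observed so far."""
    if not history:
        return False
    if max(history) >= target_acc:          # accuracy threshold reached
        return True
    if len(history) <= patience:            # not enough epochs to judge
        return False
    best_before = max(history[:-patience])  # best before the patience window
    return max(history[-patience:]) < best_before + min_delta
```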

Intermediate models with high validation accuracy were saved, and finally the model with the least validation error was chosen for inference. Because of the randomness introduced by the data augmentation and the dropout, several training sessions may produce slightly different results, especially in cross-validation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treated each frame in the face video as a separate image. Then, when reporting the final score for each video, we perform score-level fusion by averaging the scores of the samples from the same video. Accuracy and errors can then be calculated based on these video-level scores, rather than the frame-level scores used in some results reported in the literature.
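This score-level fusion is simply a mean over the sampled frames' scores, followed by a single video-level decision (illustrative sketch):

```python
import numpy as np

def video_score(frame_scores):
    """Score-level fusion: a video's score is the mean of the scores of
    its sampled frames."""
    return float(np.mean(frame_scores))

def classify_video(frame_scores, threshold=0.5):
    """Video-level decision: bona-fide if the fused score clears the
    threshold (convention: higher score = more bona-fide-like)."""
    return video_score(frame_scores) >= threshold
```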


Table 7: Intra- and cross-dataset results. Test HTER (%) reported for Replay-Attack and Replay-Mobile testing; otherwise Test EER (%). Columns: finetuning approach (A) and (B), each with InceptionV3 and MobilenetV2.

Train Database   Test Database            (A) IncV3  (A) MobV2  (B) IncV3  (B) MobV2
Replay-Attack    Replay-Attack            6.9        1.6        0          0.63
                 MSU-MFSD                 33         30         17.5       22.5
                 Replay-Mobile            35.8       43.4       64.7       56.7
MSU-MFSD         Replay-Attack            34.7       22.9       23.8       13.7
                 MSU-MFSD                 7.5        20.5       0          2.08
                 Replay-Mobile            32.6       26.8       24.6       23.7
Replay-Mobile    Replay-Attack            33.2       45.1       31.1       45.5
                 MSU-MFSD                 35.4       30         32.9       37.5
                 Replay-Mobile            3.6        3.2        0          0
ATVS-FIr         ATVS-FIr                 0          0          0          0
                 Warsaw 2013              1.9        5.4        0.16       1.6
                 MobBioFake (Grayscale)   44         39         36.5       39
Warsaw 2013      ATVS-FIr                 0          1          0.17       0.5
                 Warsaw 2013              0          0.32       0          0
                 MobBioFake (Grayscale)   43         39         32.8       47
MobBioFake       ATVS-FIr                 10         30         18.7       20.2
                 Warsaw 2013              15.7       32         23.4       18.8
                 MobBioFake               8.75       7          0.5        0.75
MobBioFake       ATVS-FIr                 2.5        23.7       12         16.2
(Grayscale)      Warsaw 2013              6.2        17.3       10.1       17.2
                 MobBioFake (Grayscale)   10.5       7.3        0.25       1.25

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the error on the test set. EER is defined as the rate at which both the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attack ones, equal to FR / total bona-fide attempts, while FAR is the probability by which the algorithm might accept attack samples and consider them bona-fide, equal to FA / total attack attempts.

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold is first calculated that achieves EER on the devel set. This threshold is then used to calculate the error on the test set, known as the half total error rate (HTER), which is the sum of the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) at that threshold, divided by two: HTER = (FRR + FAR)/2.
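A minimal sketch of these metrics (our helper names; scores are assumed higher for bona-fide samples, and the EER threshold is found by a simple sweep over the observed scores):

```python
import numpy as np

def far_frr(bona, attack, thr):
    """FAR = fraction of attack samples accepted as bona-fide;
    FRR = fraction of bona-fide samples rejected as attacks."""
    far = float(np.mean(np.asarray(attack) >= thr))
    frr = float(np.mean(np.asarray(bona) < thr))
    return far, frr

def eer_threshold(bona, attack):
    """Return the candidate threshold where |FAR - FRR| is smallest,
    i.e. the (approximate) EER operating point, e.g. on a devel set."""
    cands = np.unique(np.concatenate([bona, attack]))
    return min(cands, key=lambda t: abs(far_frr(bona, attack, t)[0]
                                        - far_frr(bona, attack, t)[1]))

def hter(bona, attack, thr):
    """HTER = (FAR + FRR) / 2 at a threshold fixed beforehand."""
    far, frr = far_frr(bona, attack, thr)
    return (far + frr) / 2.0
```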

An ISO/IEC international standard for PAD testing was described in ISO/IEC 30107-3:2017*, where PAD subsystems are to be evaluated using two metrics: attack presentation classification error rate (APCER) and bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER, to be able to compare with state-of-the-art published results. EER can also be understood as the rate at which APCER is equal to BPCER.

6.3.3 Cross-dataset evaluation: In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work to have performed EER cross-dataset evaluation on iris PAD databases before.

6.4 Experimental setup

For implementation, we used Keras† with a TensorFlow‡ backend for the deep CNN models, and for dataset management and experiment pipelines we used the Bob package [125, 126]. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

* https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
† https://github.com/keras-team/keras
‡ https://www.tensorflow.org

6.5 Experiment 1 results

In the first experiment, we compare the two different fine-tuning approaches explained in Section 5.4. For this experiment, training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state of the art. For each fine-tuning approach, we performed three runs and reported the minimum error obtained. In Table 7, we refer to the first fine-tuning approach as (A), where the weights of only the last block are fine-tuned, while approach (B) is when all the layers' weights are fine-tuned using the given dataset.
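The two regimes can be sketched in Keras as follows (our naming; `last_trainable` is an illustrative stand-in for the size of the last block, the binary sigmoid head is one plausible choice not specified here, and we pass `weights=None` only to keep the sketch self-contained, whereas the experiments start from `weights="imagenet"`):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_pad_model(finetune_all, last_trainable=10):
    """Approach (B): finetune_all=True, every weight is updated.
    Approach (A): finetune_all=False, all layers except roughly the
    last block (`last_trainable` layers here) are frozen at their
    pretrained values."""
    base = tf.keras.applications.MobileNetV2(
        include_top=False, pooling="avg", input_shape=(224, 224, 3),
        weights=None)  # the paper starts from ImageNet weights
    if not finetune_all:
        for layer in base.layers[:-last_trainable]:
            layer.trainable = False  # keep early layers fixed (approach A)
    head = layers.Dense(1, activation="sigmoid")(base.output)  # bona-fide vs attack
    model = models.Model(base.input, head)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4,
                                           beta_1=0.9, beta_2=0.999),
        loss="binary_crossentropy", metrics=["accuracy"])
    return model
```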

It can be noticed from Table 7 that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning only the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error for all datasets using InceptionV3, while MobilenetV2 achieved 0% for most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset was used for training instead of the original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 results

For the second experiment, we only use fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here we use both face and iris images as training input to a single model for classifying bona-fide from attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate between bona-fide and presentation attack images achieved very good results on both face and iris test sets. For some cases in face datasets, this combination boosted the performance in cross-dataset testing: for example, on the Replay-Attack test dataset, 11.9% HTER was achieved when training a MobilenetV2 using the MSU-MFSD and Warsaw iris datasets, vs. 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8: Training with face + iris images, intra- and cross-dataset results. Test HTER (%) reported for Replay-Attack and Replay-Mobile testing; otherwise Test EER (%). Columns give the iris training database and the model (IncV3 = InceptionV3, MobV2 = MobilenetV2). Best cross-dataset errors in the original table are marked in bold (per row, i.e. given a certain face train set) and bold-underlined (overall, per test dataset).

Train iris database:        ATVS-FIr         Warsaw 2013      MobBioFake (RGB)  MobBioFake (Gray)
Test Database               IncV3   MobV2    IncV3   MobV2    IncV3   MobV2     IncV3   MobV2

Train face database: Replay-Attack
- Number of training epochs 2       6        3       6        4       9         2       8
Replay-Attack               0       0.63     0       0        0.25    1.5       0       1.4
MSU-MFSD                    29.6    20.4     15      20       19.6    25        30      17.5
Replay-Mobile               58.1    48.7     57.3    35.4     60.6    58.1      39      40
ATVS-FIr                    0       0        10.5    2.7      51.7    8.8       3.5     12.3
Warsaw 2013                 0.16    0.16     0       0        41.3    16.2      15.2    5.7
MobBioFake (RGB)            58.5    71.3     56      45.5     0.75    0.25      1.75    2
MobBioFake (Grayscale)      50      69       56.3    38.5     1.5     0.75      1.25    1

Train face database: MSU-MFSD
- Number of training epochs 4       6        3       5        10      9         7       9
Replay-Attack               22.8    14.5     25.3    11.9     22.8    27.7      16.6    21.6
MSU-MFSD                    0       2.08     0.4     2.08     2.5     2.08      0.4     2.5
Replay-Mobile               28.7    26.1     19.2    24.8     22.7    26.8      25.2    17
ATVS-FIr                    0       0        0.33    2.3      11.8    12.5      0.8     12.33
Warsaw 2013                 0.6     0.8      0       0        43.9    65.5      7.4     9.1
MobBioFake (RGB)            52.5    39.2     45.5    42       2       1.25      2.5     2.5
MobBioFake (Grayscale)      45.3    38.5     34.5    43       2.2     4.25      1.75    2

Train face database: Replay-Mobile
- Number of training epochs 2       3        1       3        4       7         5       4
Replay-Attack               24.2    42.6     31      35.7     41.6    64.3      33.7    54.6
MSU-MFSD                    32.5    30.4     32.5    42.5     60      52.5      40      44.6
Replay-Mobile               0       0        0       0        0       0.26      0       0.46
ATVS-FIr                    0       0        0.83    6        15.5    22.5      6.5     28.2
Warsaw 2013                 2.1     4        0       0        9.6     28.9      22.1    26.1
MobBioFake (RGB)            46.5    53       49.3    54.3     1       0.75      8.8     32.5
MobBioFake (Grayscale)      33.5    43.3     41.7    45.8     5.3     6.5       2       1.75

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER (%) reported for Replay-Attack testing; otherwise Test EER (%). Underlined in the original is the best cross-dataset error for each test dataset. Each entry gives the cross-dataset error, the corresponding intra-dataset error in parentheses, and the method or model.

Train: Replay-Attack, Test: MSU-MFSD
  SOTA: 28.5 (8.25), GuidedScale-LBP [127], LGBP
  Ours, single biometric: 17.5 (0), InceptionV3 (B)
  Ours, face + iris: 15 (0), InceptionV3 (Warsaw 2013)
Train: Replay-Attack, Test: Replay-Mobile
  SOTA: 49.2 (3.13), GuidedScale-LBP [127], LBP + GS-LBP
  Ours, single biometric: 35.8 (6.9), InceptionV3 (A)
  Ours, face + iris: 35.4 (0), MobilenetV2 (Warsaw 2013)
Train: MSU-MFSD, Test: Replay-Attack
  SOTA: 21.40 (1.5), Color-texture [66]
  Ours, single biometric: 13.7 (2.08), MobilenetV2 (B)
  Ours, face + iris: 11.9 (2.08), MobilenetV2 (Warsaw 2013)
Train: MSU-MFSD, Test: Replay-Mobile
  SOTA: 32.1 (8.5), GuidedScale-LBP [127], LBP + GS-LBP
  Ours, single biometric: 23.7 (2.08), MobilenetV2 (B)
  Ours, face + iris: 17 (2.5), MobilenetV2 (MobBioFake (Gray))
Train: Replay-Mobile, Test: Replay-Attack
  SOTA: 46.19 (0.52), GuidedScale-LBP [127], LGBP
  Ours, single biometric: 31.1 (0), InceptionV3 (B)
  Ours, face + iris: 24.2 (0), InceptionV3 (ATVS-FIr)
Train: Replay-Mobile, Test: MSU-MFSD
  SOTA: 33.56 (0.98), GuidedScale-LBP [127], LBP + GS-LBP
  Ours, single biometric: 30 (3.2), MobilenetV2 (A)
  Ours, face + iris: 30.4 (0), MobilenetV2 (ATVS-FIr)
Train: ATVS-FIr, Test: Warsaw 2013
  SOTA: none available
  Ours, single biometric: 0.16 (0), InceptionV3 (B)
  Ours, face + iris: 0.16 (0), any model (Replay-Attack)
Train: ATVS-FIr, Test: MobBioFake (Gray)
  SOTA: none available
  Ours, single biometric: 36.5 (0), InceptionV3 (B)
  Ours, face + iris: 33.5 (0), InceptionV3 (Replay-Mobile)
Train: Warsaw 2013, Test: ATVS-FIr
  SOTA: none available
  Ours, single biometric: 0 (0), InceptionV3 (A)
  Ours, face + iris: 0.33 (0), InceptionV3 (MSU-MFSD)
Train: Warsaw 2013, Test: MobBioFake (Gray)
  SOTA: none available
  Ours, single biometric: 32.8 (0), InceptionV3 (B)
  Ours, face + iris: 34.5 (0), InceptionV3 (MSU-MFSD)
Train: MobBioFake (Gray), Test: ATVS-FIr
  SOTA: none available
  Ours, single biometric: 2.5 (10.5), InceptionV3 (A)
  Ours, face + iris: 0.8 (1.75), InceptionV3 (MSU-MFSD)
Train: MobBioFake (Gray), Test: Warsaw 2013
  SOTA: none available
  Ours, single biometric: 6.2 (10.5), InceptionV3 (A)
  Ours, face + iris: 5.7 (1), MobilenetV2 (Replay-Attack)

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. Not only this, but it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network focus on color features when testing on a face image, since all training face images were colored while all iris training images were grayscale.


Fig. 6: High frequency components; attack iris images from the Warsaw dataset. Left-to-right: input image; frequency components (Fourier); high-pass filter; inverse Fourier on image; high frequency components overlaid on image.

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left-to-right: input image; high frequency components overlaid on image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; Bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9, we compare our best achieved cross-dataset evaluation error with the state of the art (SOTA). This table shows that our cross-dataset testing results surpass SOTA on face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel that it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activations and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize the specific patterns and texture of real presentations in general, which are not present in attack samples, and vice-versa. An approach to demonstrate this is to visualize the response of certain feature maps from the network given sample input images. For this task, we chose a MobilenetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected to visualize the response of some filters from the final convolutional block before classification.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left-to-right: input image; high frequency components overlaid on image; network activation response of realF; network activation response of attackF. Top: client002; Bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left-to-right: Replay-Attack; MSU-MFSD; Replay-Mobile; ATVS-FIr; Warsaw; MobBioFake.

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left-to-right: Replay-Attack; MSU-MFSD; Replay-Mobile; ATVS-FIr; Warsaw; MobBioFake.

In addition to visualizing the network activation response, we were also interested in viewing the distribution of high frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high frequency components of an image, then we overlay these high frequency regions


on the input image, for a better understanding of where high frequency is concentrated in bona-fide images vs. attack images.

The steps for obtaining the regions with high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to get the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, an inverse Fourier transform is used to obtain an image with only the high frequency regions visible, and finally these regions are overlaid on the original image to highlight them.
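These steps can be sketched with NumPy's FFT (a minimal version; `cutoff` is an illustrative parameter controlling how large a low-frequency block is suppressed):

```python
import numpy as np

def high_freq_overlay(img, cutoff=8):
    """Figure 6 pipeline: FFT -> zero out a (2*cutoff+1)^2 low-frequency
    block around the spectrum centre -> inverse FFT -> magnitude image
    containing only high-frequency content, usable as an overlay."""
    f = np.fft.fftshift(np.fft.fft2(img))          # centre the spectrum
    cy, cx = np.array(f.shape) // 2
    f[cy - cutoff:cy + cutoff + 1,
      cx - cutoff:cx + cutoff + 1] = 0             # high-pass filter
    return np.abs(np.fft.ifft2(np.fft.ifftshift(f)))
```

A constant image has no high-frequency content and maps to (numerically) zero, while sharp edges survive the filter.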

Regarding the chosen filters to visualize: the final layer before classification consists of 1280 filters; when an input image is of size 224×224, the feature maps resulting from these final 1280 filters have size 7×7. From these 1280 filters, we selected two filters for each biometric: one that had the highest average activation response to real images, which we call realF, and another filter that was highly activated by attack images, called attackF. We then visualize the 7×7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.
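The filter selection can be sketched as follows (a simplification: we pick the filter with the highest mean activation per class, assuming the final-block feature maps have already been extracted):

```python
import numpy as np

def select_filters(real_maps, attack_maps):
    """`real_maps` / `attack_maps`: feature maps of shape
    (n_images, 7, 7, 1280) from the final convolutional block.
    Returns the index of the filter with the highest mean activation
    over real images (realF) and over attack images (attackF)."""
    real_f = int(np.argmax(real_maps.mean(axis=(0, 1, 2))))
    attack_f = int(np.argmax(attack_maps.mean(axis=(0, 1, 2))))
    return real_f, attack_f
```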

Figure 7 shows samples from the Warsaw iris dataset with their high frequency regions highlighted, in addition to the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples, high frequency areas are concentrated at the bright regions around the inner eye corner and the lash-line; for attack presentations, however, high frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features highly represented at the eye corner through realF, giving a high response at those regions in real images while failing to detect them in attack samples. AttackF likewise manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images of the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is performed using a handheld mobile device showing a photo of the subject. We can notice the absence of a high frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

By visualizing the high frequency regions in the images, we showed that high frequency variations in real images concentrate on certain edges and depth changes of the real 3D biometric presented, while in attack presentations the high frequency components represent noise: either noise spread over the paper in the case of a paper attack medium, or noise patterns resulting from the display monitor of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network learned is to use gradient-weighted class activation maps (Grad-CAM) [10], which highlight the areas that contributed most to the network's classification decision for an input image.
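The core Grad-CAM computation, given the last-conv feature maps and the gradients of the class score with respect to them (obtainable e.g. with `tf.GradientTape`), reduces to a few lines (a sketch of the method of [10], not the paper's exact code):

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """`feature_maps` (H, W, C): activations of the last conv layer for
    one image; `grads` (H, W, C): gradients of the chosen class score
    w.r.t. those activations. Channel weights are the spatially
    averaged gradients; the map is the ReLU of the weighted sum of
    feature maps, normalised to [0, 1]."""
    weights = grads.mean(axis=(0, 1))                       # alpha_k
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)
    return cam / cam.max() if cam.max() > 0 else cam
```

The resulting low-resolution map (7×7 for a 224×224 MobilenetV2 input) is then upsampled and overlaid on the input image.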

We chose the MobilenetV2 network trained on the Replay-Attack face dataset and the MobBioFake gray iris dataset to analyze the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation attack images.

The visualizations in these figures show clearly that the network focuses on features of the face and iris centers to make its decision for a bona-fide presentation, and in cases when these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could perfectly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights trained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% state-of-the-art intra-dataset error on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, in addition to starting the network training with non-random weights already pretrained on the ImageNet dataset. This was done to avoid the weights overfitting the small training datasets; for future work, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-SOTA results were reported for the face datasets. As far as we know, this is the first paper to include cross-dataset equal-error-rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained with a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we provided an analysis of the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the centers of the face or iris are the most important parts that cause the network to select a bona-fide classification decision. For future work, we would like to apply patch-based CNNs and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach by using a region-based CNN method.


8 References1 Jourabloo A Liu Y Liu X lsquoFace de-spoofing Anti-spoofing via noise

modelingrsquo CoRR 20182 Nagpal C Dubey SR lsquoA performance evaluation of convolutional neural

networks for face anti spoofingrsquo CoRR 20183 Nguyen D RaeBaek N Pham T RyoungPark K lsquoPresentation attack detec-

tion for iris recognition system using nir camera sensorrsquo Sensors 2018 18pp 1315

4 Nguyen DT Pham TD Lee YW Park KR lsquoDeep learning-based enhancedpresentation attack detection for iris recognition by combining features from localand global regions based on nir camera sensorrsquo Sensors 2018 18 (8)

5 Pinto A Pedrini H Krumdick M Becker B Czajka A Bowyer KW et alCounteracting Presentation Attacks in Face Fingerprint and Iris Recognition InlsquoDeep learning in biometricsrsquo (CRC Press 2018 p 49

6 Czajka A Bowyer KW lsquoPresentation attack detection for iris recognitionAn assessment of the state-of-the-artrsquo ACM Computing Surveys 2018 51 (4)pp 861ndash8635

7 Ramachandra R Busch C lsquoPresentation attack detection methods for facerecognition systems A comprehensive surveyrsquo ACM Computing Surveys 201750 (1) pp 81ndash837

8 AbdulGhaffar I Norzali M HajiMohd MN lsquoPresentation attack detec-tion for face recognition on smartphones A comprehensive reviewrsquo Journal ofTelecommunication Electronic and Computer Engineering 2017 9

9 Li L Correia PL Hadid A lsquoFace recognition under spoofing attackscountermeasures and research directionsrsquo IET Biometrics 2018 7 pp 3ndash14(11)

10 Selvaraju RR Das A Vedantam R Cogswell M Parikh D Batra DlsquoGrad-cam Why did you say that visual explanations from deep networks viagradient-based localizationrsquo CoRR 2016 Available from httparxivorgabs161002391

11 Daugman J lsquoIris recognition and anti-spoofing countermeasuresrsquo In 7thInternational Biometrics Conference ( 2004

12 Galbally J OrtizLopez J Fierrez J OrtegaGarcia J lsquoIris liveness detectionbased on quality related featuresrsquo In 2012 5th IAPR International Conferenceon Biometrics (ICB) ( 2012 pp 271ndash276

13 Pacut A Czajka A lsquoAliveness detection for iris biometricsrsquo 40th Annual IEEEInternational Carnahan Conferences Security Technology 2006 pp 122 ndash 129

14 Czajka A lsquoPupil dynamics for iris liveness detectionrsquo IEEE Transactions onInformation Forensics and Security 2015 10 (4) pp 726ndash735

15 Czajka A lsquoPupil dynamics for presentation attack detection in iris recognitionrsquoIn International Biometric Performance Conference (IBPC) NIST Gaithersburg( 2014 pp 1ndash3

16 Bodade R Talbar S lsquoDynamic iris localisation A novel approach suitablefor fake iris detectionrsquo In 2009 International Conference on Ultra ModernTelecommunications Workshops ( 2009 pp 1ndash5

17 Huang X Ti C Hou Q Tokuta A Yang R lsquoAn experimental study of pupilconstriction for liveness detectionrsquo In 2013 IEEE Workshop on Applications ofComputer Vision (WACV) ( 2013 pp 252ndash258

18 Kanematsu M Takano H Nakamura K lsquoHighly reliable liveness detectionmethod for iris recognitionrsquo In SICE Annual Conference 2007 ( 2007 pp 361ndash364

19 Lee EC RyoungPark K lsquoFake iris detection based on 3d structure of irispatternrsquo International Journal of Imaging Systems and Technology 2010 20pp 162 ndash 166

20 Mohd MNH Kashima M Sato K Watanabe M lsquoInternal state measure-ment from facial stereo thermal and visible sensors through svm classificationrsquoARPN J Eng Appl Sci 2015 10 (18) pp 8363ndash8371

21 Mohd M MNH K M S K W lsquoA non-invasive facial visual-infraredstereo vision based measurement as an alternative for physiological measure-mentrsquo Lecture Notes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics) 2015 10 (18)pp 684ndash697

22 Pan G Sun L Wu Z Wang Y lsquoMonocular camera-based face livenessdetection by combining eyeblink and scene contextrsquo Telecommunication Systems2011 47 (3) pp 215ndash225

23 Smith DF Wiliem A Lovell BC lsquoFace recognition on consumer devicesReflections on replay attacksrsquo IEEE Transactions on Information Forensics andSecurity 2015 10 (4) pp 736ndash745

24 Kollreider K Fronthaler H Bigun J lsquoVerifying liveness by multiple expertsin face biometricsrsquo In 2008 IEEE Computer Society Conference on ComputerVision and Pattern Recognition Workshops ( 2008 pp 1ndash6

25 Ali A Deravi F Hoque S lsquoDirectional sensitivity of gaze-collinearity featuresin liveness detectionrsquo In 2013 Fourth International Conference on EmergingSecurity Technologies ( 2013 pp 8ndash11

26 Wang L Ding X Fang C lsquoFace live detection method based on physiologicalmotion analysisrsquo Tsinghua Science and Technology 2009 14 (6) pp 685ndash690

27 Bharadwaj S Dhamecha TI Vatsa M Singh R lsquoComputationally efficientface spoofing detection with motion magnificationrsquo In 2013 IEEE Conferenceon Computer Vision and Pattern Recognition Workshops ( 2013 pp 105ndash110

28 Sun L Wu Z Lao S Pan G lsquoEyeblink-based anti-spoofing in face recogni-tion from a generic webcamerarsquo In 2007 11th IEEE International Conference onComputer Vision(ICCV) ( 2007 pp 1ndash8

29 Patel K Han H Jain AK lsquoCross-database face antispoofing with robust fea-ture representationrsquo In You Z Zhou J Wang Y Sun Z Shan S Zheng Wet al editors Biometric Recognition (Cham Springer International Publishing2016 pp 611ndash619

30 Marsico MD Nappi M Riccio D Dugelay J lsquoMoving face spoofing detec-tion via 3d projective invariantsrsquo In 2012 5th IAPR International Conference onBiometrics (ICB) ( 2012 pp 73ndash78

31 Pan G Wu Z Sun L lsquoLiveness detection for face recognitionrsquo In Delac KGrgic M Bartlett MS editors Recent Advances in Face Recognition (RijekaIntechOpen 2008

32 Kollreider K Fronthaler H Faraj MI Bigun J lsquoReal-time face detec-tion and motion analysis with application in acircAIJlivenessacircAI assessmentrsquo IEEETransactions on Information Forensics and Security 2007 2 (3) pp 548ndash558

33 Komogortsev OV Karpov A lsquoLiveness detection via oculomotor plant char-acteristics Attack of mechanical replicasrsquo In 2013 International Conference onBiometrics (ICB) ( 2013 pp 1ndash8

34 Komogortsev OV Karpov A Holland CD lsquoAttack of mechanical replicasLiveness detection with eye movementsrsquo IEEE Trans Inf Forens Security 201510 (4) pp 716ndash725

35 Rigas I Komogortsev OV lsquoGaze estimation as a framework for iris livenessdetectionrsquo IEEE International Joint Conference on Biometrics 2014 pp 1ndash8

36 Rigas I Komogortsev OV lsquoEye movement-driven defense against iris print-attacksrsquo Pattern Recogn Lett 2015 68 (P2) pp 316ndash326

37 Akhtar Z Foresti GL lsquoFace spoof attack recognition using discriminativeimage patchesrsquo Journal of Electrical and Computer Engineering 2016 2016pp 1ndash14

38 Tan X Li Y Liu J Jiang L lsquoFace liveness detection from a single imagewith sparse low rank bilinear discriminative modelrsquo In Daniilidis K MaragosP Paragios N editors Computer Vision ndash ECCV 2010 (Berlin HeidelbergSpringer Berlin Heidelberg 2010 pp 504ndash517

39 Kollreider K Fronthaler H Bigun J lsquoNon-intrusive liveness detection by faceimagesrsquo Image Vision Comput 2009 27 (3) pp 233ndash244

40 Bao W Li H Li N Jiang W lsquoA liveness detection method for face recognitionbased on optical flow fieldrsquo In Image Analysis and Signal Processing ( 2009pp 233ndash236

41 Siddiqui TA Bharadwaj S Dhamecha TI Agarwal A Vatsa M SinghR et al lsquoFace anti-spoofing with multifeature videolet aggregationrsquo 2016 23rdInternational Conference on Pattern Recognition (ICPR) 2016 pp 1035ndash1040

42 d SPinto A Pedrini H Schwartz W Rocha A lsquoVideo-based face spoofingdetection through visual rhythm analysisrsquo In 2012 25th SIBGRAPI Conferenceon Graphics Patterns and Images ( 2012 pp 221ndash228

43 Komulainen J Hadid A Pietikaumlinen M lsquoContext based face anti-spoofingrsquo2013 IEEE Sixth International Conference on Biometrics Theory Applicationsand Systems (BTAS) 2013 pp 1ndash8

44 Kim S Yu S Kim K Ban Y Lee S lsquoFace liveness detection using variablefocusingrsquo ICB 2013 pp 1ndash6

45 Wang T Yang J Lei Z Liao S Li SZ lsquoFace liveness detection using 3dstructure recovered from a single camerarsquo ICB 2013 pp 1ndash6

46 Kose N Dugelay J lsquoReflectance analysis based countermeasure technique todetect face mask attacksrsquo In 2013 18th International Conference on DigitalSignal Processing (DSP) ( 2013 pp 1ndash6

47 Erdogmus N Marcel S lsquoSpoofing in 2d face recognition with 3d masks andanti-spoofing with kinectrsquo In 2013 IEEE Sixth International Conference onBiometrics Theory Applications and Systems (BTAS) ( 2013 pp 1ndash6

48 Erdogmus N Marcel S lsquoSpoofing 2d face recognition systems with 3dmasksrsquo In 2013 International Conference of the BIOSIG Special Interest Group(BIOSIG) ( 2013 pp 1ndash8

49 Li J Wang Y Tan T Jain AK lsquoLive face detection based on the analysis ofFourier spectrarsquo In Jain AK Ratha NK editors Biometric Technology forHuman Identification vol 5404 ( 2004 pp 296ndash303

50 Zhang Z Yan J Liu S Lei Z Yi D Li SZ lsquoA face antispoofing databasewith diverse attacksrsquo In 2012 5th IAPR International Conference on Biometrics(ICB) ( 2012 pp 26ndash31

51 Lee T Ju G Liu H Wu Y lsquoLiveness detection using frequency entropy ofimage sequencesrsquo In 2013 IEEE International Conference on Acoustics Speechand Signal Processing ( 2013 pp 2367ndash2370

52 Czajka A lsquoDatabase of iris printouts and its application Development of livenessdetection method for iris recognitionrsquo In 2013 18th International Conference onMethods Models in Automation Robotics (MMAR) ( 2013 pp 28ndash33

53 Wei Z Qiu X Sun Z Tan T lsquoCounterfeit iris detection based on textureanalysisrsquo In ICPR 2008 19th International Conference on Pattern Recogni-tion(ICPR) ( 2009 pp 1ndash4

54 Sequeira AF Murari J Cardoso JS lsquoIris liveness detection methods in mobileapplicationsrsquo In 2014 International Conference on Computer Vision Theory andApplications (VISAPP) vol 3 ( 2014 pp 22ndash33

55 Sequeira AF Murari J Cardoso JS lsquoIris liveness detection methods in themobile biometrics scenariorsquo In Proceedings of International Joint Conferenceon Neural Networks (IJCNN) ( 2014 pp 3002ndash3008

56 Unnikrishnan S Eshack A lsquoFace spoof detection using image distortionanalysis and image quality assessmentrsquo In 2016 International Conference onEmerging Technological Trends (ICETT) ( 2016 pp 1ndash5

57 Wen D Han H Jain AK lsquoFace spoof detection with image distortion anal-ysisrsquo IEEE Transactions on Information Forensics and Security 2015 10 (4)pp 746ndash761

58 Galbally J Marcel S lsquoFace anti-spoofing based on general image quality assess-mentrsquo In 2014 22nd International Conference on Pattern Recognition ( 2014pp 1173ndash1178

59 SAtildeullinger D Trung P Uhl A lsquoNon-reference image quality assessment andnatural scene statistics to counter biometric sensor spoofingrsquo IET Biometrics2018 7 (4) pp 314ndash324

60 Bhogal APS Soumlllinger D Trung P Uhl A lsquoNon-reference image qualityassessment for biometric presentation attack detectionrsquo 2017 5th InternationalWorkshop on Biometrics and Forensics (IWBF) 2017 pp 1ndash6

61 Boulkenafet Z Komulainen J Hadid A lsquoFace anti-spoofing based on colortexture analysisrsquo In 2015 IEEE International Conference on Image Processing(ICIP) ( 2015 pp 2636ndash2640

14 ccopy This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

62 Boulkenafet Z., Komulainen J., Hadid A.: 'Face spoofing detection using colour texture analysis'. IEEE Transactions on Information Forensics and Security, 2016, 11 (8), pp. 1818–1830

63 Boulkenafet Z., Komulainen J., Hadid A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding'. IEEE Signal Processing Letters, 2017, 24 (2), pp. 141–145

64 Boulkenafet Z., Komulainen J., Akhtar Z., Benlamoudi A., Samai D., Bekhouche S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65 Raja K.B., Raghavendra R., Busch C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks (SIN '16), New York, NY, USA, ACM, 2016, pp. 9–15

66 Boulkenafet Z., Komulainen J., Hadid A.: 'On the generalization of color texture-based face anti-spoofing'. Image and Vision Computing, 2018, 77, pp. 1–9

67 Anjos A., Marcel S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68 Komulainen J., Hadid A., Pietikäinen M., Anjos A., Marcel S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69 Gragnaniello D., Poggi G., Sansone C., Verdoliva L.: 'An investigation of local descriptors for biometric spoofing detection'. IEEE Transactions on Information Forensics and Security, 2015, 10 (4), pp. 849–863

70 de Freitas Pereira T., Anjos A., De Martino J.M., Marcel S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park J.I., Kim J. (eds.): Computer Vision – ACCV 2012 Workshops, Berlin, Heidelberg, Springer, 2013, pp. 121–132

71 de Freitas Pereira T., Anjos A., Martino J.M.D., Marcel S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72 Määttä J., Hadid A., Pietikäinen M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska I., Anjos A., Marcel S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG - Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74 Yang J., Lei Z., Liao S., Li S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75 Määttä J., Hadid A., Pietikäinen M.: 'Face spoofing detection from single images using texture and local shape analysis'. IET Biometrics, 2012, 1 (1), pp. 3–10

76 Benlamoudi A., Samai D., Ouafi A., Bekhouche S.E., Taleb-Ahmed A., Hadid A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering and Information Technology (CEIT), 2015, pp. 1–5

77 Patel K., Han H., Jain A.K.: 'Secure face unlock: spoof detection on smartphones'. IEEE Transactions on Information Forensics and Security, 2016, 11 (10), pp. 2268–2283

78 Schwartz W.R., Rocha A., Pedrini H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79 Xu Z., Li S., Deng W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80 Peixoto B., Michelassi C., Rocha A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81 Raghavendra R., Busch C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features'. IEEE Transactions on Information Forensics and Security, 2015, 10 (4), pp. 703–715

82 Doyle J.S., Bowyer K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF'. IEEE Access, 2015, 3, pp. 1672–1683

83 Akhtar Z., Micheloni C., Piciarelli C., Foresti G.L.: 'MoBio_LivDet: mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci R., Muntoni D., Fadda G., Pili M., Sirena N., Murgia G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85 Feng L., Po L.M., Li Y., Xu X., Yuan F., Cheung T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach'. Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86 Li L., Feng X., Boulkenafet Z., Xia Z., Li M., Hadid A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87 Nguyen T.D., Pham T.D., Baek N.R., Park K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors'. Sensors, 2018

88 Atoum Y., Liu Y., Jourabloo A., Liu X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89 Liu Y., Jourabloo A., Liu X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision'. CoRR, 2018

90 Wang Z., Zhao C., Qin Y., Zhou Q., Lei Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing'. CoRR, 2018

91 Yang J., Lei Z., Li S.Z.: 'Learn convolutional neural network for face anti-spoofing'. CoRR, 2014

92 Gan J., Li S., Zhai Y., Liu C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93 Lucena O., Junior A., Moia V., Souza R., Valle E., de Alencar Lotufo R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34

94 Czajka A., Bowyer K.W., Krumdick M., VidalMata R.G.: 'Recognition of image-orientation-based iris spoofing'. IEEE Transactions on Information Forensics and Security, 2017, 12 (9), pp. 2184–2196

95 Silva P., Luz E., Baeta R., Pedrini H., Falcão A., Menotti D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96 Kuehlkamp A., da Silva Pinto A., Rocha A., Bowyer K.W., Czajka A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection'. CoRR, 2018

97 Hoffman S., Sharma R., Ross A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98 Menotti D., Chiachia G., Pinto A., Schwartz W.R., Pedrini H., Falcão A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection'. IEEE Transactions on Information Forensics and Security, 2015, 10 (4), pp. 864–879

99 Agarwal A., Singh R., Vatsa M.: 'Face anti-spoofing using Haralick features'. In: 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100 Galbally J., Marcel S., Fierrez J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition'. IEEE Transactions on Image Processing, 2014, 23 (2), pp. 710–724

101 Garcia D.C., de Queiroz R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis'. IEEE Transactions on Information Forensics and Security, 2015, 10 (4), pp. 778–786

102 He Z., Sun Z., Tan T., Wei Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli M., Nixon M.S. (eds.): Advances in Biometrics, Berlin, Heidelberg, Springer, 2009, pp. 1080–1090

103 Zhang H., Sun Z., Tan T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104 Costa-Pazo A., Bhattacharjee S., Vazquez-Fernandez E., Marcel S.: 'The Replay-Mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105 Sequeira A.F., Monteiro J.C., Rebelo A., Oliveira H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello D., Sansone C., Verdoliva L.: 'Iris liveness detection for mobile devices based on local descriptors'. Pattern Recognition Letters, 2015, 57, pp. 81–87

107 De Marsico M., Galdi C., Nappi M., Riccio D.: 'FIRME: face and iris recognition for mobile engagement'. Image and Vision Computing, 2014, 32, pp. 1161–1172

108 Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., et al.: 'ImageNet large scale visual recognition challenge'. International Journal of Computer Vision, 2015, 115 (3), pp. 211–252

109 Boulkenafet Z., Komulainen J., Li L., Feng X., Hadid A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 612–618

110 Yambay D., Becker B., Kohli N., Yadav D., Czajka A., Bowyer K.W., et al.: 'LivDet Iris 2017 - iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB 2017), Denver, CO, USA, October 1-4, 2017, pp. 733–741

111 Chakka M.M., Anjos A., Marcel S., Tronci R., Muntoni D., Fadda G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112 Chingovska I., Yang J., Lei Z., Yi D., Li S.Z., Kahm O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113 Zhang S., Wang X., Liu A., Zhao C., Wan J., Escalera S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu A., Wan J., Escalera S., Jair Escalante H., Tan Z., Yuan Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu A., Tan Z., Li X., Wan J., Escalera S., Guo G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116 Yambay D., Doyle J.S. Jr., Bowyer K.W., Czajka A., Schuckers S.: 'LivDet-Iris 2013 - iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics (IJCB 2014), Clearwater, FL, USA, September 29 - October 2, 2014, pp. 1–8

117 Yambay D., Walczak B., Schuckers S., Czajka A.: 'LivDet-Iris 2015 - iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22-24, 2017, pp. 1–6

118 Sequeira A.F., Oliveira H.P., Monteiro J.C., Monteiro J.P., Cardoso J.S.: 'MobILive 2014 - mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete V., Tome-Gonzalez P., Alonso-Fernandez F., Galbally J., Fierrez J., Ortega-Garcia J.: 'Direct attacks using fake images in iris verification'. In: Schouten B., Juul N.C., Drygajlo A., Tistarelli M. (eds.): Biometrics and Identity Management, Berlin, Heidelberg, Springer, 2008, pp. 181–190

120 Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z.: 'Rethinking the inception architecture for computer vision'. CoRR, 2015, abs/1512.00567

121 He K., Zhang X., Ren S., Sun J.: 'Deep residual learning for image recognition'. CoRR, 2015, abs/1512.03385

122 Sandler M., Howard A.G., Zhu M., Zhmoginov A., Chen L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation'. CoRR, 2018, abs/1801.04381

123 Simonyan K., Zisserman A.: 'Very deep convolutional networks for large-scale image recognition'. CoRR, 2014, abs/1409.1556

124 Kingma D.P., Ba J.: 'Adam: a method for stochastic optimization'. CoRR, 2014, abs/1412.6980

125 Anjos A., Shafey L.E., Wallace R., Günther M., McCool C., Marcel S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126 Anjos A., Günther M., de Freitas Pereira T., Korshunov P., Mohammadi A., Marcel S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127 Peng F., Qin L., Long M.: 'Face presentation attack detection using guided scale texture'. Multimedia Tools and Applications, 2018, 77 (7), pp. 8883–8909



to visualize what the network focuses on when deciding if a sample is bona-fide or not.

Hence, the main contributions of this work include: (1) surveying the recent methods for presentation attack detection in both face and iris modalities, together with competitions and datasets; (2) demonstrating the effectiveness of CNNs for the PAD problem by fine-tuning and comparing recent deep CNN architectures on three benchmark face datasets and three benchmark iris datasets, experimenting with two fine-tuning approaches and applying cross-dataset evaluation; (3) training a single network for presentation attack detection of the two biometric modalities and comparing the results with networks trained on a single biometric; (4) analyzing specific features learned by the commonly-trained network; and (5) finally, visualizing the class activation maps of the trained network on sample bona-fide and attack images.

The rest of the paper is organized as follows: we first review the PAD approaches, which can broadly be categorized into active (Section 2) vs. passive (Section 3) PAD techniques. Then, in Section 4, we summarize the face and iris PAD competitions and benchmark databases, followed by Section 5, where we explain the proposed PAD approach. Experiments, results and analysis are presented in Section 6, and finally conclusion and future work in Section 7.

2 Active PAD

Active presentation attack detection can either be hardware-based or depend on a challenge-response protocol; both are detailed in this section.

2.1 Hardware-based

Such methods depend on capturing the automatic response of the biometric trait (face/iris) to a certain stimulus. For example, iris liveness detection can use the eye hippus, which is the permanent oscillation that the eye pupil presents even under uniform lighting conditions, or the dilation of the pupil in reaction to sudden illumination changes [11–15]. Variations of pupil dilation were calculated by Bodade et al. [16] from multiple iris images, while Huang et al. [17] used pupil constriction to detect iris liveness. Other researchers used the iris image brightness variations after light stimuli in some predefined iris regions [18], or the reflections of infrared light on the moist cornea when stimulated with light sources positioned randomly in space. Lee et al. [19] used additional infrared sensors during iris image acquisition to construct the full 3D shape of the iris, to aid in the detection of fake iris images or contact lenses. These approaches rely on an external specific device to be added to the sensor/camera, hence the name hardware-based techniques. However, not all hardware-based methods are active: some such approaches detect the facial thermogram [20, 21], blood pressure, fingerprint sweat, gait, etc. passively, in a non-intrusive manner.

2.2 Challenge-response

In these methods, the user is required to respond to a challenge instructed by the system. The system requests the user to perform some specific movement, like blinking the right eye or rotating the head clockwise [22–24], or gazing towards a predefined stimulus [25].

2.3 Drawbacks of active PAD methods

Although active spoof detection methods may provide higher presentation attack detection rates and robustness against different presentation attack types, they are more expensive and less convenient to users than software-based approaches, which do not require any additional device besides the standard camera. For the rest of this survey we will focus on software-based methods: they are more generic, less expensive, less intrusive, and can easily be incorporated in real-world applications and smartphones. Table 1 summarizes the different approaches used in the literature for detection of presentation attacks.

3 Passive Software-based PAD

Passive software-based antispoofing methods can be categorized into several groups: temporal or motion-analysis based approaches, which utilize features extracted from consecutive frames captured for the face or iris; and texture-based methods, which use a single image of the biometric trait and do not depend on motion features. Such texture-based PAD methods can further be classified into a group that depends on hand-crafted features and another group of recent methods that opt to learn these discriminative features.

3.1 Liveness-detection

The earliest category includes approaches that identify whether the presented biometric is live or an artifact; such approaches are known as liveness detection. An artifact can be a mask or paper in the case of face spoofing, and paper or a glass eye for iris. These methods depend on detecting signs of life in the captured biometric data. Examples in the case of face PAD are eye blinking [24, 26–29], facial expression changes, and head and/or mouth movements [30–32]. For iris PAD, Komogortsev et al. [33] extracted liveness cues from eye movement in a simulated scenario of an attack by a mechanical replica of the human eye. The authors further investigated this line and published more iris liveness detection methods based on eye movements and gaze estimation in later years [34–36].
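Blink cues of this kind can be computed from eye landmarks. A common measure (illustrative here; not necessarily the one used in the cited works) is the eye aspect ratio (EAR), sketched below in plain NumPy, assuming the six-landmark ordering of the widely used 68-point face annotation scheme:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks around one eye, ordered corner,
    top pair, corner, bottom pair. The ratio of vertical to
    horizontal extent drops sharply when the eye closes, so
    thresholding EAR over time yields a blink signal."""
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])   # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])   # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)
```

A liveness check would then require the EAR to dip below a threshold (a blink) within the capture window; a printed photo never blinks.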

Liveness-detection methods are considered temporal-based and can be easily fooled by a replayed video of the face or iris, or by adding liveness properties to a still face image by cutting out the eye/mouth parts.

3.2 Recapturing-detection and contextual information

Another approach that belongs to the temporal-based category comprises methods that simply detect whether the biometric data is a replay of a recorded sample, referred to as recapturing detection. These methods analyze the scene and environment to detect abnormal movements in the video due to a hand holding the presented 2D mobile device or paper [37, 38]; such movements differ from the movement of a real face against a static background. Sometimes these techniques are referred to as motion analysis, depending on motion cues from planar objects. Some use optical flow to capture and track the relative movements between different facial parts to determine spoofing [39, 40], while other researchers use motion magnification [27, 41], Haralick features [99] or visual rhythm analysis [42]. Other types depend on contextual analysis of the scene [22, 30, 43, 44] or 3D reconstruction [45].
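The core observation — a hand-held replay moves the face and its background together, while a live subject moves against a static scene — can be reduced to a frame-differencing sketch (illustrative only; the cited works use richer cues such as optical flow):

```python
import numpy as np

def face_vs_background_motion(prev_frame, curr_frame, face_mask):
    """Ratio of mean frame-difference energy inside the face region
    to that outside it. A live face gives a high ratio (face moves,
    scene is static); a hand-held replay gives a ratio near 1
    (everything moves together)."""
    diff = np.abs(curr_frame.astype(float) - prev_frame.astype(float))
    face_motion = diff[face_mask].mean()
    background_motion = diff[~face_mask].mean()
    return face_motion / (background_motion + 1e-8)
```

Thresholding this ratio over a short clip gives a crude replay detector under the stated assumption of a static background.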

The recapture-detection methods fail if the attack is done using an artifact like a mask, because no replay is present to detect. A better and more generic approach is therefore to use texture-based methods, which utilize features that can tell the difference between real live biometric data and a replay or an artifact being presented to the sensor.

3.3 Texture-based (hand-crafted features)

The category of texture-based methods assumes that the texture, color and reflectance characteristics captured in genuine face images are different from those resulting from presentation attacks. For example, blur and other effects caused by printers and display devices lead to detectable color artifacts and texture patterns, which can then be exploited to detect presentation attacks.

Researchers use handcrafted feature extraction methods to extract image features from face or iris images. They then use a classification method, such as support vector machines (SVM), to classify images into the two classes of bona-fide or presentation attack based on the extracted features. Such techniques are faster than temporal or motion-based analysis, as they can operate on only one image sample, or a selection of images, instead of having to analyze a complete video sequence.

3.3.1 Texture-reflectance: These techniques analyze appearance properties of the face or iris, such as the face reflectance or texture.


Table 1 PAD methods in literature

Active (hardware-based or challenge-response)
  Methods: eye hippus or pupil dilation [11–19]; challenge: blink eye or rotate head [22–25]
  Advantages: robust against different PA types; higher PA detection rates
  Weaknesses: more expensive; less convenient to users; rely on an external device

Passive (software-based)

Liveness-detection
  Methods: eye blinking [24, 26–29]; facial expression changes, head and/or mouth movements [30–32]; eye movements and gaze estimation for iris [33–36]
  Advantages: utilize features extracted from consecutive frames; detect signs of life in captured biometric data
  Weaknesses: can be easily fooled by a replayed video, or by adding liveness properties to a still face image by cutting out the eye/mouth parts

Recapturing-detection
  Methods: detect abnormal movements in the video due to a hand holding the presenting media [37, 38]; track the relative movements between different facial parts [39, 40]; motion magnification [27, 41]; visual rhythm analysis [42]; contextual analysis of the scene [22, 30, 43, 44]; 3D reconstruction [45]
  Advantages: very effective for replay attacks
  Weaknesses: fail if the attack is done using an artifact like a mask

Texture-based (hand-crafted features)
  Methods: texture-reflectance [46–48]; frequency [38, 49–52]; image quality analysis [12, 18, 53–60]; color analysis [49, 57, 61–66]; local descriptors [38, 43, 63, 67–83]; fusion [3, 4, 68, 71, 78, 83–85]
  Advantages: detect effects caused by printers and display devices; faster than temporal or motion-based analysis; operate on a single image sample or a selection of images instead of a video sequence
  Weaknesses: design and selection of feature extractors is based on expert knowledge; sensitive to varying acquisition conditions such as camera devices, lighting conditions and PAIs; poor generalizability

Learned-features
  Methods: extract deep features, classify classically [29, 86, 87]; combine with temporal information [79, 85]; other [1, 2, 88–93]; for iris [3, 4, 94–97]; face and iris [5, 98]
  Advantages: instead of hand-engineering features, a deep network learns features that discriminate between bona-fide and attack samples; can be used jointly with hand-crafted features or motion analysis
  Weaknesses: requires more data; can fail to generalize to unseen sensors

The reflectance and texture of a real face or iris are different from those acquired from a spoofing material, be it printed paper, the screen of a mobile device, a mask [46–48] or a glass eye.

3.3.2 Image quality analysis: Techniques that depend on analysis of image quality try to detect the presence of image distortion usually found in spoofed face or iris images.

(1) Deformation (face): For example, in print attacks, the face might be skewed with a deformed shape if the imposter bends the paper while holding it.

(2) Frequency: These methods are based on the assumption that the reproduction of previously captured images or videos affects the frequency content of the image when displayed in front of the sensor. Attack images or videos have more blur and less sharpness than genuine ones, so their high-frequency components are attenuated. Hence, frequency-domain analysis can be used as a face PAD method, as in [38, 49, 50], or frequency entropy analysis, as by Lee et al. [51]. For iris, Czajka [52] analyzed printing regularities left in printed irises and detected characteristic peaks in the frequency spectrum. Frequency-based approaches might fail when high-quality spoof images or videos are presented to the camera.

(3) Quality metrics: Some other methods explore the potential of quality assessment to identify real and fake face or iris samples acquired from a high-quality printed image, assuming that many image quality metrics are distorted by the display device or paper. Some researchers used individual quality-based features, while others used a combination of quality metrics to better detect spoofing and differentiate between bona-fide and attack iris [12, 18, 53–55], face [56–58], or both [59, 60]. For example, Galbally et al. [12] investigated 22 iris-specific quality features and used feature selection to choose the best combination of features that discriminate between live and fake iris images. Later, in a follow-up work [100], they assessed 25 general image-quality metrics and applied the method not only to iris PAD but also to face and fingerprint PAD. A different approach, by Garcia et al. [101], was to analyze moiré patterns which may be produced during image acquisition and display on mobile devices.

(4) Color analysis: Another type of algorithm utilizes the fact that the distribution of color in images taken from a bona-fide face or iris is different from the distribution found in images taken from printed papers or display devices. Methods for face PAD adopt solutions in input domains other than the RGB space in order to improve robustness to illumination variation, like the HSV and YCbCr color spaces [57, 61–64] or the Fourier spectrum [49]. For iris PAD, color adaptive quantized patterns were used in [65] instead of a gray-textured image. Boulkenafet et al. studied the generalization of color-texture analysis methods in [66].
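The frequency, quality and color cues above can each be expressed in a few lines of NumPy. The sketch below is illustrative only (thresholds and exact metrics differ per paper): a high-frequency energy ratio for the frequency cue, Laplacian variance as one crude sharpness/quality metric, and YCbCr channel moments as a compact color-distribution feature.

```python
import numpy as np

def high_freq_ratio(gray, radius_frac=0.25):
    """Fraction of spectral energy outside a low-frequency disc;
    recaptured (blurred) samples tend to score lower."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray.astype(float)))) ** 2
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2.0, xx - w / 2.0)
    low = spec[r <= radius_frac * min(h, w) / 2.0].sum()
    return 1.0 - low / spec.sum()

def laplacian_sharpness(gray):
    """Variance of a 4-neighbour Laplacian: low for the blur that
    print/display re-capture introduces."""
    g = gray.astype(float)
    lap = (-4 * g + np.roll(g, 1, 0) + np.roll(g, -1, 0)
           + np.roll(g, 1, 1) + np.roll(g, -1, 1))
    return lap.var()

def ycbcr_moments(rgb):
    """Mean and std of the Y, Cb, Cr channels: a 6-dim colour feature
    in a space more robust to illumination than raw RGB."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.array([c.mean() for c in (y, cb, cr)]
                    + [c.std() for c in (y, cb, cr)])
```

In the surveyed pipelines, features of this kind are concatenated and handed to an SVM or similar classifier.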

3.3.3 Local descriptors: A different group of methods uses hand-crafted features that extract local texture for PAD [67, 68]. A large variety of local image descriptors have been compared with respect to their ability to identify spoofed iris, fingerprint and face data [69], and some highly successful texture descriptors have been extensively used for this purpose. For example, in face PAD methods, descriptors such as local binary patterns (LBP) [70–76], SIFT [77], SURF [63], HoG [43, 74, 75, 78, 79], DoG [38, 80] and GLCM [78] have been studied, showing good results in discriminating real from attack face images. For PAD in iris recognition, boosted LBP was first used by He et al. [102], weighted LBP by Zhang et al. [103] for contact lens detection, and later binarized statistical image features (BSIF) [81, 82]. Local phase quantization (LPQ) was investigated for biometric PAD in general, including both face and iris, by Gragnaniello et al. [69], and the census transform (CT) was used for mobile PAD in [83]. Such local texture descriptors are then fed to a traditional classifier like SVM, LDA, neural networks or random forests.
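As a concrete example of such a local descriptor, the basic 8-neighbour LBP can be implemented in a few lines of NumPy (a minimal sketch; the cited works use refined variants such as uniform, boosted or weighted LBP):

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 8-neighbour local binary pattern: each pixel is encoded
    by comparing it with its 8 neighbours, and the normalised 256-bin
    histogram of codes is the texture feature fed to a classifier."""
    gray = np.asarray(gray)
    center = gray[1:-1, 1:-1]
    code = np.zeros(center.shape, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neighbour >= center).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()
```

The resulting 256-dim histogram (often computed per image block and concatenated) is then classified with SVM or LDA, as the surveyed papers describe.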


3.3.4 Drawbacks of hand-crafted texture-based methods: Results of the above methods show that manually engineered features are suitable for solving the PAD problem for face and iris recognition systems. However, their drawback is that the design and selection of handcrafted feature extractors is mainly based on the researchers' expert knowledge of the problem. Consequently, these features often reflect only limited aspects of the problem and are often sensitive to varying acquisition conditions, such as camera devices, lighting conditions and presentation attack instruments (PAIs). This causes their detection accuracy to vary significantly among different databases, indicating that handcrafted features have poor generalizability and so do not completely solve the PAD problem. The available cross-database tests in the literature suggest that the performance of hand-engineered texture-based techniques can degrade dramatically when operating in unknown conditions, leading to the need to automatically extract meaningful vision features directly from the data, using deep representations, to assist in the task of presentation attack detection.

3.4 Fusion

Presentation attack detection accuracy can be enhanced by using feature-level or score-level fusion methods to obtain a more general countermeasure, effective against various types of presentation attack. For face PAD, Tronci et al. [84] proposed a linear fusion, at frame and video level, of static and video analysis. In [78], Schwartz et al. introduced feature-level fusion by using partial least squares (PLS) regression based on a set of low-level feature descriptors. Some other works [68, 71] obtain an effective fusion scheme by measuring the level of independence of two anti-counterfeiting systems, while Feng et al. [85] integrated both quality measures and motion cues and then used neural networks for classification.
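In its simplest form, score-level fusion of this kind reduces to normalising each detector's scores and taking a weighted sum (a generic sketch, not the exact rule of any cited work):

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """score_lists: one row of per-sample scores per detector.
    Each row is min-max normalised to [0, 1], then the rows are
    combined by a weighted sum (equal weights by default)."""
    s = np.asarray(score_lists, dtype=float)
    lo = s.min(axis=1, keepdims=True)
    hi = s.max(axis=1, keepdims=True)
    s = (s - lo) / np.where(hi > lo, hi - lo, 1.0)
    if weights is None:
        weights = np.full(len(s), 1.0 / len(s))
    return np.asarray(weights, dtype=float) @ s
```

Feature-level fusion instead concatenates the feature vectors before a single classifier; the score-level variant shown here keeps the individual detectors independent.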

In the case of iris anti-spoofing, Akhtar et al. [83] proposed to use decision-level fusion on several local descriptors in addition to global features. More recently, Nguyen et al. [3, 4] explored both feature-level and score-level fusion of hand-crafted features with CNN-extracted features, and obtained the lowest error using the proposed score-level fusion methodology.

3.5 Smartphones

In the last few years, research on smartphone and mobile-device PAD has been gaining popularity. Many different algorithms have been developed, and datasets were released to test their performance. These databases became publicly available, and in this section we list some of them together with the algorithms proposed by their authors.

One of the first databases dedicated to face PAD on smartphones is MSU-MFSD [57]; the authors, Wen et al., used a PAD method based on image distortion analysis. Later, the same group released MSU-USSA [77] and used a combination of texture analysis (LBP and color) and image analysis techniques. In 2016, the Idiap Research Institute released the Replay-Mobile [104] database, and again a combination of image quality analysis and texture analysis (Gabor jets) was used as the proposed PAD.

In the field of iris spoofing, the MobBioFake dataset [54, 105], presented in 2014, was the first to use the visible spectrum instead of near-infrared for iris spoofing, with a mobile device as the acquisition sensor. Sequeira et al. [54] used a texture-based iris PAD algorithm: they combined several image quality features at the feature level and used SVM for classification. Gragnaniello et al. [106] proposed a solution based on local descriptors but with very low complexity and low CPU requirements, suitable for mobile applications. They evaluated on the MobBioFake and MICHE [107] datasets and achieved good performance.

3.6 Learned-features

In recent years, deep learning has evolved, and the use of deep neural networks or convolutional neural networks (CNNs) has proved to be effective in many computer vision tasks, especially with the availability of advanced hardware and large datasets. CNNs have been successfully used for vision problems like image classification and object detection. This has encouraged many researchers to incorporate deep learning in the PAD problem: instead of using hand-crafted features, they rely on a deep network or CNN to learn features that discriminate between bona-fide and attack face [1, 2, 29, 85, 86, 91–93] or iris [3, 4, 94, 95] images, or both [5, 98]. Some proposed to use hybrid features that combine information from both handcrafted and deeply-learnt features.

Several works use a pretrained CNN as a feature extractor followed by a conventional classifier such as SVM, as in [86]: Li et al. [86] fine-tune a VGG CNN model pretrained on ImageNet [108], use it to extract features, reduce dimensionality by principal component analysis (PCA), then classify with SVM. Combining deep features with motion cues has been proposed in several face video spoof detection works. In [29], Patel et al. fine-tuned a pretrained VGG for texture feature extraction, in addition to using frame difference as a motion cue. Similarly, Nguyen et al. [87] used VGG for deep feature extraction, fused it with multi-level LBP features, and used SVM for final classification. Feng et al. [85] combine image quality information with motion information from optical flow as input to a neural network for classification of bona-fide or attack faces. Xu et al. propose an LSTM-CNN architecture to utilize temporal information for binary classification in [79].
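The extract-reduce-classify pipeline of [86] can be sketched as follows. The CNN itself is stood in for by a hypothetical `extract_features` callable (in practice the penultimate-layer output of a pretrained VGG), and the classifier by a generic callable; the PCA step is shown concretely via SVD:

```python
import numpy as np

def pca_fit(features, n_components):
    """Fit PCA on a (samples x dims) feature matrix via SVD.
    Returns the mean and the top principal directions."""
    mu = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mu, full_matrices=False)
    return mu, vt[:n_components]

def pca_transform(features, mu, components):
    """Project centred features onto the fitted principal directions."""
    return (features - mu) @ components.T

def pad_pipeline(images, extract_features, mu, components, classifier):
    """extract_features and classifier are placeholders for the
    pretrained CNN and the SVM of the surveyed pipeline."""
    deep = np.stack([extract_features(img) for img in images])
    return classifier(pca_transform(deep, mu, components))
```

Any binary classifier slots into the final stage; the surveyed work uses an SVM trained on the PCA-reduced deep features.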

Atoum et al. [88] propose a two-stream CNN-based face PAD method using texture and depth; they utilized both the full face and patches from the face, with color spaces other than RGB. They evaluated on three benchmark databases but did not perform cross-dataset evaluation. Later, Liu et al. [89] extended and refined this work by learning another auxiliary signal, the rPPG signal, from the face video, and evaluated their depth-supervised learning approach on the more recent datasets OULU-NPU [109] and SiW. Wang et al. [90] did not consider depth as an auxiliary supervision as in [89]; instead, they used multiple RGB frames to estimate face depth information and used two modules to extract short- and long-term motion. Four benchmark datasets were used to evaluate their approach.

The use of a CNN by itself to classify attack vs. bona-fide iris images has also been investigated in recent years. Andrey K. et al. [96] proposed to use a lightweight CNN on binarized statistical image features extracted from the iris image; they evaluated their algorithm on LivDet-Warsaw 2017 [110] and three other benchmark datasets. In [97], Hoffman et al. proposed to use 25 patches of the iris image as input to the CNN instead of the full iris image. A patch-level fusion method is then used to fuse scores from the input patches based on the percentage of iris and pupil pixels inside each patch. They performed cross-dataset evaluation on the LivDet-Iris 2015 Warsaw, CASIA-Iris-Fake and BERC-Iris-Fake datasets, using the true detection rate (TDR) at 0.2% false detection rate (FDR) as their evaluation metric.

4 Competitions and Databases

In this section we summarize the face and iris PAD competitions held, in addition to the benchmark datasets used to evaluate the performance of face or iris presentation attack detection solutions. All these databases are publicly available upon request from their authors. Table 2 shows a summary of information about all six datasets used.

4.1 Competitions

Since 2011, several competitions have been held to assess the status of presentation attack detection for either face or iris. Such competitions were very useful for collecting new benchmark datasets and creating a common framework for evaluating the different PAD methods.

For face presentation attack detection, three competitions have been carried out since 2011. The first was held during 2011 [111] on the IDIAP Print-Attack dataset [67] with six competing teams; the winning methodology used hand-engineered texture-based features.


Table 2 Properties of used face and iris databases

Database               Number of samples      Resolution  FPS  Video duration
                       (Attack + Bona-fide)   (W × H)          (face videos)
Replay-Attack (Face)   1000 + 200 (videos)    320 × 240   25   9-15 seconds
MSU-MFSD (Face)        210 + 70 (videos)      720 × 480   30   9-15 seconds
Replay-Mobile (Face)   640 + 390 (videos)     720 × 1280  30   10-15 seconds
Warsaw 2013 (Iris)     815 + 825 (images)     640 × 480   -    -
ATVS-FIr (Iris)        800 + 800 (images)     640 × 480   -    -
MobBioFake (Iris)      800 + 800 (images)     250 × 200   -    -

Such texture-based approaches proved to be very effective at detecting print attacks. Two years later, the second face PAD competition was carried out in 2013 [112], with 8 participating teams, on the Replay-Attack dataset [73]. Most of the teams relied only on texture-based methods, while 3 teams used hybrid approaches combining both texture- and motion-based countermeasures. Two of these hybrid approaches reached 0% HTER on the test set of the Replay-Attack database.

Later, in 2017, a competition was held on generalized software-based face presentation attack detection in mobile scenarios [64]. Thirteen teams participated and four protocols were used for evaluation. The winning team, which showed the best generalization to unseen cameras and attack types, used a combination of color, texture and motion-based features.

More recently, after the release of the large-scale multi-modal face anti-spoofing dataset CASIA-SURF [113], the ChaLearn LAP Multi-Modal Face Anti-spoofing Attack Detection Challenge [114] was held in 2019. Thirteen teams qualified for the final round, with all submitted face PAD solutions relying on CNN-based feature extractors; the top 3 teams utilized ensembles of feature extractors and classifiers. For future work, the challenge paper suggested taking full advantage of the multi-modal dataset by defining a set of cross-modal testing protocols, as well as introducing 3D mask attacks in addition to the 2D attacks. These recommendations were fulfilled the following year in the ChaLearn Multi-modal Cross-ethnicity Face Anti-spoofing Recognition Challenge using the CASIA-SURF CeFA [115] dataset.

For iris presentation attack detection, the first competition was LivDet-Iris 2013 [116]; two sequels were held in 2015 [117] and 2017 [110]. Images in LivDet-Iris included iris printouts and textured contact lenses, the latter being more difficult to detect than printouts. For LivDet-Iris 2017, the test images were split into known and unknown parts: the known part contained images taken under similar conditions and with the same sensor as the training images, while the unknown partition consisted of images taken in different environments with different properties. Results on the known partition were much better than those on the unknown part. In 2014, another competition was held specifically for mobile authentication, the Mobile Iris Liveness Detection Competition (MobILive 2014) [118]. A mobile device was used to capture images of iris printouts in visible light (the MobBioFake dataset), not in near-infrared light as in the datasets used in the LivDet-Iris competitions. Six teams participated, and the winning team achieved perfect detection of presentation attacks. For a more thorough assessment of these iris competitions, refer to Czajka and Bowyer [6].

4.2 Face spoofing Databases

In Table 3 we add details on the presentation media and presentation attack instruments (PAI) of the face video datasets used.

4.2.1 Replay-Attack: One of the earliest datasets presented in the literature (2012) for the problem of face spoofing is the Replay-Attack dataset [73], a sequel to its Print-Attack version introduced in 2011 [67]. The database contains a total of 1200 short videos of between 9 and 15 seconds. The videos are taken at resolution 320×240 from 50 different subjects. Each subject has 4 bona-fide videos and 20 spoofed ones, so the full dataset has 200 bona-fide videos and 1000 spoofed videos. The dataset is divided into 3 subject-disjoint subsets: one for training, one for development and one for testing.

The training and development subsets each have 15 subjects, while the test subset has videos from the remaining 20 subjects. Results on this dataset are reported as the HTER (Half Total Error Rate) on the test subset when the detector threshold is set at the EER (Equal Error Rate) of the development set. For both the real and spoof videos, each subject was recorded twice with a regular webcam in two different lighting settings: controlled and adverse.

As for the spoofed videos, first high-resolution videos were taken of the subject with a Canon PowerShot SX150 IS camera, then three techniques were used to generate 5 spoofing scenarios: (1) hard-copy print attack, where high-resolution digital photographs are presented to the camera; the photographs are printed with a Triumph-Adler DCC 2520 color laser printer on A4 paper; (2) mobile attack, using a photo once and a replay of the recorded video on an iPhone 3GS screen (resolution 480×320 pixels) once; (3) high-definition attack, displaying a photo or replaying the video on a high-resolution screen using an iPad (first generation, with a screen resolution of 1024×768 pixels). Each of the 5 scenarios was recorded in the two lighting conditions stated above, then presented to the sensor in 2 different ways: either with a fixed tripod, or by an attacker holding the presenting device (printed paper or replay device) with his/her hand, leading to a total of 20 attack videos per subject. More details of the dataset can be found at the IDIAP Research Institute website*.

4.2.2 MSU Mobile Face Spoofing Database (MSU-MFSD): The MSU-MFSD dataset [57] was introduced in 2014 by Michigan State University (MSU). Its main aim was to tackle the problem of face spoofing on smartphones, where some mobile phone applications use the face for authentication and unlocking the phone. The dataset includes real and spoofed videos from 35 subjects, with durations between 9 and 15 seconds. Unlike the Replay-Attack dataset, where only a webcam was used for authentication, in MSU-MFSD two devices were used: the built-in webcam of a 13-inch MacBook Air with resolution 640×480, and the front-facing camera of the Google Nexus 5 smartphone with 720×480 resolution. So for each subject there are 2 bona-fide videos, one from each device, and six attack videos, 3 from each device, leading to a total of 280 videos: 210 attack and 70 bona-fide. The attack scenarios are all presented to the authentication sensor with a fixed support, unlike Replay-Attack, which has videos with both fixed and hand-held devices. Three attack scenarios were used: (1) print attack on A3 paper, (2) video replay attack on the screen of an iPad Air, and (3) video replay attack on a Google Nexus 5 smartphone. The dataset is divided into two subject-disjoint subsets: one for training with 15 subjects and the other for testing with 20 subjects. The EER on the test set is used as the evaluation metric on the MSU-MFSD database.

4.2.3 Replay-Mobile: In 2016, the IDIAP research institute that released Replay-Attack released a new dataset, Replay-Mobile [104]. This followed the widespread adoption of face authentication on mobile devices and the increased need for face-PAD evaluation sets that capture the characteristics of mobile devices. The dataset has 1200 short recordings of 40 subjects, captured by two mobile devices at resolution 720×1280. Ten bona-fide accesses were recorded for each subject, in addition to 16 attack videos taken under different attack modes. Like Replay-Attack, the dataset is divided into training, development and testing subsets of 12, 16 and 12 subjects respectively, and results are reported as HTER on the test set.

Five lighting conditions were used in recording real accesses (controlled, adverse, direct, lateral and diffuse). For presentation attacks, high-resolution photos and videos of each client were taken under two different lighting conditions (light on, light off).

Attacks were performed in two ways: (1) photo-print attacks, where a printed high-resolution photo was presented to the capturing device using either fixed or hand support; (2) matte-screen attacks, displaying a digital photo or video with fixed support.

*http://www.idiap.ch/dataset/replayattack


Table 3 Face databases details. For each dataset we list: the sensor used for authentication; the PA artifact; the PAI (Presentation Attack Instrument); the number of lighting conditions (bona-fide, attack); the subjects (train/dev/test); the videos per subject (bona-fide + attack); and the videos per subset (bona-fide + attack).

Replay-Attack (2012): Sensor: (1) W in MacBook laptop (320×240). Artifact: 1) PH, 2) 720p high-def V (Canon PowerShot SX150 IS). PAI: 1) PR (A4), 2) VR on M (iPhone), 3) VR on T (iPad). Lighting: 2, 2. Subjects: 50 (15/15/20). Videos per subject: 4 + 20. Per subset: Train 60+300, Dev 60+300, Test 80+400.

CASIA-FASD (2012): Sensor: 1) low-Q long-time-used USB C (680×480), 2) normal-Q new USB C (680×480), 3) high-Q Sony NEX-5 (1920×1080). Artifact: 1) PH, 2) 720p high-def V (Sony NEX-5). PAI: 1) PR (copper A4), warped, 2) PR (copper A4), cut, 3) VR on T (iPad). Lighting: 1, 1. Subjects: 50 (20/-/30). Videos per subject: 3 + 9. Per subset: Train 60+180, Test 90+270.

MSU-MFSD (2014): Sensor: 1) W in MacBook Air (640×480), 2) FC of Google Nexus 5 M (720×480). Artifact: 1) PH (Canon PowerShot 550D SLR), 2) 1080p high-def V (Canon PowerShot 550D SLR), 3) 1080p M V (BC of iPhone 5S). PAI: 1) PR (A3), 2) high-def VR on T (iPad), 3) M VR on M (iPhone). Lighting: 1, 1. Subjects: 35 (15/-/20). Videos per subject: 2 + 6. Per subset: Train 30+90, Test 40+120.

Replay-Mobile (2016): Sensor: 1) FC of iPad Mini 2 T (720×1280), 2) FC of LG-G4 M (720×1280). Artifact: 1) PH (Nikon Coolpix P520), 2) 1080p M V (BC of LG-G4 M). PAI: 1) PR (A4), 2) VR on matte screen (Philips 227ELH). Lighting: 5, 2. Subjects: 40 (12/16/12). Videos per subject: 10 + 16. Per subset: Train 120+192, Dev 160+256, Test 110+192.

OULU-NPU (2017): Sensor: FC of 6 M (1920×1080). Artifact: 1) PH, 2) high-res M V (BC of Samsung Galaxy S6 Edge M). PAI: 1-2) PR (glossy A3), 2 printers, 3) high-res VR on 19-inch Dell display, 4) high-res VR on 13-inch MacBook. Lighting: 3, 3. Subjects: 55 (20/15/20). Videos per subject: 36 + 72. Per subset: 4 protocols, total 1980+3960.

SiW (2018): Sensor: 1) high-Q C Canon EOS T6 (1920×1080), 2) high-Q C Logitech C920 W (1920×1080). Artifact: 1) PH (5184×3456), 2) same live videos. PAI: 1) PR, 2) PR of a frontal-view frame from a live V, 3-6) VR on 4 screens (Samsung S8, iPhone 7, iPad Pro, PC). Lighting: 1, 1. Subjects: 165 (90/-/75). Videos per subject: 8 + 20. Per subset: 3 protocols, total 1320+3300.

CASIA-SURF (2018): Sensor: 1) RealSense SR300, RGB (1280×720), Depth/IR (640×480). Artifact: 1) PH. PAI: 1) PR (A4). Lighting: 1, 1. Subjects: 1000 (300/100/600). Videos per subject: 1 + 6. Per subset: Train 900+5400, Dev 300+1800, Test 1800+10800.

CASIA-SURF CeFA (2019): Sensor: 1) Intel RealSense, RGB/Depth/IR, all (1280×720). Artifact: 1) PH, 2) same live videos, 3) 3D-print face mask, 4) 3D silica-gel face mask. PAI: 1) PR, 2) VR. Lighting: 1, 2 (2D); 0, 6; 0, 4. Subjects: 1500 (600/300/600); 99 (0/0/99); 8 (0/0/8). Videos per subject: 1 + 3; 0 + 18; 0 + 8. Per subset: 4 protocols, total 4500+13500; 0+5346; 0+196.

C: Camera; BC: Back-Camera; FC: Front-Camera; W: built-in webcam; T: Tablet; M: Mobile/Smartphone; Q: Quality; PH: high-res photo; V: video; PR: hard-copy print of high-res photo; VR: video replay.

4.2.4 Others: Some other datasets for face PAD are also available but were not used in this paper.

Fig. 1: Sample images from the Replay-Attack dataset (adverse lighting, hand support). Left to right, top: bona-fide, print, high-def photo, mobile photo; bottom: high-def photo, mobile video, high-def video (client 007).

Fig. 2: Sample images from the MSU-MFSD dataset, captured by (a) laptop and (b) Android phone. Left to right: bona-fide, printed photo, iPhone video, iPad video (client 001).

CASIA-FASD [50] was released in 2012 with 50 subjects. Three different imaging qualities were used (low, normal and high), and three types of attacks were generated from the high-quality recordings of the genuine faces, namely cut-photo attack, warped-photo attack,


and video attack on an iPad. The dataset is split into a training set of 20 subjects and a testing set containing the other 30 subjects.

OULU-NPU [109]: A mobile face PAD dataset released in 2017, with 55 subjects captured under three different lighting conditions and backgrounds with six different mobile devices. Both print and video-replay attacks are included, using two different PAIs each. Four protocols were proposed in the paper to evaluate the generalization of PAD algorithms across different conditions.

Spoof in the Wild (SiW) [89]: Released in 2018, it contains 8 real and 20 attack videos for each of 165 subjects, more subjects than any previous dataset. Bona-fide videos are captured with two high-quality cameras and collected with different orientations, facial expressions and lighting conditions. Attack samples are either high-quality print attacks or videos replayed on 4 different PA devices.

CASIA-SURF [113]: Later in the same year, 2018, a large-scale multi-modal dataset, CASIA-SURF, was released, containing 21000 videos of 1000 subjects. Each video has three modalities, namely RGB, Depth and IR, captured by an Intel RealSense camera. Only 2D print attacks were performed, with 6 different combinations of photo state (either full or cut) and 3 different operations, such as bending the printed paper or moving it at different distances from the camera.

CASIA-SURF CeFA [115]: Although the CASIA-SURF dataset [113] is large-scale, it only includes one ethnicity (Chinese) and one attack type (2D print). So in 2019, CASIA-SURF Cross-ethnicity Face Anti-spoofing (CeFA) was released to provide a wider range of ethnicities and attack types, in addition to multiple modalities as in CASIA-SURF. The dataset includes videos of 1607 subjects covering three ethnicities (African, East Asian and Central Asian), three modalities and 4 attack types: 2D print attacks and replay attacks, in addition to 3D mask and silica-gel attacks.

4.3 Iris spoofing Databases

Many publicly available benchmark datasets have been published for the iris presentation attack problem. Some are captured with commercial

Fig. 3: Sample images from the Replay-Mobile dataset (hand support), captured by (a) tablet and (b) mobile. Left to right: bona-fide, print photo, mattescreen photo, mattescreen video (client 001).

Fig. 4: Sample images from the iris datasets: (a) bona-fide, (b) print attack. Left to right: ATVS-FIr (id u0001s0001_ir_r_0002), Warsaw 2013 (id 0039_R_1), MobBioFake (id 007_R2).

iris recognition sensors operating with near-infrared illumination, like BioSec ATVS-FIr [12] or the LivDet datasets [52, 110, 117], while more recent datasets were captured under visible-light illumination using smartphones, e.g. MobBioFake [54, 105]. Below are details of the databases used in this paper.

4.3.1 BioSec ATVS-FIr: The ATVS-FIr dataset was introduced in 2012 [12], derived from the BioSec dataset [119]. It contains 800 bona-fide and 800 attack photos of 50 different subjects. The attack is performed by enhancing the quality of the bona-fide iris photo, printing it on high-quality paper using HP printers, then presenting it to the same LG IrisAccess EOU3000 sensor used to capture the bona-fide irises. All images are grayscale with 640×480 resolution. For each user, 16 different bona-fide photos were captured, from which 16 attack photos were generated: four photos were taken of each of the two eyes in two sessions, so each user has a total of 4 images × 2 eyes × 2 sessions = 16 bona-fide photos, and the same number of fake attacks. Each eye of the 50 subjects was considered a subject on its own, then the resulting 100 subjects were divided into a training set of 25 subjects (200 bona-fide + 200 attack) and a test set of 75 subjects (600 bona-fide + 600 attack).

4.3.2 LivDet Warsaw 2013: Warsaw University of Technology published the first LivDet Warsaw iris dataset in 2013 [52] as part of the first international iris liveness competition in 2013 [116]. Two more versions of this dataset were later published, in 2015 [117] and 2017 [110]. LivDet-Iris-2013-Warsaw contains 852 bona-fide and 815 attack (printout) images from 237 subjects. For the attack images, the real iris was printed using two different printers, then captured by the same IrisGuard AD100 sensor as the bona-fide photos.

4.3.3 MobBioFake: The first competition for iris PAD in mobile scenarios was the MobILive competition in 2014 [118]. The MobBioFake iris spoofing dataset [54] was generated from the iris images of the MobBIO Multimodal Database [105]. It comprises 800 bona-fide and 800 attack images captured by the back camera (8 MP) of an Android mobile device (Asus Transformer Pad TF300T) in visible light, not in near-infrared as in previous iris datasets. The images are color (RGB) and have a resolution of 250×200. The MobBioFake database has 100 subjects, each with 8 bona-fide + 8 attack images captured under different illumination and occlusion settings. The attack samples were generated from printed versions of the original images, under similar conditions and with the same device. The images are split evenly between training and test subsets.

5 Proposed PAD Approach

Although several previous studies use CNNs for presentation attack detection, they either used custom-designed networks, e.g.


Fig. 5: Overview of the PAD algorithm.

Spoofnet [95, 98], or used a pre-trained CNN to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation-attack images. We choose to assess state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet [108]. This PAD approach is a passive, software, texture-based method that requires neither additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches, starting from network weights trained on the ImageNet classification dataset. One can either retrain all of the network weights on the new datasets of the problem, or freeze the weights of most of the network layers and fine-tune only a set of selected last layers, including the final classification layer.

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then input to the CNN, passing through a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is altered to output a binary bona-fide/attack classification using sigmoid activation, instead of the 1000 classes of ImageNet.

5.1 Preprocessing

5.1.1 Face videos: Each of the face benchmark datasets used in this paper provides face coordinates for every frame of every video. These coordinates are used to crop a square region to be used as input to the network. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to be twice the minimum of the width and height of the face groundtruth box. The cropped bounding box containing the face is then resized to each network's default input size and fed to the first convolutional layer of the CNN.
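As a concrete illustration, this cropping step can be sketched as follows (a minimal sketch: the function name, the centring of the square on the face box, and the dependency-free nearest-neighbour resize are our assumptions, since the paper does not specify an implementation; in practice a library resize such as cv2.resize would be used):

```python
import numpy as np

def crop_face_roi(frame, bbox, target_size=224):
    """Crop a square ROI around the face and resize it for the CNN.

    frame: HxWx3 numpy array; bbox: (x, y, w, h) groundtruth face box.
    The square side is twice the smaller of the box's width/height, so
    some background context around the face is kept.
    """
    x, y, w, h = bbox
    side = 2 * min(w, h)                 # empirical choice from the paper
    cx, cy = x + w // 2, y + h // 2      # face centre (our assumption)
    x0 = max(cx - side // 2, 0)
    y0 = max(cy - side // 2, 0)
    x1 = min(x0 + side, frame.shape[1])
    y1 = min(y0 + side, frame.shape[0])
    roi = frame[y0:y1, x0:x1]
    # dependency-free nearest-neighbour resize to the network input size
    ys = np.linspace(0, roi.shape[0] - 1, target_size).astype(int)
    xs = np.linspace(0, roi.shape[1] - 1, target_size).astype(int)
    return roi[np.ix_(ys, xs)]
```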

5.1.2 Iris images: When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. In this paper, however, we directly use the full iris image captured by the NIR sensor or smartphone camera, in order to keep the approach as simple as possible. This lets the network learn which regions of the full periocular area best discriminate the classes. The presence of some contextual information, like paper borders, can also be useful for attack detection.

Table 4 Performance of selected deep CNN architectures on ImageNet, in terms of single-model error reported in each model's paper

Model                            top-1 error %           top-5 error %          parameters  layers
                                 (single- / multi-crop)  (single- / multi-crop)
InceptionV3 [120]                21.2 / 19.47 (12-crop)  5.6 / 4.48 (12-crop)   23.9M       159
MobileNetV2 (alpha=1.4) [122]    25.3 / -                7.58* / -              6.2M        88
MobileNetV2 (default alpha=1.0)  28 / -                  9.8* / -               3.5M        88
  [122]
VGG16 [123]                      - / 24.4                9.95* / 7.1            138.4M      23

* Not available in the paper; reported by Keras (https://keras.io)

5.2 Generalization: Data augmentation

It is known that deep networks have many parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. During training, images in each mini-batch pulled from the training set are randomly cropped, rotated slightly by a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially larger training dataset and a more generalized network that is robust to changes in pose and illumination.

5.3 Modern deep CNN architectures

Based on official results published for recent CNN networks on the ImageNet dataset [108], three of the networks that achieved less than 10% top-5 error are InceptionV3 [120], ResNet50 [121] and MobileNetV2 [122]. Table 4 states the single-model top-1 and top-5 error of each of these networks on the ImageNet validation set, together with its depth and number of parameters, ordered by ascending top-5 error. One thing that can be noticed about the higher-accuracy networks is the increased number of layers: these networks are much deeper than, for example, the well-known VGG [123].

For the purposes of this paper, we chose networks with fewer than 30 million parameters to fine-tune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 [120] (≈24M parameters, 5.6% top-5 error), and a more recent, shallower network with far fewer parameters, MobileNetV2 [122] (≈4M parameters, 9.9% top-5 error), which makes it very suitable for mobile applications. Even though all three networks have significantly fewer parameters than VGG, they are much deeper and more accurate.

In each architecture, the final softmax prediction layer is removed, and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate of 0.5), then a sigmoid activation layer for the final binary prediction.
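In Keras this head replacement can be sketched as follows (a sketch, not the authors' exact code; `build_pad_model` is our name, and `include_top=False` with `pooling="avg"` reproduces the "remove softmax, keep global average pooling" step):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_pad_model(input_shape=(224, 224, 3), weights="imagenet"):
    # Backbone without its 1000-way ImageNet softmax (include_top=False);
    # pooling="avg" keeps the global average pooling layer as the output.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False,
        weights=weights, pooling="avg")
    x = layers.Dropout(0.5)(base.output)            # dropout rate of 0.5
    out = layers.Dense(1, activation="sigmoid")(x)  # binary bona-fide/attack
    return models.Model(base.input, out)
```

The same two added layers form the head of InceptionV3 as well, by swapping the application class and using a 299×299 input.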

5.4 Fine-tuning approaches

As stated previously, the available datasets are small while the networks used have millions of parameters to learn. This would probably lead the model to overfit these small sets if the network weights were trained from scratch, i.e. starting from random weights; training would get stuck in a local minimum, leading to poor generalization. A better approach is to initialize the weights of the network with weights pre-trained on ImageNet. We then have three training options: (1) use transfer learning: feed the images to the network and extract features from the last layer


Table 5 Number of all parameters in each model, and the number of parameters to learn when fine-tuning only the final block

Model        Total parameters  Last block to train                          First trainable layer  Parameters to learn
InceptionV3  21,804,833        block after mixed9 (mixed9_1 until mixed10)  conv2d_90              6,075,585
MobileNetV2  2,259,265         last _inverted_res_block                     block_16_expand        887,361

before prediction, then use these features instead of raw image pixels for classification with an SVM or NN; (2) fine-tune the weights of only the last few layers of the network, fixing the weights of earlier layers at their ImageNet-trained values; or (3) fine-tune the whole network's weights using the new input dataset.

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the approach of fine-tuning the last convolutional block, Table 5 states, for each architecture, the name of the last block's first layer and the number of parameters in this block to be fine-tuned.
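The layer-freezing logic for this second approach can be sketched generically (our helper name; it walks a Keras-style layer list and unfreezes everything from the named layer onward, e.g. `conv2d_90` for InceptionV3 or `block_16_expand` for MobileNetV2, as in Table 5):

```python
def freeze_for_last_block_finetuning(model, first_trainable="block_16_expand"):
    """Freeze every layer before `first_trainable`; leave the rest trainable.

    Works on any Keras-style model whose layers expose `.name` and
    `.trainable`. The remaining trainable parameters then match the
    'parameters to learn' column of Table 5.
    """
    trainable = False
    for layer in model.layers:
        if layer.name == first_trainable:
            trainable = True   # from here to the output, weights are updated
        layer.trainable = trainable
    return model
```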

5.5 Single model with multiple biometrics

In addition to training/testing with datasets that belong to the same biometric source, i.e. face only or iris only, we experiment with training a single generic network to detect a presentation attack whether the presented biometric is a face or an iris. We train with different combinations of face+iris sets, then cross-evaluate the trained models on the other face and iris datasets to see if this approach is comparable to training on only face images or only iris images.

6 Experiments and Results

In this section we describe the image and video preprocessing methods used, and explain the training strategy and experimental setup.

6.1 Preprocessing

6.1.1 Preparing images: Among the iris datasets, only the MobBioFake database consists of color images captured using a smartphone; the other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the color images of ImageNet, their input layers only accept images with three channels; for grayscale iris images, the single gray channel is therefore replicated twice to form a 3-channel image that is passed as the network input. The whole iris image is used, without cropping and without iris detection or segmentation.

On the other hand, all face PAD datasets consist of color videos. Video durations range from 9 to 15 seconds; videos in the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects; instead we sample one frame every 20 frames. This frame dropping yields around 10 to 19 frames per video, depending on its length. Instead of using a whole sampled frame as the network input, it is first cropped to either include only the face region, or a box with approximately twice the face area so as to include some background context. Annotated face bounding boxes have a side length ranging from 70 px to 100 px in Replay-Attack videos, and from 170 px to 250 px in MSU-MFSD.

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments (after frame dropping in videos). The total number of images available for training the CNN on each database is given after the '=' sign.

The images are then resized to the default input size of each network: 224×224 for MobileNetV2 and 299×299 for InceptionV3. The RGB values of the input images to the InceptionV3 and MobileNetV2 networks are first scaled to lie between -1 and 1.
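The whole image preparation (grayscale replication for NIR iris images, resize, and [-1, 1] scaling) can be sketched as follows (the function name and the nearest-neighbour resize are ours; in practice a library resize would be used):

```python
import numpy as np

def to_network_input(img, size):
    """img: HxW (grayscale) or HxWx3 uint8 array; size: 224 or 299."""
    if img.ndim == 2:
        # NIR iris images: replicate the single gray channel three times
        img = np.stack([img] * 3, axis=-1)
    # simple nearest-neighbour resize to the network's default input size
    ys = np.linspace(0, img.shape[0] - 1, size).astype(int)
    xs = np.linspace(0, img.shape[1] - 1, size).astype(int)
    img = img[np.ix_(ys, xs)].astype(np.float32)
    return img / 127.5 - 1.0   # scale [0, 255] -> [-1, 1]
```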

Table 6 Number of images used in training and testing. For face datasets, the number of videos is shown followed by the number of frames in parentheses*; for iris datasets, only the number of images is given.

Database       Subset  Attack + bona-fide videos (frames)
Replay-Attack  train   300 (3414) + 60 (1139) = 4553
               devel   300 (3411) + 60 (1140)
               test    400 (4516) + 80 (1507)
MSU-MFSD       train   90 (1257) + 30 (419) = 1676
               test    120 (1655) + 84 (551)
Replay-Mobile  train   192 (2851) + 120 (1762) = 4613
               devel   256 (3804) + 160 (2356)
               test    192 (2842) + 110 (1615)
ATVS-FIr       train   200 + 200 = 400
               test    600 + 600
Warsaw 2013    train   203 + 228 = 431
               test    612 + 624
MobBioFake     train   400 + 400 = 800
               test    400 + 400

* frames sampled by skipping: one frame every 20 frames

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize this small set of samples instead of learning useful features, and thus fail to generalize. In order to reduce overfitting, data augmentation is used during training: before being fed to the network, images are randomly transformed with one or more of the following transformations: horizontal flip; rotation between 0 and 10 degrees; horizontal translation by 0 to 20 percent of the image width; vertical translation by 0 to 20 percent of the image height; or zooming by 0 to 20 percent.
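With Keras, the listed transformations map directly onto `ImageDataGenerator` (a sketch of one plausible configuration; the paper does not name the augmentation library it used):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random transforms matching the ranges listed above, applied on the fly,
# so the network never sees exactly the same training image twice.
augmenter = ImageDataGenerator(
    horizontal_flip=True,     # random horizontal flip
    rotation_range=10,        # rotation between 0 and 10 degrees
    width_shift_range=0.2,    # horizontal translation, up to 20% of width
    height_shift_range=0.2,   # vertical translation, up to 20% of height
    zoom_range=0.2)           # zooming by up to 20%
```

`augmenter.flow(x_train, y_train, batch_size=32)` then yields freshly transformed mini-batches during training.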

6.2 Training strategy and hyperparameters

For training the CNN networks we used the Adam optimizer [124] with beta1 = 0.9, beta2 = 0.999 and learning rate 1×10^-4. We trained each network for a maximum of 50 epochs, but added a stopping criterion: training stops if the training or validation accuracy reaches 99.95%, or if the validation accuracy does not improve (min delta = 0.0005) for 40 epochs.

Intermediate models with high validation accuracy were saved, and finally the model with the lowest validation error was chosen for inference. Because of the randomness introduced by data augmentation and dropout, several training sessions may produce slightly different results, especially in cross-validation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.
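This training recipe can be sketched with standard Keras components (our helper name; the 99.95% accuracy stop would need a small custom callback, which is omitted here):

```python
import tensorflow as tf

def compile_with_paper_settings(model, ckpt_path="best_model.keras"):
    # Adam with the stated hyperparameters
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=1e-4, beta_1=0.9, beta_2=0.999),
        loss="binary_crossentropy", metrics=["accuracy"])
    callbacks = [
        # stop if validation accuracy fails to improve by 0.0005 for 40 epochs
        tf.keras.callbacks.EarlyStopping(
            monitor="val_accuracy", min_delta=0.0005, patience=40),
        # keep the intermediate model with the best validation accuracy
        tf.keras.callbacks.ModelCheckpoint(
            ckpt_path, monitor="val_accuracy", save_best_only=True),
    ]
    return model, callbacks
```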

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treat each frame of a face video as a separate image. When reporting the final score for each video, we perform score-level fusion by averaging the scores of the samples from the same video. Accuracy and errors are then calculated based on these video-level scores, not frame-level scores as in some results reported in the literature.
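The fusion step can be sketched as follows (a hypothetical helper; `frame_scores` are the per-frame sigmoid outputs and `video_ids` identifies the source video of each frame):

```python
import numpy as np
from collections import defaultdict

def fuse_frame_scores(frame_scores, video_ids):
    """Score-level fusion: average the per-frame CNN scores belonging to
    the same video, so accuracy/EER are computed on video-level scores.

    frame_scores: iterable of floats in [0, 1]; video_ids: parallel
    iterable of video identifiers. Returns {video_id: mean score}.
    """
    buckets = defaultdict(list)
    for score, vid in zip(frame_scores, video_ids):
        buckets[vid].append(score)
    return {vid: float(np.mean(scores)) for vid, scores in buckets.items()}
```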


Table 7: Intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%).

Train Database          Test Database             Finetune Approach (A)        Finetune Approach (B)
                                                  InceptionV3  MobilenetV2     InceptionV3  MobilenetV2
Replay-Attack           Replay-Attack             6.9          1.6             0            0.63
                        MSU-MFSD                  3.3          3.0             1.75         2.25
                        Replay-Mobile             35.8         43.4            64.7         56.7
MSU-MFSD                Replay-Attack             34.7         22.9            23.8         13.7
                        MSU-MFSD                  7.5          20.5            0            2.08
                        Replay-Mobile             32.6         26.8            24.6         23.7
Replay-Mobile           Replay-Attack             33.2         45.1            31.1         45.5
                        MSU-MFSD                  35.4         30.0            32.9         37.5
                        Replay-Mobile             3.6          3.2             0            0
ATVS-FIr                ATVS-FIr                  0            0               0            0
                        Warsaw 2013               1.9          5.4             0.16         1.6
                        MobBioFake (Grayscale)    44.0         39.0            36.5         39.0
Warsaw 2013             ATVS-FIr                  0            1.0             0.17         0.5
                        Warsaw 2013               0            0.32            0            0
                        MobBioFake (Grayscale)    43.0         39.0            32.8         47.0
MobBioFake              ATVS-FIr                  10.0         30.0            18.7         20.2
                        Warsaw 2013               15.7         32.0            23.4         18.8
                        MobBioFake                8.75         7.0             0.5          0.75
MobBioFake (Grayscale)  ATVS-FIr                  2.5          23.7            12.0         16.2
                        Warsaw 2013               6.2          17.3            10.1         17.2
                        MobBioFake (Grayscale)    10.5         7.3             0.25         1.25

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the error on the test set. EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attack ones, i.e. FRR = FR / (total bona-fide attempts), while FAR is the probability of the algorithm accepting attack samples as bona-fide, i.e. FAR = FA / (total attack attempts), where FR and FA are the numbers of false rejections and false acceptances, respectively.

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold is first calculated that achieves the EER on the development set. This threshold is then used to calculate the error on the test set, known as the half total error rate (HTER), which is the sum of the FRR and FAR at that threshold divided by two: HTER = (FRR + FAR) / 2.
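These metrics can be sketched as follows, assuming higher scores indicate bona-fide presentations (an illustrative implementation; sweeping the observed scores as candidate thresholds is a simplification):

```python
import numpy as np

def far_frr(bona_scores, attack_scores, thr):
    """FAR: fraction of attack samples accepted (score >= threshold);
    FRR: fraction of bona-fide samples rejected (score < threshold)."""
    far = np.mean(np.asarray(attack_scores) >= thr)
    frr = np.mean(np.asarray(bona_scores) < thr)
    return far, frr

def eer_threshold(bona_scores, attack_scores):
    """Return the threshold (on the development set) where |FAR - FRR| is minimal."""
    cands = np.unique(np.concatenate([bona_scores, attack_scores]))
    return min(cands, key=lambda t: abs(np.subtract(*far_frr(bona_scores, attack_scores, t))))

def hter(bona_scores, attack_scores, thr):
    """HTER = (FAR + FRR) / 2 on the test set, at a threshold fixed on the devel set."""
    far, frr = far_frr(bona_scores, attack_scores, thr)
    return (far + frr) / 2
```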

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are to be evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER to be able to compare with state-of-the-art published results. EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: To test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that has performed cross-dataset EER evaluation on iris PAD databases.

6.4 Experimental setup

For implementation, we used Keras† with the TensorFlow‡ backend for the deep CNN models, and for dataset management and experiment pipelines we used the Bob package [125, 126]. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

* https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
† https://github.com/keras-team/keras
‡ https://www.tensorflow.org

6.5 Experiment 1: Results

In the first experiment, we compare the two fine-tuning approaches explained in Section 5.4. For this experiment, training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state of the art. For each fine-tuning approach, we performed three runs and report the minimum error obtained. In Table 7, we refer to the first fine-tuning approach as (A), where only the weights of the last block are fine-tuned, while in approach (B) all the layers' weights are fine-tuned using the given dataset.

It can be noticed from Table 7 that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning only the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error on all datasets using InceptionV3, while MobilenetV2 achieved 0% on most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset was used for training instead of the original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2: Results

For the second experiment, we only use fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here, we use both face and iris images as training input to a single model for classifying bona-fide from attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate between bona-fide and presentation-attack images achieved very good results on both the face and iris test sets. In some cases on the face datasets, this combination boosted cross-dataset performance: for example, on the Replay-Attack test set, 11.9% HTER was achieved when training a MobilenetV2 using MSU-MFSD and the Warsaw iris dataset, vs. 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8: Training with face + iris images: intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%). (IncV3 = InceptionV3, MobV2 = MobilenetV2.)

Train face database: Replay-Attack
Train iris database:        ATVS-FIr        Warsaw 2013     MobBioFake (RGB)  MobBioFake (Grayscale)
Test database               IncV3   MobV2   IncV3   MobV2   IncV3   MobV2     IncV3   MobV2
Number of training epochs   2       6       3       6       4       9         2       8
Replay-Attack               0       0.63    0       0       0.25    1.5       0       1.4
MSU-MFSD                    29.6    20.4    1.5     2.0     19.6    2.5       3.0     1.75
Replay-Mobile               58.1    48.7    57.3    35.4    60.6    58.1      39.0    40.0
ATVS-FIr                    0       0       1.05    2.7     51.7    8.8       3.5     12.3
Warsaw 2013                 0.16    0.16    0       0       41.3    16.2      15.2    5.7
MobBioFake (RGB)            58.5    71.3    56.0    45.5    0.75    0.25      1.75    2.0
MobBioFake (Grayscale)      50.0    69.0    56.3    38.5    1.5     0.75      1.25    1.0

Train face database: MSU-MFSD
Train iris database:        ATVS-FIr        Warsaw 2013     MobBioFake (RGB)  MobBioFake (Grayscale)
Test database               IncV3   MobV2   IncV3   MobV2   IncV3   MobV2     IncV3   MobV2
Number of training epochs   4       6       3       5       10      9         7       9
Replay-Attack               22.8    14.5    25.3    11.9    22.8    27.7      16.6    21.6
MSU-MFSD                    0       2.08    0.4     2.08    2.5     2.08      0.4     2.5
Replay-Mobile               28.7    26.1    19.2    24.8    22.7    26.8      25.2    17.0
ATVS-FIr                    0       0       0.33    2.3     11.8    12.5      0.8     12.33
Warsaw 2013                 0.6     0.8     0       0       43.9    65.5      7.4     9.1
MobBioFake (RGB)            52.5    39.2    45.5    42.0    2.0     1.25      2.5     2.5
MobBioFake (Grayscale)      45.3    38.5    34.5    43.0    2.2     4.25      1.75    2.0

Train face database: Replay-Mobile
Train iris database:        ATVS-FIr        Warsaw 2013     MobBioFake (RGB)  MobBioFake (Grayscale)
Test database               IncV3   MobV2   IncV3   MobV2   IncV3   MobV2     IncV3   MobV2
Number of training epochs   2       3       1       3       4       7         5       4
Replay-Attack               24.2    42.6    31.0    35.7    41.6    64.3      33.7    54.6
MSU-MFSD                    32.5    30.4    32.5    42.5    60.0    52.5      40.0    44.6
Replay-Mobile               0       0       0       0       0       0.26      0       0.46
ATVS-FIr                    0       0       0.83    6.0     15.5    22.5      6.5     28.2
Warsaw 2013                 2.1     4.0     0       0       9.6     28.9      22.1    26.1
MobBioFake (RGB)            46.5    53.0    49.3    54.3    1.0     0.75      8.8     32.5
MobBioFake (Grayscale)      33.5    43.3    41.7    45.8    5.3     6.5       2.0     1.75

In the original table, bold-underlined entries mark the best cross-dataset error for each test dataset, and bold entries the best cross-dataset error for each test dataset in each row (i.e. given a certain face training set).

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER (%) is reported when testing on Replay-Attack; otherwise test EER (%). Values in parentheses are the corresponding intra-dataset errors; underlined entries in the original mark the best cross-dataset error for each test dataset.

Train Database     Test Database        SOTA (intra), method                               Ours, 1 biometric (intra), model   Ours, face + iris (intra), model
Replay-Attack      MSU-MFSD             28.5 (8.25), GuidedScale-LBP [127] LGBP            1.75 (0), InceptionV3 (B)          1.5 (0), InceptionV3 (Warsaw 2013)
                   Replay-Mobile        49.2 (3.13), GuidedScale-LBP [127] LBP + GS-LBP    35.8 (6.9), InceptionV3 (A)        35.4 (0), MobilenetV2 (Warsaw 2013)
MSU-MFSD           Replay-Attack        21.40 (1.5), Color-texture [66]                    13.7 (2.08), MobilenetV2 (B)       11.9 (2.08), MobilenetV2 (Warsaw 2013)
                   Replay-Mobile        32.1 (8.5), GuidedScale-LBP [127] LBP + GS-LBP     23.7 (2.08), MobilenetV2 (B)       17.0 (2.5), MobilenetV2 (MobBioFake (Gray))
Replay-Mobile      Replay-Attack        46.19 (0.52), GuidedScale-LBP [127] LGBP           31.1 (0), InceptionV3 (B)          24.2 (0), InceptionV3 (ATVS-FIr)
                   MSU-MFSD             33.56 (0.98), GuidedScale-LBP [127] LBP + GS-LBP   30.0 (3.2), MobilenetV2 (A)        30.4 (0), MobilenetV2 (ATVS-FIr)
ATVS-FIr           Warsaw 2013          none available                                     0.16 (0), InceptionV3 (B)          0.16 (0), any (Replay-Attack)
                   MobBioFake (Gray)    none available                                     36.5 (0), InceptionV3 (B)          33.5 (0), InceptionV3 (Replay-Mobile)
Warsaw 2013        ATVS-FIr             none available                                     0 (0), InceptionV3 (A)             0.33 (0), InceptionV3 (MSU-MFSD)
                   MobBioFake (Gray)    none available                                     32.8 (0), InceptionV3 (B)          34.5 (0), InceptionV3 (MSU-MFSD)
MobBioFake (Gray)  ATVS-FIr             none available                                     2.5 (10.5), InceptionV3 (A)        0.8 (1.75), InceptionV3 (MSU-MFSD)
                   Warsaw 2013          none available                                     6.2 (10.5), InceptionV3 (A)        5.7 (1.0), MobilenetV2 (Replay-Attack)

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other (grayscale) iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. Not only this, but it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network to rely on color features only when testing on a face image, since all training face images were colored while all iris training images were grayscale.


Fig. 6: High-frequency components: attack iris images from the Warsaw dataset. Left to right: input image; frequency components (Fourier); high-pass filter; inverse Fourier of the filtered image; high-frequency components overlaid on the image.

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High-frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9, we compare our best achieved cross-dataset evaluation errors with the state of the art (SOTA). The table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activations and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and textures that are present in real presentations in general but absent from attack samples, and vice versa. One approach to verify this is to visualize the response of certain feature maps of the network given sample input images. For this task, we chose a MobilenetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected to visualize the response of some filters from the final convolutional block before classification.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High-frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left to right: Replay-Attack; MSU-MFSD; Replay-Mobile; ATVS-FIr; Warsaw; MobBioFake.

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left to right: Replay-Attack; MSU-MFSD; Replay-Mobile; ATVS-FIr; Warsaw; MobBioFake.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high-frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high-frequency components of an image; we then overlay these high-frequency regions


on the input image, for a better understanding of where the high frequencies are concentrated in bona-fide vs. attack images.

The steps for obtaining the regions with high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to get the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, an inverse Fourier transform is used to obtain an image with only the high-frequency regions visible, and finally these regions are overlaid on the original image to highlight them in the input.
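These steps can be sketched in a few lines of numpy; the cutoff radius and overlay weight below are illustrative choices, not values reported in the paper:

```python
import numpy as np

def high_freq_overlay(img, cutoff=8, alpha=0.5):
    """Highlight high-frequency regions of a grayscale image (2-D array in [0, 1]).

    Mirrors the steps in the text: FFT -> zero out low frequencies inside a
    centred square of half-width `cutoff` -> inverse FFT -> overlay the
    magnitude of the result on the input image.
    """
    f = np.fft.fftshift(np.fft.fft2(img))                       # frequency components, DC centred
    h, w = img.shape
    cy, cx = h // 2, w // 2
    f[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0     # high-pass filter
    high = np.abs(np.fft.ifft2(np.fft.ifftshift(f)))            # high-frequency image
    high = high / max(high.max(), 1e-12)                        # normalise to [0, 1]
    return (1 - alpha) * img + alpha * high                     # overlay on the input
```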

Regarding the choice of filters to visualize: the final layer before classification consists of 1280 filters, and when an input image has size 224 × 224, the feature maps resulting from these final 1280 filters have size 7 × 7. From these 1280 filters, we selected two filters for each biometric: one with the highest average activation response to real images, which we call realF, and another that was highly activated by attack images, called attackF. We then visualize the 7 × 7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.
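The filter selection described above can be sketched as follows (an illustrative re-implementation; the function name and the use of plain mean activations are our assumptions):

```python
import numpy as np

def pick_filters(acts_real, acts_attack):
    """acts_* : activations of the final conv block, shape (N, 7, 7, n_filters).

    realF is the index of the filter with the highest mean response over real
    images; attackF the one with the highest mean response over attack images.
    """
    realF = int(acts_real.mean(axis=(0, 1, 2)).argmax())       # average over samples and space
    attackF = int(acts_attack.mean(axis=(0, 1, 2)).argmax())
    return realF, attackF
```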

Figure 7 shows samples from the Warsaw iris dataset with their high-frequency regions highlighted, in addition to the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples, high-frequency areas are concentrated at the bright regions around the inner eye corner and the lash line; for attack presentations, however, high-frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features highly represented at the eye corner via realF, giving a high response at those regions in the real image while failing to detect them in attack samples. AttackF also manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images from the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is performed using a handheld mobile device showing a photo of the subject. We can notice the absence of high-frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

Visualizing the high-frequency regions in the images showed that, in real images, high-frequency variations concentrate on certain edges and depth changes of the 3D biometric presented, while in attack presentations the high-frequency components represent noise: either noise spread over the paper in the case of a paper attack medium, or noise patterns resulting from the display monitor of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network learned is to use gradient-weighted class activation maps (Grad-CAM) [10], which highlight the areas that contributed most to the network's classification decision for an input image.
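The Grad-CAM combination step [10] can be sketched as follows, assuming the feature maps and the class-score gradients have already been extracted via the framework's autodiff:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Gradient-weighted class activation map (Selvaraju et al. [10]).

    feature_maps: (H, W, K) activations of the last convolutional layer.
    gradients:    (H, W, K) gradients of the class score w.r.t. those maps.
    """
    weights = gradients.mean(axis=(0, 1))                        # alpha_k: global-average-pooled grads
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))   # weighted sum over the K filters
    cam = np.maximum(cam, 0)                                     # ReLU: keep positive influence only
    return cam / max(cam.max(), 1e-12)                           # normalise to [0, 1] for display
```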

We chose the MobilenetV2 network trained on the Replay-Attack face dataset and the grayscale MobBioFake iris dataset to analyze the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation-attack images.

The visualizations in these figures clearly show that the network focuses on features at the center of the face and iris to make its decision for a bona-fide presentation, and that in cases where these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could correctly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights trained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% SOTA intra-dataset error on six publicly available benchmark datasets when fine-tuning all of the network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, in addition to starting the training from non-random weights already pretrained on ImageNet. This was to avoid the weights overfitting the small training datasets; for future work, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-SOTA results were reported for the face datasets. As far as we know, this is the first paper to include cross-dataset equal-error-rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network using data from multiple biometrics: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we analyzed the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important region in causing the network to select a bona-fide classification decision. For future work, we would like to apply patch-based CNNs and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach by using a region-based CNN method.


8 References1 Jourabloo A Liu Y Liu X lsquoFace de-spoofing Anti-spoofing via noise

modelingrsquo CoRR 20182 Nagpal C Dubey SR lsquoA performance evaluation of convolutional neural

networks for face anti spoofingrsquo CoRR 20183 Nguyen D RaeBaek N Pham T RyoungPark K lsquoPresentation attack detec-

tion for iris recognition system using nir camera sensorrsquo Sensors 2018 18pp 1315

4 Nguyen DT Pham TD Lee YW Park KR lsquoDeep learning-based enhancedpresentation attack detection for iris recognition by combining features from localand global regions based on nir camera sensorrsquo Sensors 2018 18 (8)

5 Pinto A Pedrini H Krumdick M Becker B Czajka A Bowyer KW et alCounteracting Presentation Attacks in Face Fingerprint and Iris Recognition InlsquoDeep learning in biometricsrsquo (CRC Press 2018 p 49

6 Czajka A Bowyer KW lsquoPresentation attack detection for iris recognitionAn assessment of the state-of-the-artrsquo ACM Computing Surveys 2018 51 (4)pp 861ndash8635

7 Ramachandra R Busch C lsquoPresentation attack detection methods for facerecognition systems A comprehensive surveyrsquo ACM Computing Surveys 201750 (1) pp 81ndash837

8 AbdulGhaffar I Norzali M HajiMohd MN lsquoPresentation attack detec-tion for face recognition on smartphones A comprehensive reviewrsquo Journal ofTelecommunication Electronic and Computer Engineering 2017 9

9 Li L Correia PL Hadid A lsquoFace recognition under spoofing attackscountermeasures and research directionsrsquo IET Biometrics 2018 7 pp 3ndash14(11)

10 Selvaraju RR Das A Vedantam R Cogswell M Parikh D Batra DlsquoGrad-cam Why did you say that visual explanations from deep networks viagradient-based localizationrsquo CoRR 2016 Available from httparxivorgabs161002391

11 Daugman J lsquoIris recognition and anti-spoofing countermeasuresrsquo In 7thInternational Biometrics Conference ( 2004

12 Galbally J OrtizLopez J Fierrez J OrtegaGarcia J lsquoIris liveness detectionbased on quality related featuresrsquo In 2012 5th IAPR International Conferenceon Biometrics (ICB) ( 2012 pp 271ndash276

13 Pacut A Czajka A lsquoAliveness detection for iris biometricsrsquo 40th Annual IEEEInternational Carnahan Conferences Security Technology 2006 pp 122 ndash 129

14 Czajka A lsquoPupil dynamics for iris liveness detectionrsquo IEEE Transactions onInformation Forensics and Security 2015 10 (4) pp 726ndash735

15 Czajka A lsquoPupil dynamics for presentation attack detection in iris recognitionrsquoIn International Biometric Performance Conference (IBPC) NIST Gaithersburg( 2014 pp 1ndash3

16 Bodade R Talbar S lsquoDynamic iris localisation A novel approach suitablefor fake iris detectionrsquo In 2009 International Conference on Ultra ModernTelecommunications Workshops ( 2009 pp 1ndash5

17 Huang X Ti C Hou Q Tokuta A Yang R lsquoAn experimental study of pupilconstriction for liveness detectionrsquo In 2013 IEEE Workshop on Applications ofComputer Vision (WACV) ( 2013 pp 252ndash258

18 Kanematsu M Takano H Nakamura K lsquoHighly reliable liveness detectionmethod for iris recognitionrsquo In SICE Annual Conference 2007 ( 2007 pp 361ndash364

19 Lee EC RyoungPark K lsquoFake iris detection based on 3d structure of irispatternrsquo International Journal of Imaging Systems and Technology 2010 20pp 162 ndash 166

20 Mohd MNH Kashima M Sato K Watanabe M lsquoInternal state measure-ment from facial stereo thermal and visible sensors through svm classificationrsquoARPN J Eng Appl Sci 2015 10 (18) pp 8363ndash8371

21 Mohd M MNH K M S K W lsquoA non-invasive facial visual-infraredstereo vision based measurement as an alternative for physiological measure-mentrsquo Lecture Notes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics) 2015 10 (18)pp 684ndash697

22 Pan G Sun L Wu Z Wang Y lsquoMonocular camera-based face livenessdetection by combining eyeblink and scene contextrsquo Telecommunication Systems2011 47 (3) pp 215ndash225

23 Smith DF Wiliem A Lovell BC lsquoFace recognition on consumer devicesReflections on replay attacksrsquo IEEE Transactions on Information Forensics andSecurity 2015 10 (4) pp 736ndash745

24 Kollreider K Fronthaler H Bigun J lsquoVerifying liveness by multiple expertsin face biometricsrsquo In 2008 IEEE Computer Society Conference on ComputerVision and Pattern Recognition Workshops ( 2008 pp 1ndash6

25 Ali A Deravi F Hoque S lsquoDirectional sensitivity of gaze-collinearity featuresin liveness detectionrsquo In 2013 Fourth International Conference on EmergingSecurity Technologies ( 2013 pp 8ndash11

26 Wang L Ding X Fang C lsquoFace live detection method based on physiologicalmotion analysisrsquo Tsinghua Science and Technology 2009 14 (6) pp 685ndash690

27 Bharadwaj S Dhamecha TI Vatsa M Singh R lsquoComputationally efficientface spoofing detection with motion magnificationrsquo In 2013 IEEE Conferenceon Computer Vision and Pattern Recognition Workshops ( 2013 pp 105ndash110

28 Sun L Wu Z Lao S Pan G lsquoEyeblink-based anti-spoofing in face recogni-tion from a generic webcamerarsquo In 2007 11th IEEE International Conference onComputer Vision(ICCV) ( 2007 pp 1ndash8

29 Patel K Han H Jain AK lsquoCross-database face antispoofing with robust fea-ture representationrsquo In You Z Zhou J Wang Y Sun Z Shan S Zheng Wet al editors Biometric Recognition (Cham Springer International Publishing2016 pp 611ndash619

30 Marsico MD Nappi M Riccio D Dugelay J lsquoMoving face spoofing detec-tion via 3d projective invariantsrsquo In 2012 5th IAPR International Conference onBiometrics (ICB) ( 2012 pp 73ndash78

31 Pan G Wu Z Sun L lsquoLiveness detection for face recognitionrsquo In Delac KGrgic M Bartlett MS editors Recent Advances in Face Recognition (RijekaIntechOpen 2008

32 Kollreider K Fronthaler H Faraj MI Bigun J lsquoReal-time face detec-tion and motion analysis with application in acircAIJlivenessacircAI assessmentrsquo IEEETransactions on Information Forensics and Security 2007 2 (3) pp 548ndash558

33 Komogortsev OV Karpov A lsquoLiveness detection via oculomotor plant char-acteristics Attack of mechanical replicasrsquo In 2013 International Conference onBiometrics (ICB) ( 2013 pp 1ndash8

34 Komogortsev OV Karpov A Holland CD lsquoAttack of mechanical replicasLiveness detection with eye movementsrsquo IEEE Trans Inf Forens Security 201510 (4) pp 716ndash725

35 Rigas I Komogortsev OV lsquoGaze estimation as a framework for iris livenessdetectionrsquo IEEE International Joint Conference on Biometrics 2014 pp 1ndash8

36 Rigas I Komogortsev OV lsquoEye movement-driven defense against iris print-attacksrsquo Pattern Recogn Lett 2015 68 (P2) pp 316ndash326

37 Akhtar Z Foresti GL lsquoFace spoof attack recognition using discriminativeimage patchesrsquo Journal of Electrical and Computer Engineering 2016 2016pp 1ndash14

38 Tan X Li Y Liu J Jiang L lsquoFace liveness detection from a single imagewith sparse low rank bilinear discriminative modelrsquo In Daniilidis K MaragosP Paragios N editors Computer Vision ndash ECCV 2010 (Berlin HeidelbergSpringer Berlin Heidelberg 2010 pp 504ndash517

39 Kollreider K Fronthaler H Bigun J lsquoNon-intrusive liveness detection by faceimagesrsquo Image Vision Comput 2009 27 (3) pp 233ndash244

40 Bao W Li H Li N Jiang W lsquoA liveness detection method for face recognitionbased on optical flow fieldrsquo In Image Analysis and Signal Processing ( 2009pp 233ndash236

41 Siddiqui TA Bharadwaj S Dhamecha TI Agarwal A Vatsa M SinghR et al lsquoFace anti-spoofing with multifeature videolet aggregationrsquo 2016 23rdInternational Conference on Pattern Recognition (ICPR) 2016 pp 1035ndash1040

42 d SPinto A Pedrini H Schwartz W Rocha A lsquoVideo-based face spoofingdetection through visual rhythm analysisrsquo In 2012 25th SIBGRAPI Conferenceon Graphics Patterns and Images ( 2012 pp 221ndash228

43 Komulainen J Hadid A Pietikaumlinen M lsquoContext based face anti-spoofingrsquo2013 IEEE Sixth International Conference on Biometrics Theory Applicationsand Systems (BTAS) 2013 pp 1ndash8

44 Kim S Yu S Kim K Ban Y Lee S lsquoFace liveness detection using variablefocusingrsquo ICB 2013 pp 1ndash6

45 Wang T Yang J Lei Z Liao S Li SZ lsquoFace liveness detection using 3dstructure recovered from a single camerarsquo ICB 2013 pp 1ndash6

46 Kose N Dugelay J lsquoReflectance analysis based countermeasure technique todetect face mask attacksrsquo In 2013 18th International Conference on DigitalSignal Processing (DSP) ( 2013 pp 1ndash6

47 Erdogmus N Marcel S lsquoSpoofing in 2d face recognition with 3d masks andanti-spoofing with kinectrsquo In 2013 IEEE Sixth International Conference onBiometrics Theory Applications and Systems (BTAS) ( 2013 pp 1ndash6

48 Erdogmus N Marcel S lsquoSpoofing 2d face recognition systems with 3dmasksrsquo In 2013 International Conference of the BIOSIG Special Interest Group(BIOSIG) ( 2013 pp 1ndash8

49 Li J Wang Y Tan T Jain AK lsquoLive face detection based on the analysis ofFourier spectrarsquo In Jain AK Ratha NK editors Biometric Technology forHuman Identification vol 5404 ( 2004 pp 296ndash303

50 Zhang Z Yan J Liu S Lei Z Yi D Li SZ lsquoA face antispoofing databasewith diverse attacksrsquo In 2012 5th IAPR International Conference on Biometrics(ICB) ( 2012 pp 26ndash31

51 Lee T Ju G Liu H Wu Y lsquoLiveness detection using frequency entropy ofimage sequencesrsquo In 2013 IEEE International Conference on Acoustics Speechand Signal Processing ( 2013 pp 2367ndash2370

52 Czajka A lsquoDatabase of iris printouts and its application Development of livenessdetection method for iris recognitionrsquo In 2013 18th International Conference onMethods Models in Automation Robotics (MMAR) ( 2013 pp 28ndash33

53 Wei Z Qiu X Sun Z Tan T lsquoCounterfeit iris detection based on textureanalysisrsquo In ICPR 2008 19th International Conference on Pattern Recogni-tion(ICPR) ( 2009 pp 1ndash4

54 Sequeira AF Murari J Cardoso JS lsquoIris liveness detection methods in mobileapplicationsrsquo In 2014 International Conference on Computer Vision Theory andApplications (VISAPP) vol 3 ( 2014 pp 22ndash33

55 Sequeira AF Murari J Cardoso JS lsquoIris liveness detection methods in themobile biometrics scenariorsquo In Proceedings of International Joint Conferenceon Neural Networks (IJCNN) ( 2014 pp 3002ndash3008

56 Unnikrishnan S Eshack A lsquoFace spoof detection using image distortionanalysis and image quality assessmentrsquo In 2016 International Conference onEmerging Technological Trends (ICETT) ( 2016 pp 1ndash5

57 Wen D Han H Jain AK lsquoFace spoof detection with image distortion anal-ysisrsquo IEEE Transactions on Information Forensics and Security 2015 10 (4)pp 746ndash761

58 Galbally J Marcel S lsquoFace anti-spoofing based on general image quality assess-mentrsquo In 2014 22nd International Conference on Pattern Recognition ( 2014pp 1173ndash1178

59 SAtildeullinger D Trung P Uhl A lsquoNon-reference image quality assessment andnatural scene statistics to counter biometric sensor spoofingrsquo IET Biometrics2018 7 (4) pp 314ndash324

60 Bhogal APS Soumlllinger D Trung P Uhl A lsquoNon-reference image qualityassessment for biometric presentation attack detectionrsquo 2017 5th InternationalWorkshop on Biometrics and Forensics (IWBF) 2017 pp 1ndash6

61 Boulkenafet Z Komulainen J Hadid A lsquoFace anti-spoofing based on colortexture analysisrsquo In 2015 IEEE International Conference on Image Processing(ICIP) ( 2015 pp 2636ndash2640

14 ccopy This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

62 Boulkenafet Z., Komulainen J., Hadid A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830

63 Boulkenafet Z., Komulainen J., Hadid A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145

64 Boulkenafet Z., Komulainen J., Akhtar Z., Benlamoudi A., Samai D., Bekhouche S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65 Raja K.B., Raghavendra R., Busch C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks (SIN '16), New York, NY, USA: ACM, 2016, pp. 9–15

66 Boulkenafet Z., Komulainen J., Hadid A.: 'On the generalization of color texture-based face anti-spoofing', Image and Vision Computing, 2018, 77, pp. 1–9

67 Anjos A., Marcel S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68 Komulainen J., Hadid A., Pietikäinen M., Anjos A., Marcel S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69 Gragnaniello D., Poggi G., Sansone C., Verdoliva L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863

70 de Freitas Pereira T., Anjos A., De Martino J.M., Marcel S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park J.I., Kim J. (eds.) Computer Vision – ACCV 2012 Workshops, Berlin, Heidelberg: Springer, 2013, pp. 121–132

71 de Freitas Pereira T., Anjos A., Martino J.M.D., Marcel S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72 Määttä J., Hadid A., Pietikäinen M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska I., Anjos A., Marcel S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG – Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74 Yang J., Lei Z., Liao S., Li S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75 Määttä J., Hadid A., Pietikäinen M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10

76 Benlamoudi A., Samai D., Ouafi A., Bekhouche S.E., Taleb-Ahmed A., Hadid A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering & Information Technology (CEIT), 2015, pp. 1–5

77 Patel K., Han H., Jain A.K.: 'Secure face unlock: spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283

78 Schwartz W.R., Rocha A., Pedrini H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79 Xu Z., Li S., Deng W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80 Peixoto B., Michelassi C., Rocha A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81 Raghavendra R., Busch C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715

82 Doyle J.S., Bowyer K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683

83 Akhtar Z., Micheloni C., Piciarelli C., Foresti G.L.: 'MoBio_LivDet: mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci R., Muntoni D., Fadda G., Pili M., Sirena N., Murgia G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85 Feng L., Po L.M., Li Y., Xu X., Yuan F., Cheung T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86 Li L., Feng X., Boulkenafet Z., Xia Z., Li M., Hadid A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87 Nguyen T.D., Pham T.D., Baek N.R., Park K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors', Sensors, 2018

88 Atoum Y., Liu Y., Jourabloo A., Liu X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89 Liu Y., Jourabloo A., Liu X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision', CoRR, 2018

90 Wang Z., Zhao C., Qin Y., Zhou Q., Lei Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018

91 Yang J., Lei Z., Li S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014

92 Gan J., Li S., Zhai Y., Liu C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93 Lucena O., Junior A., Moia V., Souza R., Valle E., de Alencar Lotufo R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34

94 Czajka A., Bowyer K.W., Krumdick M., Vidal Mata R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196

95 Silva P., Luz E., Baeta R., Pedrini H., Falcão A., Menotti D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96 Kuehlkamp A., da Silva Pinto A., Rocha A., Bowyer K.W., Czajka A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018

97 Hoffman S., Sharma R., Ross A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98 Menotti D., Chiachia G., Pinto A., Schwartz W.R., Pedrini H., Falcão A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879

99 Agarwal A., Singh R., Vatsa M.: 'Face anti-spoofing using Haralick features'. In: 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100 Galbally J., Marcel S., Fierrez J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724

101 Garcia D.C., de Queiroz R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786

102 He Z., Sun Z., Tan T., Wei Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli M., Nixon M.S. (eds.) Advances in Biometrics, Berlin, Heidelberg: Springer, 2009, pp. 1080–1090

103 Zhang H., Sun Z., Tan T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104 Costa-Pazo A., Bhattacharjee S., Vazquez-Fernandez E., Marcel S.: 'The Replay-Mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105 Sequeira A.F., Monteiro J.C., Rebelo A., Oliveira H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello D., Sansone C., Verdoliva L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recognition Letters, 2015, 57, pp. 81–87

107 Marsico M.D., Galdi C., Nappi M., Riccio D.: 'FIRME: face and iris recognition for mobile engagement', Image and Vision Computing, 2014, 32, pp. 1161–1172

108 Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., et al.: 'ImageNet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252

109 Boulkenafet Z., Komulainen J., Li L., Feng X., Hadid A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 612–618

110 Yambay D., Becker B., Kohli N., Yadav D., Czajka A., Bowyer K.W., et al.: 'LivDet Iris 2017 – iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB 2017), Denver, CO, USA, October 1–4, 2017, pp. 733–741

111 Chakka M.M., Anjos A., Marcel S., Tronci R., Muntoni D., Fadda G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112 Chingovska I., Yang J., Lei Z., Yi D., Li S.Z., Kahm O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113 Zhang S., Wang X., Liu A., Zhao C., Wan J., Escalera S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu A., Wan J., Escalera S., Jair Escalante H., Tan Z., Yuan Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu A., Tan Z., Li X., Wan J., Escalera S., Guo G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116 Yambay D., Doyle J.S. Jr., Bowyer K.W., Czajka A., Schuckers S.: 'LivDet-Iris 2013 – iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics (IJCB 2014), Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8

117 Yambay D., Walczak B., Schuckers S., Czajka A.: 'LivDet-Iris 2015 – iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22–24, 2017, pp. 1–6

118 Sequeira A.F., Oliveira H.P., Monteiro J.C., Monteiro J.P., Cardoso J.S.: 'MobILive 2014 – mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete V., Tome-Gonzalez P., Alonso-Fernandez F., Galbally J., Fierrez J., Ortega-Garcia J.: 'Direct attacks using fake images in iris verification'. In: Schouten B., Juul N.C., Drygajlo A., Tistarelli M. (eds.) Biometrics and Identity Management, Berlin, Heidelberg: Springer, 2008, pp. 181–190

120 Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z.: 'Rethinking the Inception architecture for computer vision', CoRR, 2015, abs/1512.00567

121 He K., Zhang X., Ren S., Sun J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385

122 Sandler M., Howard A.G., Zhu M., Zhmoginov A., Chen L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381

123 Simonyan K., Zisserman A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556

124 Kingma D.P., Ba J.: 'Adam: a method for stochastic optimization', CoRR, 2014, abs/1412.6980

125 Anjos A., Shafey L.E., Wallace R., Günther M., McCool C., Marcel S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126 Anjos A., Günther M., de Freitas Pereira T., Korshunov P., Mohammadi A., Marcel S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127 Peng F., Qin L., Long M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



Table 1: PAD methods in literature

Active (hardware-based or challenge-response)
- Methods: eye hippus or pupil dilation [11–19]; challenges such as blinking an eye or rotating the head [22–25]
- Advantages: robust against different PA types; higher PA detection rates
- Weaknesses: more expensive; less convenient to users; rely on an external device

Passive (software-based):

Liveness-detection
- Methods: eye blinking [24, 26–29]; facial expression changes, head and/or mouth movements [30–32]; eye movements and gaze estimation for iris [33–36]
- Advantages: utilize features extracted from consecutive frames; detect signs of life in the captured biometric data
- Weaknesses: can be easily fooled by a replayed video, or by adding liveness properties to a still face image by cutting out the eye/mouth parts

Recapturing-detection
- Methods: detect abnormal movements in the video due to a hand holding the presenting media [37, 38]; track the relative movements between different facial parts [39, 40]; motion magnification [27, 41]; visual rhythm analysis [42]; contextual analysis [22, 30, 43, 44]; 3D reconstruction [45]
- Advantages: very effective for replay attacks
- Weaknesses: fail if the attack is done using an artifact such as a mask

Texture-based (hand-crafted features)
- Methods: texture-reflectance [46–48]; frequency [38, 49–52]; image quality analysis [12, 18, 53–60]; color analysis [49, 57, 61–66]; local descriptors [38, 43, 63, 67–83]; fusion [3, 4, 68, 71, 78, 83–85]
- Advantages: detect effects caused by printers and display devices; faster than temporal or motion-based analysis; operate with only one image sample, or a selection of images, instead of a video sequence
- Weaknesses: design and selection of feature extractors is based on expert knowledge; sensitive to varying acquisition conditions such as camera devices, lighting conditions and PAIs; poor generalizability

Learned-features
- Methods: extract deep features, classify with a classic classifier [29, 86, 87]; combine with temporal information [79, 85]; other [1, 2, 88–93]; for iris [3, 4, 94–97]; for face and iris [5, 98]
- Advantages: instead of hand-engineering features, a deep network learns features that discriminate between bona-fide and attack samples; can be used jointly with hand-crafted features or motion analysis
- Weaknesses: requires more data; can fail to generalize to unseen sensors

The reflectance and texture of a real face or iris are different from those acquired from a spoofing material, whether a printed paper, the screen of a mobile device, a mask [46–48] or a glass eye.

3.3.2 Image quality analysis: Techniques that depend on analysis of image quality try to detect the image distortions usually found in a spoofed face or iris image.

(1) Deformation: For face, for example, in print attacks the face might appear skewed, with a deformed shape, if the imposter bends the paper while holding it.

(2) Frequency: These methods are based on the assumption that the reproduction of previously captured images or videos affects the frequency information of the image when it is displayed in front of the sensor. The attack images or videos have more blur and less sharpness than genuine ones, so their high-frequency components are attenuated. Hence, frequency-domain analysis can be used as a face PAD method, as in [38, 49, 50], or frequency entropy analysis as by Lee et al. [51]. For iris, Czajka [52] analyzed printing regularities left in printed irises and found characteristic peaks in the frequency spectrum. Frequency-based approaches might fail when high-quality spoof images or videos are presented to the camera.
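The frequency cue described above can be made concrete with a simple spectral-energy ratio. The following numpy sketch is our own illustrative choice (the function name, cutoff radius and synthetic blur are assumptions, not taken from the surveyed methods): it measures the fraction of spectral energy outside a low-frequency disc, which drops for blurred, recaptured-like images.

```python
import numpy as np

def high_freq_ratio(gray, radius_frac=0.25):
    """Fraction of spectral energy outside a low-frequency disc.

    gray: 2-D grayscale array; radius_frac: cutoff radius as a fraction
    of the smaller image dimension (both names are illustrative).
    """
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    power = np.abs(spectrum) ** 2
    h, w = gray.shape
    cy, cx = h // 2, w // 2  # DC component after fftshift
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= (radius_frac * min(h, w)) ** 2
    total = power.sum()
    return float(power[~low_mask].sum() / total) if total > 0 else 0.0

# A sharp noise image keeps far more high-frequency energy than a
# low-pass-filtered (recaptured-like) version of itself.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
lowpass = np.exp(-2000 * (np.fft.fftfreq(64)[:, None] ** 2
                          + np.fft.fftfreq(64)[None, :] ** 2))
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * lowpass))
assert high_freq_ratio(sharp) > high_freq_ratio(blurred)
```

In practice such a scalar would be only one of several features fed to a classifier, with the cutoff tuned on a development set.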

(3) Quality metrics: Some other methods explore the potential of quality assessment to distinguish real from fake face or iris samples acquired from a high-quality printed image. This assumes that many image quality metrics are distorted by the display device or paper. Some works used individual quality-based features, while others used a combination of quality metrics to better detect spoofing and differentiate between bona-fide and attack iris [12, 18, 53–55], face [56–58], or both [59, 60]. For example, Galbally et al. [12] investigated 22 iris-specific quality features and used feature selection to choose the best combination of features that discriminate between live and fake iris images. Later, in a follow-up work [100], they assessed 25 general image-quality metrics and applied the method not only to iris PAD but also to face and fingerprint PAD. A different approach by Garcia et al. [101] was to analyze the moiré patterns that may be produced when a previously captured image is displayed on a mobile device and reacquired by the camera.

(4) Color analysis: Another family of algorithms exploits the fact that the color distribution of images taken from a bona-fide face or iris differs from that of images taken from printed papers or display devices. Methods for face PAD adopt input domains other than the RGB space in order to improve robustness to illumination variation, such as the HSV and YCbCr color spaces [57, 61–64] or the Fourier spectrum [49]. For iris PAD, color adaptive quantized patterns were used in [65] instead of a gray-level texture image. Boulkenafet et al. studied the generalization of color-texture analysis methods in [66].
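As a minimal illustration of the color-space idea, the sketch below converts RGB to YCbCr with the standard BT.601 coefficients and concatenates per-channel histograms into one descriptor. The histogram is a simple stand-in for the per-channel texture descriptors used in the cited work; all names here are our own illustrative assumptions.

```python
import numpy as np

def rgb_to_ycbcr(img):
    """RGB array (H, W, 3), floats in [0, 1] -> YCbCr via BT.601 coefficients."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return np.stack([y, cb, cr], axis=-1)

def color_histogram_descriptor(img, bins=32):
    """Concatenated, normalized per-channel histograms in YCbCr space,
    standing in for the per-channel texture descriptors of the cited work."""
    ycbcr = rgb_to_ycbcr(img)
    feats = [np.histogram(ycbcr[..., c], bins=bins, range=(0.0, 1.0))[0]
             for c in range(3)]
    v = np.concatenate(feats).astype(float)
    return v / v.sum()

rng = np.random.default_rng(1)
img = rng.random((48, 48, 3))       # synthetic RGB patch
desc = color_histogram_descriptor(img)
assert desc.shape == (96,) and np.isclose(desc.sum(), 1.0)
```

Such a descriptor would be computed per frame or per face crop and passed to an SVM or similar classifier, as in the color-texture pipelines surveyed above.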

3.3.3 Local descriptors: A different group of methods uses hand-crafted features that extract local texture for PAD [67, 68]. A large variety of local image descriptors have been compared with respect to their ability to identify spoofed iris, fingerprint and face data [69], and some highly successful texture descriptors have been used extensively for this purpose. For example, in face PAD methods, descriptors such as local binary patterns (LBP) [70–76], SIFT [77], SURF [63], HoG [43, 74, 75, 78, 79], DoG [38, 80] and GLCM [78] have been studied, showing good results in discriminating real from attack face images. For PAD in iris recognition, boosted LBP was first used by He et al. [102], weighted LBP by Zhang et al. [103] for contact lens detection, and later binarized statistical image features (BSIF) [81, 82]. Local phase quantization (LPQ) was investigated for biometric PAD in general, including both face and iris, by Gragnaniello et al. [69], and the census transform (CT) was used for mobile PAD in [83]. Such local texture descriptors are then fed to a traditional classifier such as SVM, LDA, neural networks or Random Forest.
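A minimal numpy sketch of the basic 3×3 LBP descriptor mentioned above (the cited works use more elaborate multi-scale and uniform-pattern variants; this is only the core idea):

```python
import numpy as np

def lbp_3x3(gray):
    """Basic 8-neighbour local binary pattern codes for the interior
    pixels of a 2-D grayscale array."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbours enumerated clockwise starting at the top-left corner.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        neighbour = gray[1 + dy: h - 1 + dy, 1 + dx: w - 1 + dx]
        code |= (neighbour >= center).astype(np.uint8) << bit
    return code

def lbp_histogram(gray):
    """256-bin normalized LBP histogram used as a texture feature vector."""
    counts = np.bincount(lbp_3x3(gray).ravel(), minlength=256).astype(float)
    return counts / counts.sum()

rng = np.random.default_rng(2)
hist = lbp_histogram(rng.random((32, 32)))
assert hist.shape == (256,) and np.isclose(hist.sum(), 1.0)
```

The resulting histogram is the kind of fixed-length vector that the surveyed methods feed to an SVM or LDA classifier.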


3.3.4 Drawbacks of hand-crafted texture-based methods: Results of the above methods show that manually engineered features are suitable for solving the PAD problem for face and iris recognition systems. However, the design and selection of hand-crafted feature extractors is based mainly on the researchers' expert knowledge of the problem. Consequently, these features often reflect only limited aspects of the problem and are often sensitive to varying acquisition conditions such as camera devices, lighting conditions and presentation attack instruments (PAIs). This causes their detection accuracy to vary significantly among different databases, indicating that the hand-crafted features have poor generalizability and so do not completely solve the PAD problem. The available cross-database tests in the literature suggest that the performance of hand-engineered texture-based techniques can degrade dramatically when operating in unknown conditions. This leads to the need for automatically extracting visually meaningful features directly from the data, using deep representations, to assist in the task of presentation attack detection.

3.4 Fusion

The presentation attack detection accuracy can be enhanced by using feature-level or score-level fusion methods to obtain a more general countermeasure, effective against various types of presentation attack. For face PAD, Tronci et al. [84] proposed a linear fusion, at the frame and video level, of static and video analysis. In [78], Schwartz et al. introduced feature-level fusion using partial least squares (PLS) regression on a set of low-level feature descriptors. Other works [68, 71] obtained an effective fusion scheme by measuring the level of independence of two anti-counterfeiting systems, while Feng et al. [85] integrated both quality measures and motion cues, then used neural networks for classification.
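Score-level fusion of two countermeasures can be sketched as below. The min-max normalization and equal weights are our own simple choices for illustration; in practice the weights would be learned on development data.

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Weighted-sum score-level fusion of several PAD countermeasures.

    score_lists: iterable of 1-D score arrays, one per countermeasure,
    where higher means 'more likely bona-fide'. Each system's scores are
    min-max normalized before fusion (a common, simple choice).
    """
    normed = []
    for s in score_lists:
        s = np.asarray(s, dtype=float)
        lo, hi = s.min(), s.max()
        normed.append((s - lo) / (hi - lo) if hi > lo else np.zeros_like(s))
    normed = np.stack(normed)
    if weights is None:
        weights = np.full(len(normed), 1.0 / len(normed))
    return np.average(normed, axis=0, weights=weights)

# Hypothetical per-video scores from two independent countermeasures.
texture_scores = [0.9, 0.2, 0.7, 0.1]   # e.g. an LBP+SVM system
motion_scores  = [0.8, 0.3, 0.4, 0.2]   # e.g. a motion-cue system
fused = fuse_scores([texture_scores, motion_scores])
assert fused.shape == (4,) and fused[0] > fused[1]
```

A single threshold on the fused score then replaces the two per-system decisions.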

In the case of iris anti-spoofing, Akhtar et al. [83] proposed decision-level fusion over several local descriptors in addition to global features. More recently, Nguyen et al. [3, 4] explored both feature-level and score-level fusion of hand-crafted features with CNN-extracted features, and obtained the lowest error with their proposed score-level fusion methodology.

3.5 Smartphones

In the last few years, research on smartphone and mobile-device PAD has been gaining popularity. Many different algorithms have been developed, and datasets have been released to test their performance. These databases are publicly available, and in this section we list some of them together with the algorithms proposed by their authors.

One of the first databases dedicated to face PAD on smartphones is MSU-MFSD [57], whose authors, Wen et al., used a PAD method based on image distortion analysis. Later, the same group released MSU-USSA [77] and used a combination of texture analysis (LBP and color) and image analysis techniques. In 2016, the IDIAP Research Institute released the Replay-Mobile [104] database; again, a combination of image quality analysis and texture analysis (Gabor jets) was used as the proposed PAD.

In the field of iris spoofing, the MobBioFake dataset [54, 105], presented in 2014, was the first to use the visible spectrum instead of near-infrared for iris spoofing, with a mobile device as the acquisition sensor. Sequeira et al. [54] used a texture-based iris PAD algorithm: they combined several image quality features at the feature level and used SVM for classification. Gragnaniello et al. [106] proposed a solution based on local descriptors but with very low complexity and low CPU requirements, suitable for mobile applications. They evaluated it on the MobBioFake and MICHE [107] datasets and achieved good performance.

3.6 Learned features

In recent years, deep learning has evolved, and the use of deep neural networks or convolutional neural networks (CNNs) has proved effective in many computer vision tasks, especially with the availability of advanced hardware and large datasets. CNNs have been successfully used for vision problems such as image classification and object detection. This has encouraged many researchers to incorporate deep learning in the PAD problem: instead of using hand-crafted features, they rely on a deep network or CNN to learn features that discriminate between bona-fide and attack face [1, 2, 29, 85, 86, 91–93] or iris [3, 4, 94, 95] images, or both [5, 98]. Some proposed hybrid features that combine information from both hand-crafted and deeply-learnt features.

Several works use a pretrained CNN as a feature extractor, followed by a conventional classifier such as SVM, as in [86]. Li et al. [86] fine-tune a VGG CNN model pretrained on ImageNet [108], use it to extract features, reduce dimensionality by principal component analysis (PCA), then classify with SVM. Combining deep features with motion cues has been proposed in several face video spoof detection works. In [29], Patel et al. fine-tuned a pretrained VGG for texture feature extraction, in addition to using frame difference as a motion cue. Similarly, Nguyen et al. [87] used VGG for deep feature extraction, fused the features with multi-level LBP features, and used SVM for the final classification. Feng et al. [85] combine image quality information with motion information from optical flow as input to a neural network that classifies a face as bona-fide or attack. Xu et al. propose an LSTM-CNN architecture to exploit temporal information for binary classification in [79].
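The features-then-classifier pipeline of [86] (CNN activations → PCA → classifier) can be sketched as follows. Everything in this block is an illustrative assumption rather than the authors' code: random Gaussian vectors stand in for the fine-tuned VGG fc-layer activations, PCA is computed by SVD, and a nearest-centroid rule stands in for the SVM.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-ins for 4096-D fc-layer activations of a fine-tuned CNN:
# two classes (bona-fide / attack) drawn from shifted Gaussians.
bona_fide = rng.normal(0.0, 1.0, (50, 4096))
attack    = rng.normal(0.8, 1.0, (50, 4096))
X = np.vstack([bona_fide, attack])
y = np.array([0] * 50 + [1] * 50)

# PCA by SVD: project centered features onto the top k components.
mu = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mu, full_matrices=False)
k = 16
Z = (X - mu) @ vt[:k].T

# Nearest-centroid classifier in the reduced space (a stand-in for
# the SVM used in the cited pipeline).
centroids = np.stack([Z[y == c].mean(axis=0) for c in (0, 1)])
dists = ((Z[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
pred = np.argmin(dists, axis=1)
accuracy = (pred == y).mean()
assert accuracy > 0.9
```

On deployment, the PCA projection and classifier would be fit on the training subset only and the CNN activations would come from real face crops, not synthetic vectors.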

Atoum et al. [88] propose a two-stream CNN-based face PAD method using texture and depth; they utilize both the full face and patches from the face, with color spaces other than RGB. They evaluated on three benchmark databases but did not perform cross-dataset evaluation. Later, Liu et al. [89] extended and refined this work by learning another auxiliary signal, the rPPG signal, from the face video, and evaluated their depth-supervised learning approach on the more recent datasets OULU-NPU [109] and SiW. Wang et al. [90] did not treat depth as an auxiliary supervision as in [89]; instead, they used multiple RGB frames to estimate face depth information and used two modules to extract short- and long-term motion. Four benchmark datasets were used to evaluate their approach.

The use of a CNN by itself to classify attack vs. bona-fide iris images has also been investigated in recent years. Kuehlkamp et al. [96] proposed to use a lightweight CNN on binarized statistical image features extracted from the iris image; they evaluated their algorithm on LivDet-Warsaw 2017 [110] and three other benchmark datasets. In [97], Hoffman et al. proposed to use 25 patches of the iris image as input to the CNN instead of the full iris image. A patch-level fusion method then fuses the scores of the input patches based on the percentage of iris and pupil pixels inside each patch. They performed cross-dataset evaluation on the LivDet-Iris 2015 Warsaw, CASIA-Iris-Fake and BERC-Iris-Fake datasets, and used the true detection rate (TDR) at 0.2% false detection rate (FDR) as their evaluation metric.
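The TDR-at-fixed-FDR metric used in [97] can be computed as in the following sketch (the function name, thresholding convention and synthetic score distributions are our own assumptions):

```python
import numpy as np

def tdr_at_fdr(attack_scores, bona_fide_scores, fdr=0.002):
    """True detection rate at a fixed false detection rate.

    Scores measure 'attack-ness': higher means more likely an attack.
    The threshold is chosen so that at most `fdr` of bona-fide samples
    score above it; TDR is the fraction of attacks above that threshold.
    """
    bona_fide = np.sort(np.asarray(bona_fide_scores))
    idx = int(np.ceil((1.0 - fdr) * len(bona_fide))) - 1
    threshold = bona_fide[idx]
    return float(np.mean(np.asarray(attack_scores) > threshold))

rng = np.random.default_rng(4)
attacks = rng.normal(3.0, 1.0, 1000)   # attacks tend to score high
genuine = rng.normal(0.0, 1.0, 1000)
tdr = tdr_at_fdr(attacks, genuine, fdr=0.002)
assert 0.0 <= tdr <= 1.0
```

Reporting TDR at a very low FDR, as in [97], emphasizes how many attacks are caught while almost never rejecting genuine users.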

4 Competitions and Databases

In this section, we summarize the face and iris PAD competitions held to date, in addition to the benchmark datasets used to evaluate the performance of face or iris presentation attack detection solutions. All these databases are publicly available upon request from their authors. Table 2 shows a summary of all six datasets used.

41 Competitions

Since 2011, several competitions have been held to assess the status of presentation attack detection for either face or iris. Such competitions have been very useful for collecting new benchmark datasets and creating a common framework for evaluating the different PAD methods.

For face presentation attack detection, three competitions have been carried out since 2011. The first was held during 2011 [111] on the IDIAP Print-Attack dataset [67], with six competing teams; the winning methodology used hand-engineered


Table 2: Properties of the face and iris databases used

Database (type): samples (attack + bona-fide) | resolution (W × H) | FPS | video duration
Replay-Attack (face): 1000 + 200 videos | 320×240 | 25 | 9–15 s
MSU-MFSD (face): 210 + 70 videos | 720×480 | 30 | 9–15 s
Replay-Mobile (face): 640 + 390 videos | 720×1280 | 30 | 10–15 s
Warsaw 2013 (iris): 815 + 825 images | 640×480 | – | –
ATVS-FIr (iris): 800 + 800 images | 640×480 | – | –
MobBioFake (iris): 800 + 800 images | 250×200 | – | –

texture-based approaches, which proved very effective at detecting print attacks. After two years, the second face PAD competition was carried out in 2013 [112], with 8 participating teams, on the Replay-Attack dataset [73]. Most of the teams relied only on texture-based methods, while 3 teams used hybrid approaches combining both texture- and motion-based countermeasures. Two of these hybrid approaches reached 0% HTER on the test set of the Replay-Attack database.

Later, in 2017, a competition was held on generalized software-based face presentation attack detection in mobile scenarios [64]. Thirteen teams participated, and four protocols were used for evaluation. The winning team, which showed the best generalization to unseen cameras and attack types, used a combination of color, texture and motion-based features.

More recently, after the release of the large-scale multi-modal face anti-spoofing dataset CASIA-SURF [113], the ChaLearn LAP Multi-Modal Face Anti-spoofing Attack Detection Challenge [114] was held in 2019. Thirteen teams qualified for the final round, with all submitted face PAD solutions relying on CNN-based feature extractors; the top 3 teams utilized ensembles of feature extractors and classifiers. For future work, the challenge paper suggested taking full advantage of the multi-modal dataset by defining a set of cross-modal testing protocols, as well as introducing 3D mask attacks in addition to the 2D attacks. These recommendations were fulfilled the following year in the ChaLearn Multi-modal Cross-ethnicity Face Anti-spoofing Recognition Challenge, using the CASIA-SURF CeFA [115] dataset.

For iris presentation attack detection, the first competition was LivDet-Iris 2013 [116]; two sequels were held in 2015 [117] and 2017 [110]. Images in LivDet-Iris included iris printouts and textured contact lenses, the latter being more difficult to detect than printouts. For LivDet-Iris 2017, the test images were split into known and unknown parts, where the known part contained images taken under conditions similar to the training images, with the same sensor. The unknown partition, on the other hand, consisted of images taken in a different environment and with different properties; results on the known partition were much better than those on the unknown part. In 2014, another competition was held specifically for mobile authentication, the Mobile Iris Liveness Detection Competition (MobILive 2014) [118]. A mobile device was used to capture images of iris printouts in visible light (the MobBioFake dataset), not infra-red light as in the datasets used in the LivDet-Iris competitions. Six teams participated, and the winning team achieved perfect detection of presentation attacks. For a more thorough assessment of these iris competitions, refer to Czajka and Bowyer [6].

42 Face spoofing Databases

In Table 3 we add details on the presentation media and presentation attack instruments (PAIs) of the face video datasets used.

4.2.1 Replay-Attack: One of the earliest datasets presented in the literature, in 2012, for the problem of face spoofing is the Replay-Attack dataset [73], a sequel to its Print-Attack version introduced in 2011 [67]. The database contains a total of 1200 short videos of between 9 and 15 seconds. The videos are taken at resolution 320×240 from 50 different subjects. Each subject has 4 bona-fide videos and 20 spoofed ones, so the full dataset has 200 bona-fide videos and 1000 spoofed videos. The dataset is divided into 3 subject-disjoint subsets: one for training, one for development and one for testing.

The training and development subsets each have 15 subjects, while the test subset has videos from the remaining 20 subjects. Results on this dataset are reported as the HTER (half total error rate) on the test subset, with the detector threshold set at the EER (equal error rate) of the development set. For both the real and spoof videos, each subject was recorded twice with a regular webcam under two different lighting settings, controlled and adverse.
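The evaluation protocol just described, thresholding at the EER of the development set and reporting HTER on the test set, can be sketched in a few lines of numpy (the score convention and synthetic scores are illustrative assumptions):

```python
import numpy as np

def eer_threshold(dev_bona_fide, dev_attack):
    """Threshold where FAR and FRR are closest on the development set.
    Scores measure 'bona-fide-ness': higher means more likely real."""
    candidates = np.sort(np.concatenate([dev_bona_fide, dev_attack]))
    far = np.array([(dev_attack >= t).mean() for t in candidates])
    frr = np.array([(dev_bona_fide < t).mean() for t in candidates])
    return candidates[np.argmin(np.abs(far - frr))]

def hter(test_bona_fide, test_attack, threshold):
    """Half total error rate: mean of FAR and FRR at a fixed threshold."""
    far = (test_attack >= threshold).mean()
    frr = (test_bona_fide < threshold).mean()
    return 0.5 * (far + frr)

rng = np.random.default_rng(5)
dev_real, dev_att = rng.normal(2, 1, 500), rng.normal(0, 1, 500)
tst_real, tst_att = rng.normal(2, 1, 500), rng.normal(0, 1, 500)
t = eer_threshold(dev_real, dev_att)
err = hter(tst_real, tst_att, t)
assert 0.0 <= err <= 0.5
```

Fixing the threshold on the development set, rather than on the test set, is what makes HTER an honest measure of generalization in this protocol.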

As for the spoofed videos, high-resolution videos of each subject were first taken with a Canon PowerShot SX150 IS camera; then three techniques were used to generate 5 spoofing scenarios: (1) hard-copy print attack, where high-resolution digital photographs, printed with a Triumph-Adler DCC 2520 color laser printer on A4 paper, are presented to the camera; (2) mobile attack, using a photo once and a replay of the recorded video on an iPhone 3GS screen (with resolution 480×320 pixels) once; (3) high-definition attack, displaying a photo or replaying the video on a high-resolution screen using an iPad (first generation, with a screen resolution of 1024×768 pixels). Each of the 5 scenarios was recorded in the two different lighting conditions stated above, then presented to the sensor in 2 different ways: either on a fixed tripod, or with an attacker holding the presenting device (printed paper or replay device) in his/her hand, leading to a total of 20 attack videos per subject. More details of the dataset can be found on the IDIAP Research Institute website*.

4.2.2 MSU Mobile Face Spoofing Database (MSU-MFSD): The MSU-MFSD dataset [57] was introduced in 2014 by Michigan State University (MSU). Its main aim was to tackle the problem of face spoofing on smartphones, where some mobile phone applications use the face for authentication and for unlocking the phone. The dataset includes real and spoofed videos from 35 subjects, with durations between 9 and 15 seconds. Unlike the Replay-Attack dataset, where only a webcam was used for authentication, in MSU-MFSD two devices were used: the built-in webcam of a MacBook Air 13" with resolution 640×480, and the front-facing camera of the Google Nexus 5 smartphone with 720×480 resolution. So for each subject there are 2 bona-fide videos, one from each device, and six attack videos, 3 from each device, leading to a total of 280 videos: 210 attack and 70 bona-fide. The attack scenarios are all presented to the authentication sensor on a fixed support, unlike Replay-Attack, which has videos with both fixed and hand-held devices. Three attack scenarios were used: (1) print attack on A3 paper, (2) video replay attack on the screen of an iPad Air, and (3) video replay attack on a Google Nexus 5 smartphone. The dataset is divided into two subject-disjoint subsets: one for training, with 15 subjects, and the other for testing, with 20 subjects. The EER on the test set is used as the evaluation metric on the MSU-MFSD database.

4.2.3 Replay-Mobile: In 2016, the IDIAP research institute that released Replay-Attack released a new dataset, Replay-Mobile [104]. This followed the widespread adoption of face authentication on mobile devices and the increased need for face-PAD evaluation sets that capture the characteristics of mobile devices. The dataset has 1200 short recordings from 40 subjects, captured by two mobile devices at resolution 720 × 1280. Ten bona-fide accesses were recorded for each subject, in addition to 16 attack videos taken under different attack modes. Like Replay-Attack, the dataset was divided into training, development and testing subsets of 12, 16 and 12 subjects, respectively, and results are reported as HTER on the test set.

Five lighting conditions were used in recording real accesses (controlled, adverse, direct, lateral and diffuse). For presentation attacks, high-resolution photos and videos of each client were taken under two different lighting conditions (light-on, light-off).

Attacks were performed in two ways: (1) photo-print attacks, where the printed high-resolution photo was presented to the capturing device using either a fixed or hand support; (2) matte-screen attacks, displaying a digital photo or video with fixed supports.

* https://www.idiap.ch/dataset/replayattack


Table 3: Face database details.
Columns: Dataset | Sensor used for authentication | PA artifact | PAI (Presentation Attack Instrument) | Lighting conditions (bona-fide; attack) | Subjects (train/dev/test) | Videos per subject (bona-fide + attack) | Videos per subset (bona-fide + attack)

Replay-Attack (2012) | 1) W in MacBook laptop (320 × 240) | 1) PH, 2) 720p high-def V; Canon PowerShot SX150 IS | 1) PR (A4), 2) VR on M (iPhone), 3) VR on T (iPad) | 2; 2 | 50 (15/15/20) | 4 + 20 | Train 60+300, Dev 60+300, Test 80+400

CASIA-FASD (2012) | 1) Low Q long-time-used USB C (680 × 480), 2) Normal Q new USB C (680 × 480), 3) High Q Sony NEX-5 (1920 × 1080) | 1) PH, 2) 720p high-def V; Sony NEX-5 | 1) PR (copper A4) - warped, 2) PR (copper A4) - cut, 3) VR on T (iPad) | 1; 1 | 50 (20/-/30) | 3 + 9 | Train 60+180, Test 90+270

MSU-MFSD (2014) | 1) W in MacBook Air (640 × 480), 2) FC of Google Nexus 5 M (720 × 480) | 1) PH (Canon PowerShot 550D SLR), 2) 1080p high-def V (Canon PowerShot 550D SLR), 3) 1080p M V (BC of iPhone 5S) | 1) PR (A3), 2) high-def VR on T (iPad), 3) M VR on M (iPhone) | 1; 1 | 35 (15/-/20) | 2 + 6 | Train 30+90, Test 40+120

Replay-Mobile (2016) | 1) FC of iPad Mini 2 T (720 × 1280), 2) FC of LG-G4 M (720 × 1280) | 1) PH (Nikon Coolpix P520), 2) 1080p M V (BC of LG-G4 M) | 1) PR (A4), 2) VR on matte screen (Philips 227ELH screen) | 5; 2 | 40 (12/16/12) | 10 + 16 | Train 120+192, Dev 160+256, Test 110+192

OULU-NPU (2017) | FC of 6 M (1920 × 1080) | 1) PH, 2) high-res M V; BC of Samsung Galaxy S6 Edge M | 1-2) PR (glossy A3) - 2 printers, 3) high-res VR on 19-inch Dell display, 4) high-res VR on 13-inch MacBook | 3; 3 | 55 (20/15/20) | 36 + 72 | 4 protocols, Total 1980+3960

SiW (2018) | 1) High Q C Canon EOS T6 (1920 × 1080), 2) High Q Logitech C920 W (1920 × 1080) | 1) PH (5184 × 3456), 2) same live videos | 1) PR, 2) PR of frontal-view frame from a live V, 3-6) VR on 4 screens: Samsung S8, iPhone 7, iPad Pro, PC | 1; 1 | 165 (90/-/75) | 8 + 20 | 3 protocols, Total 1320+3300

CASIA-SURF (2018) | 1) RealSense SR300: RGB (1280 × 720), Depth/IR (640 × 480) | 1) PH | 1) PR (A4) | 1; 1 | 1000 (300/100/600) | 1 + 6 | Train 900+5400, Dev 300+1800, Test 1800+10800

CASIA-SURF CeFA (2019) | 1) Intel RealSense: RGB/Depth/IR, all (1280 × 720) | 1) PH, 2) same live videos, 3) 3D print face mask, 4) 3D silica gel face mask | 1) PR, 2) VR | 2D: 1; 2 - mask: 0; 6 - silica: 0; 4 | 2D: 1500 (600/300/600); mask: 99 (0/0/99); silica: 8 (0/0/8) | 2D: 1 + 3; mask: 0 + 18; silica: 0 + 8 | 4 protocols, Total 4500+13500 (2D), 0+5346 (mask), 0+196 (silica)

C: Camera; BC: Back-Camera; FC: Front-Camera; W: built-in webcam; T: Tablet; M: Mobile/Smartphone; Q: Quality; PH: High-res photo; V: Video; PR: Hard-copy print of high-res photo; VR: Video replay.

4.2.4 Others: Some other datasets for face PAD are also available but were not used in this paper.

Fig. 1: Sample images from the Replay-Attack dataset (adverse lighting, hand support). Left to right, top: bona-fide, print, high-def photo, mobile photo; bottom: high-def photo, mobile video, high-def video. Client 007.

(a) Laptop

(b) Android

Fig. 2: Sample images from the MSU-MFSD dataset. Left to right: bona-fide, printed photo, iPhone video, iPad video. Client 001.

CASIA-FASD [50] was released in 2012, having 50 subjects. Three different imaging qualities were used (low, normal and high), and three types of attacks were generated from the high-quality recordings of the genuine faces, namely cut-photo attack, warped-photo attack


and video attack on an iPad. The dataset is split into a training set of 20 subjects and a testing set including the other 30 subjects.

OULU-NPU [109]: A mobile face PAD dataset released in 2017, with 55 subjects captured in three different lighting conditions and backgrounds with six different mobile devices. Both print and video-replay attacks are included, using two different PAIs each. Four protocols were proposed in the paper to evaluate the generalization of PAD algorithms against different conditions.

Spoof in the Wild (SiW) [89]: Released in 2018, it contains 8 real and 20 attack videos for each of 165 subjects, more subjects than any previous dataset. Bona-fide videos are captured with two high-quality cameras and collected with different orientations, facial expressions and lighting conditions. Attack samples are either high-quality print attacks or replay videos on 4 different PA devices.

CASIA-SURF [113]: Later in the same year, 2018, a large-scale multi-modal dataset, CASIA-SURF, was released, containing 21000 videos for 1000 subjects. Each video has three modalities, namely RGB, Depth and IR, captured by an Intel RealSense camera. Only 2D print attack was performed, with 6 different combinations of the used photo state (either full or cut) and 3 different operations, like bending the printed paper or movement at different distances from the camera.

CASIA-SURF CeFA [115]: Although the CASIA-SURF dataset [113] is large-scale, it only includes one ethnicity (Chinese) and one attack type (2D print). So in 2019, CASIA-SURF cross-ethnicity Face Anti-spoofing (CeFA) was released to provide a wider range of ethnicities and attack types, in addition to multiple modalities as in CASIA-SURF. The dataset includes videos for 1607 subjects covering three ethnicities (Africa, East Asia and Central Asia), three modalities and 4 attack types. The attack types are 2D print attacks and replay attacks, in addition to 3D mask and silica gel attacks.

4.3 Iris Spoofing Databases

Many publicly available benchmark datasets have been published for the iris presentation attack problem. Some are captured with commercial

(a) Captured by Tablet

(b) Captured by Mobile

Fig. 3: Sample images from the Replay-Mobile dataset (hand support). Left to right: bona-fide, print photo, matte-screen photo, matte-screen video. Client 001.

(a) bona-fide

(b) Print attack

Fig. 4: Sample images from the iris datasets. Left to right: ATVS-FIr, Warsaw 2013, MobBioFake. ATVS-FIr id u0001s0001_ir_r_0002; Warsaw id 0039_R_1; MobBioFake id 007_R2.

iris recognition sensors operating with near-infrared illumination, like BioSec ATVS-FIr [12] or the LivDet datasets [52, 110, 117], while more recent datasets were captured under visible-light illumination using smartphones, e.g. MobBioFake [54, 105]. Below are details of the databases used in this paper.

4.3.1 BioSec ATVS-FIr: The ATVS-FIr dataset was introduced in 2012 [12], derived from the BioSec dataset [119]. It contains 800 bona-fide and 800 attack photos of 50 different subjects. The attack is performed by enhancing the quality of the bona-fide iris photo, printing it on high-quality paper using HP printers, then presenting it to the same LG IrisAccess EOU3000 sensor used to capture the bona-fide irises. All images are grayscale with 640 × 480 resolution. For each user, 16 different bona-fide photos were captured, from which 16 attack photos were generated. Four photos were taken from each of the two eyes in two sessions, so for each user the total = 4 images × 2 eyes × 2 sessions = 16 bona-fide images, and the same for fake attacks. Each eye of the 50 subjects was considered a subject on its own, then the total of 100 subjects was divided into a training set of 25 subjects (200 bona-fide + 200 attack) and a test set of 75 subjects (600 bona-fide + 600 attack).

4.3.2 LivDet Warsaw 2013: Warsaw University of Technology published the first LivDet Warsaw iris dataset in 2013 [52], as part of the first international iris liveness competition in 2013 [116]. Two more versions of this dataset were later published, in 2015 [117] and 2017 [110]. LivDet-Iris-2013-Warsaw contains 852 bona-fide and 815 attack printout images from 237 subjects. For the attack images, the real iris was printed using two different printers, then captured by the same IrisGuard AD100 sensor as the bona-fide photos.

4.3.3 MobBioFake: The first competition for iris PAD in mobile scenarios was the MobILive competition in 2014 [118]. The MobBioFake iris spoofing dataset [54] was generated from the iris images of the MobBIO Multimodal Database [105]. It comprises 800 bona-fide and 800 attack images, captured by the back camera (8 MP resolution) of an Android device (Asus Transformer Pad TF 300T) in visible light, not in near-infrared like the previous iris datasets. The images are color RGB and have a resolution of 250 × 200. The MobBioFake database has 100 subjects, each having 8 bona-fide + 8 attack images captured under different illumination and occlusion settings. The attack samples were generated from printed images of the original ones, captured under similar conditions with the same device. The images are split evenly between training and test subsets.

5 Proposed PAD Approach

Although several previous studies use CNNs for presentation attack detection, they either used custom-designed networks, e.g.


Fig. 5: Overview of the PAD algorithm.

Spoofnet [95, 98], or used a pre-trained CNN to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation attack images. We choose to assess state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet [108]. This PAD approach is a passive, software, texture-based method that requires neither additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches, starting from network weights trained on the ImageNet classification dataset. A decision can be made to either retrain all the network weights given the new datasets of the problem, or to freeze the weights of most of the network layers and fine-tune only a set of selected last layers of the network, including the final classification layer.

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then input to the CNN, passing through a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is altered to output a binary bona-fide/attack classification using sigmoid activation, instead of the 1000 classes of ImageNet.

5.1 Preprocessing

5.1.1 Face videos: Each of the face benchmark datasets used in this paper provides face coordinates for each frame of each video. These coordinates are used to crop a square region to be used as input to the network. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to equal twice the minimum of the width and height of the face ground-truth box. The cropped bounding box containing the face is then resized to each network's default input size to be used as input to the first convolutional layer of the CNN.
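Assuming per-frame ground-truth boxes given as (x, y, w, h), the cropping rule above can be sketched as follows. The centring on the face box and the clipping to the frame borders are our assumptions; the paper only specifies the side-length rule:

```python
import numpy as np

def crop_face_roi(frame, x, y, w, h):
    """Crop a square ROI around a face box (x, y, w, h).

    The square side is twice the minimum of the box width/height,
    centred on the face box and clipped to the frame borders.
    """
    side = 2 * min(w, h)
    cx, cy = x + w // 2, y + h // 2
    x0 = max(0, cx - side // 2)
    y0 = max(0, cy - side // 2)
    x1 = min(frame.shape[1], x0 + side)
    y1 = min(frame.shape[0], y0 + side)
    return frame[y0:y1, x0:x1]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
roi = crop_face_roi(frame, x=250, y=150, w=100, h=120)
print(roi.shape)  # (200, 200, 3): side = 2 * min(100, 120)
```

The resulting ROI would then be resized to the network's default input size (224 × 224 or 299 × 299).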

5.1.2 Iris images: When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. However, in this paper we directly use the full iris image captured by the NIR sensor or smartphone camera, in order to keep the approach as simple as possible. This lets the network learn, from the full periocular area, which regions best serve the classification problem. The presence of some contextual information, like paper borders, can also be useful for attack detection.

Table 4: Performance of selected deep CNN architectures on ImageNet, in terms of the single-model error reported in each model's paper.

Model | Top-1 error % (single-crop / multi-crop) | Top-5 error % (single-crop / multi-crop) | Parameters | Layers
InceptionV3 [120] | 21.2 / 19.47 (12-crop) | 5.6 / 4.48 (12-crop) | 23.9M | 159
MobileNetV2 (alpha=1.4) [122] | 25.3 / - | 7.58 / - | 6.2M | 88
MobileNetV2 (default alpha=1.0) [122] | 28.0 / - | 9.8 / - | 3.5M | 88
VGG16 [123] | - / 24.4 | 9.9 / 7.1 | 138.4M | 23

Values not available in the papers are as reported by Keras (https://keras.io).

5.2 Generalization: Data augmentation

It is known that deep networks have a lot of parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. So during training, for each mini-batch pulled from the training set, images in the batch are randomly cropped, rotated slightly by a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially larger training dataset and a more generalized network that is robust to changes in pose and illumination.

5.3 Modern deep CNN architectures

Based on the official results published for recent CNN networks on the ImageNet dataset [108], three of the networks that achieved less than 10% top-5 error are InceptionV3 [120], ResNet50 [121] and MobileNetV2 [122]. Table 4 states the single-model top-1 and top-5 error of each of these networks on the ImageNet validation set, along with its depth and number of parameters, ordered by ascending top-5 error. One thing that can be noticed about the higher-accuracy networks is the increased number of layers: these networks are much deeper than, for example, the well-known VGG [123].

For the purpose of this paper, we chose networks that have fewer than 30 million parameters to fine-tune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 [120] (~24M parameters, 5.6% top-5 error), and a more recent, less deep network with far fewer parameters, MobileNetV2 [122] (~4M parameters, 9.9% top-5 error), which makes it very suitable for mobile applications. Even though all three networks have significantly fewer parameters than VGG, they are much deeper and achieve higher accuracy.

In each architecture, the final softmax prediction layer is removed, and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate of 0.5), then a sigmoid activation layer for the final binary prediction.

5.4 Fine-tuning approaches

As stated previously, the available datasets are small and the used networks have millions of parameters to learn. This will probably lead the model parameters to overfit these small sets if the network weights are trained from scratch, i.e. starting from random weights; training will get stuck in a local minimum, leading to poor generalization ability. A better approach is to initialize the weights of the network with weights pre-trained on ImageNet. Then we have three training options: (1) use transfer learning: feed the images to the network to extract features from the last layer


Table 5: Number of all parameters in each model, and the number of parameters to learn when fine-tuning only the final block.

Model | Total # of parameters | Last block to train | First trainable layer | # parameters to learn
InceptionV3 | 21,804,833 | Block after mixed9 (train mixed9_1 until mixed10) | conv2d_90 | 6,075,585
MobileNetV2 | 2,259,265 | Last _inverted_res_block | block_16_expand | 887,361

before the prediction, then use these features, instead of the raw image pixels, for classification with an SVM or NN; (2) fine-tune the weights of only the last several layers of the network, while fixing the weights of the previous layers to their ImageNet-trained values; or (3) fine-tune all the network weights using the new input dataset.

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the approach of fine-tuning the last convolutional block, Table 5 states, for each architecture, the name of the first layer of that block and the number of parameters in it to be fine-tuned.
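A minimal Keras sketch of the two fine-tuning set-ups on MobileNetV2: approach (A) freezes everything before the last inverted residual block (first trainable layer block_16_expand, as in Table 5), while approach (B) leaves all layers trainable. We pass weights=None to keep the example offline; in the actual experiments the ImageNet weights would be loaded instead:

```python
import tensorflow as tf

def build_pad_model(finetune_last_block_only):
    # Backbone with global average pooling in place of the 1000-class top
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False,
        weights=None, pooling="avg")
    if finetune_last_block_only:
        trainable = False
        for layer in base.layers:
            if layer.name == "block_16_expand":  # start of the last block
                trainable = True
            layer.trainable = trainable
    # Dropout 0.5 + single sigmoid unit for the bona-fide/attack decision
    x = tf.keras.layers.Dropout(0.5)(base.output)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

model_a = build_pad_model(finetune_last_block_only=True)   # approach (A)
model_b = build_pad_model(finetune_last_block_only=False)  # approach (B)
```

The same pattern applies to InceptionV3, with the freeze boundary at conv2d_90 and a 299 × 299 input.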

5.5 Single model with multiple biometrics

In addition to training/testing with datasets that belong to the same biometric source, i.e. face only or iris only, we experiment with training a single generic presentation attack detection network that detects a presentation attack whether the presented biometric is a face or an iris. We train with different combinations of face + iris sets, then cross-evaluate the trained models on the other face and iris datasets to see whether this approach is comparable to training on face images or iris images alone.

6 Experiments and Results

In this section, we describe the image and video preprocessing methods used, and explain the training strategy and experimental setup.

6.1 Preprocessing

6.1.1 Preparing images: Among the iris databases, only MobBioFake consists of color images captured using a smartphone. The other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the colored images of ImageNet, their input layers only accept color images with three channels; so for grayscale iris images, the single gray channel is replicated to form the 3-channel image passed as the network input. The whole iris image is used, without cropping nor iris detection and segmentation.
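The channel replication is a standard trick for feeding single-channel images to an RGB-pretrained network; the exact implementation below is our own sketch:

```python
import numpy as np

def gray_to_3channel(img):
    """Replicate a single-channel image into an identical 3-channel one,
    so grayscale NIR iris images fit an input layer pretrained on RGB."""
    if img.ndim == 2:
        img = img[..., np.newaxis]
    return np.repeat(img, 3, axis=-1)

nir = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
rgb_like = gray_to_3channel(nir)
print(rgb_like.shape)                         # (480, 640, 3)
print(np.array_equal(rgb_like[..., 0], nir))  # True: channels identical
```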

On the other hand, all face PAD datasets consist of color videos. The duration of the videos ranges from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects, so we use frame skipping to sample one frame every 20 frames. This frame dropping leads to sampling around 10 to 19 frames per video, depending on its length. For each sampled frame, instead of using the whole frame as the network input, it is first cropped to either include only the face region, or a box with approximately twice the face area in order to include some background context. Annotated face bounding boxes have a side length ranging from 70px to 100px in Replay-Attack videos, and from 170px to 250px in MSU-MFSD.
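Our reading of the frame-skipping rule (keep frame 0, then every 20th frame) can be sketched as:

```python
def sampled_frame_indices(n_frames, skip=20):
    """Indices kept when sampling one frame every `skip` frames."""
    return list(range(0, n_frames, skip))

# e.g. a 10-second Replay-Attack video at 25 fps has 250 frames:
idx = sampled_frame_indices(250)
print(len(idx))   # 13 frames sampled
print(idx[:3])    # [0, 20, 40]
```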

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments (after frame dropping in videos), along with the total number of images available for training the CNN from each database.

The images are then resized according to the default input size of each of the used networks, that is, 224 × 224 for MobileNetV2 and 299 × 299 for InceptionV3. The RGB values of input images to the InceptionV3 and MobileNetV2 networks are first scaled to values between -1 and 1.
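The [-1, 1] scaling corresponds to the mapping below (the same mapping Keras applies for these two architectures):

```python
import numpy as np

def scale_to_unit_range(img_uint8):
    """Map 8-bit RGB values in [0, 255] to [-1, 1], the input range
    expected by InceptionV3 and MobileNetV2."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

img = np.array([[[0, 128, 255]]], dtype=np.uint8)
scaled = scale_to_unit_range(img)  # maps 0 -> -1, 128 -> ~0.004, 255 -> 1
```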

Table 6: Number of images used in training and testing. For face datasets, the number of videos is shown, followed by the number of frames in parentheses (frame skipping: one frame every 20 frames). For iris datasets, only the number of images is given.

Database | Subset | Attack + bona-fide
Replay-Attack | train | 300 (3414) + 60 (1139) = 4553
Replay-Attack | devel | 300 (3411) + 60 (1140)
Replay-Attack | test | 400 (4516) + 80 (1507)
MSU-MFSD | train | 90 (1257) + 30 (419) = 1676
MSU-MFSD | test | 120 (1655) + 40 (551)
Replay-Mobile | train | 192 (2851) + 120 (1762) = 4613
Replay-Mobile | devel | 256 (3804) + 160 (2356)
Replay-Mobile | test | 192 (2842) + 110 (1615)
ATVS-FIr | train | 200 + 200 = 400
ATVS-FIr | test | 600 + 600
Warsaw 2013 | train | 203 + 228 = 431
Warsaw 2013 | test | 612 + 624
MobBioFake | train | 400 + 400 = 800
MobBioFake | test | 400 + 400

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize these small sets of samples instead of learning useful features, and thus fail to generalize. So, in order to reduce overfitting, data augmentation is used during training: images are randomly transformed before feeding to the network with one or more of the following transformations: horizontal flip, rotation between 0 and 10 degrees, horizontal translation by 0 to 20 percent of the image width, vertical translation by 0 to 20 percent of the image height, or zooming by 0 to 20 percent.
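A numpy sketch of this augmentation policy (flip / shift / zoom). Rotation is omitted here because it needs interpolation, the wrap-around shift via np.roll and the nearest-neighbour zoom are our simplifications, and the parameter ranges follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Randomly flip, shift and zoom an image (H, W, C)."""
    h, w = img.shape[:2]
    if rng.random() < 0.5:                    # horizontal flip
        img = img[:, ::-1]
    dx = int(rng.uniform(-0.2, 0.2) * w)      # horizontal shift (+/-20%)
    dy = int(rng.uniform(-0.2, 0.2) * h)      # vertical shift (+/-20%)
    img = np.roll(np.roll(img, dy, axis=0), dx, axis=1)  # wraps around
    z = rng.uniform(0.0, 0.2)                 # zoom in by up to 20%
    ch, cw = int(h * z / 2), int(w * z / 2)
    crop = img[ch:h - ch, cw:w - cw]
    # nearest-neighbour resize back to (h, w)
    rows = np.arange(h) * crop.shape[0] // h
    cols = np.arange(w) * crop.shape[1] // w
    return crop[rows][:, cols]

out = augment(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
print(out.shape)  # (224, 224, 3)
```

In practice, a Keras ImageDataGenerator with these ranges would serve the same purpose inside the training loop.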

6.2 Training strategy and hyperparameters

For training the CNN networks, we used the Adam optimizer [124] with beta1 = 0.9, beta2 = 0.999 and learning rate 1 × 10^-4. We trained each network for a maximum of 50 epochs, but added a stopping criterion: training stops if the training or validation accuracy reaches 99.95%, or if the validation accuracy does not improve (with delta = 0.0005) for 40 epochs.
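The stopping criterion can be sketched in plain Python; `epoch_metrics` stands in for the real training loop and yields (train accuracy, validation accuracy) pairs:

```python
def train_with_early_stopping(epoch_metrics, max_epochs=50,
                              target_acc=0.9995, min_delta=0.0005,
                              patience=40):
    """Stop when train or validation accuracy reaches 99.95%, or when
    validation accuracy has not improved by at least `min_delta`
    for `patience` epochs.  Returns the stopping epoch."""
    best_val, wait = 0.0, 0
    epoch = 0
    for epoch, (train_acc, val_acc) in enumerate(epoch_metrics, start=1):
        if train_acc >= target_acc or val_acc >= target_acc:
            return epoch                       # accuracy target reached
        if val_acc > best_val + min_delta:
            best_val, wait = val_acc, 0        # meaningful improvement
        else:
            wait += 1
            if wait >= patience:
                return epoch                   # patience exhausted
        if epoch >= max_epochs:
            return epoch
    return epoch

# e.g. training accuracy crosses 99.95% at epoch 3:
history = [(0.90, 0.88), (0.98, 0.95), (0.9996, 0.97)]
print(train_with_early_stopping(history))  # 3
```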

Intermediate models with high validation accuracy were saved, and finally the model with the lowest validation error was chosen for inference. Because of the randomness introduced by the data augmentation and the dropout, several training sessions may produce slightly different results, especially in cross-validation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treat each frame in a face video as a separate image. Then, when reporting the final score for each video, we perform score-level fusion by averaging the scores of the samples from the same video. Accuracy and errors can then be calculated based on these video-level scores, not frame-level scores as in some results reported in the literature.
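The score-level fusion described above amounts to a per-video average of the frame scores; the video ids and scores below are made-up examples:

```python
from collections import defaultdict

def fuse_video_scores(frame_scores):
    """Average per-frame CNN scores belonging to the same video.

    frame_scores: iterable of (video_id, score) pairs.
    Returns {video_id: mean score}.
    """
    per_video = defaultdict(list)
    for video_id, score in frame_scores:
        per_video[video_id].append(score)
    return {vid: sum(s) / len(s) for vid, s in per_video.items()}

scores = [("v1", 0.75), ("v1", 0.25), ("v2", 0.5), ("v2", 1.0)]
print(fuse_video_scores(scores))  # {'v1': 0.5, 'v2': 0.75}
```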

9 ccopy This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

Table 7: Intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise Test EER (%).

Train Database | Test Database | (A) InceptionV3 | (A) MobileNetV2 | (B) InceptionV3 | (B) MobileNetV2
Replay-Attack | Replay-Attack | 6.9 | 1.6 | 0 | 0.63
Replay-Attack | MSU-MFSD | 33 | 30 | 17.5 | 22.5
Replay-Attack | Replay-Mobile | 35.8 | 43.4 | 64.7 | 56.7
MSU-MFSD | Replay-Attack | 34.7 | 22.9 | 23.8 | 13.7
MSU-MFSD | MSU-MFSD | 7.5 | 20.5 | 0 | 2.08
MSU-MFSD | Replay-Mobile | 32.6 | 26.8 | 24.6 | 23.7
Replay-Mobile | Replay-Attack | 33.2 | 45.1 | 31.1 | 45.5
Replay-Mobile | MSU-MFSD | 35.4 | 30 | 32.9 | 37.5
Replay-Mobile | Replay-Mobile | 3.6 | 3.2 | 0 | 0
ATVS-FIr | ATVS-FIr | 0 | 0 | 0 | 0
ATVS-FIr | Warsaw 2013 | 1.9 | 5.4 | 0.16 | 1.6
ATVS-FIr | MobBioFake (Grayscale) | 4.4 | 3.9 | 3.65 | 3.9
Warsaw 2013 | ATVS-FIr | 0 | 1 | 0.17 | 0.5
Warsaw 2013 | Warsaw 2013 | 0 | 0.32 | 0 | 0
Warsaw 2013 | MobBioFake (Grayscale) | 4.3 | 3.9 | 3.28 | 4.7
MobBioFake | ATVS-FIr | 10 | 30 | 18.7 | 20.2
MobBioFake | Warsaw 2013 | 15.7 | 32 | 23.4 | 18.8
MobBioFake | MobBioFake | 8.75 | 7 | 0.5 | 0.75
MobBioFake (Grayscale) | ATVS-FIr | 2.5 | 23.7 | 1.2 | 16.2
MobBioFake (Grayscale) | Warsaw 2013 | 6.2 | 17.3 | 10.1 | 17.2
MobBioFake (Grayscale) | MobBioFake (Grayscale) | 10.5 | 7.3 | 0.25 | 1.25

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the error on the test set. EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attack ones, which equals FR / (total bona-fide attempts), while FAR is the probability of the algorithm accepting attack samples and considering them bona-fide, FA / (total attack attempts).

For datasets that have a separate development subset, e.g. Replay-Attack, first a score threshold is calculated that achieves the EER on the development set. Then this threshold is used to calculate the error on the test set, known as the half total error rate (HTER), which equals the sum of the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) at that threshold, divided by two: HTER = (FRR + FAR) / 2.
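A minimal numpy sketch of the two metrics; the score convention (higher score means more likely bona-fide) and the choice of the threshold minimizing |FAR - FRR| are our assumptions:

```python
import numpy as np

def far_frr(bona_fide, attack, thr):
    """FAR/FRR at a threshold; scores are 'probability of bona-fide'."""
    frr = np.mean(np.asarray(bona_fide) < thr)   # bona-fide rejected
    far = np.mean(np.asarray(attack) >= thr)     # attack accepted
    return far, frr

def eer_threshold(bona_fide, attack):
    """Threshold where |FAR - FRR| is smallest, and the EER there."""
    thresholds = np.unique(np.concatenate([bona_fide, attack]))
    rates = [far_frr(bona_fide, attack, t) for t in thresholds]
    i = int(np.argmin([abs(far - frr) for far, frr in rates]))
    far, frr = rates[i]
    return thresholds[i], (far + frr) / 2        # (threshold, EER)

def hter(bona_fide, attack, thr):
    """HTER on a test set at a threshold fixed on the development set."""
    far, frr = far_frr(bona_fide, attack, thr)
    return (far + frr) / 2

# toy development scores: bona-fide high, attacks low
dev_real = np.array([0.9, 0.8, 0.7, 0.6])
dev_attack = np.array([0.4, 0.3, 0.2, 0.1])
thr, eer = eer_threshold(dev_real, dev_attack)
print(eer)                              # 0.0 (perfectly separable)
print(hter(dev_real, dev_attack, thr))  # 0.0
```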

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are to be evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER in order to compare with published state-of-the-art results. The EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that performed EER cross-dataset evaluation on iris PAD databases.

6.4 Experimental setup

For implementation, we used Keras† with the TensorFlow‡ backend for the deep CNN models, and for dataset management and experiment pipelines we used the Bob package [125, 126]. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

* https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
† https://github.com/keras-team/keras
‡ https://www.tensorflow.org

6.5 Experiment 1 Results

In the first experiment, we compare the two fine-tuning approaches explained in Section 5.4. For this experiment, training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state of the art. For each fine-tuning approach, we performed three runs and report the minimum error obtained. In Table 7, we refer to the first fine-tuning approach as (A), where only the weights of the last block are fine-tuned, while approach (B) is when all the layers' weights are fine-tuned using the given dataset.

It can be noticed from Table 7 that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning only the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error for all datasets using InceptionV3, while MobileNetV2 achieved 0% for most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset is used for training, instead of the original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 Results

For the second experiment, we only use fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here we use both face and iris images as training input to a single model that classifies bona-fide from attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate between bona-fide and presentation attack images achieved very good results on both face and iris test sets. For some cases in the face datasets, this combination boosted the cross-dataset performance: for example, on the Replay-Attack test set, 11.9% HTER was achieved when training a MobileNetV2 using MSU-MFSD and the Warsaw iris dataset, vs. 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8: Training with face + iris images, intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise Test EER (%). Each column pair gives InceptionV3 / MobileNetV2 results for the iris training set named in the header.

Face train set: Replay-Attack
Test Database | + ATVS-FIr | + Warsaw 2013 | + MobBioFake (RGB) | + MobBioFake (Grayscale)
Training epochs | 2 / 6 | 3 / 6 | 4 / 9 | 2 / 8
Replay-Attack | 0 / 0.63 | 0 / 0 | 0.25 / 1.5 | 0 / 1.4
MSU-MFSD | 29.6 / 20.4 | 15 / 20 | 19.6 / 25 | 30 / 17.5
Replay-Mobile | 58.1 / 48.7 | 57.3 / 35.4 | 60.6 / 58.1 | 39 / 40
ATVS-FIr | 0 / 0 | 1.05 / 2.7 | 51.7 / 88 | 3.5 / 12.3
Warsaw 2013 | 0.16 / 0.16 | 0 / 0 | 41.3 / 16.2 | 15.2 / 5.7
MobBioFake (RGB) | 58.5 / 71.3 | 56 / 45.5 | 0.75 / 0.25 | 1.75 / 2
MobBioFake (Grayscale) | 50 / 69 | 56.3 / 38.5 | 1.5 / 0.75 | 1.25 / 1

Face train set: MSU-MFSD
Training epochs | 4 / 6 | 3 / 5 | 10 / 9 | 7 / 9
Replay-Attack | 22.8 / 14.5 | 25.3 / 11.9 | 22.8 / 27.7 | 16.6 / 21.6
MSU-MFSD | 0 / 2.08 | 0.4 / 2.08 | 2.5 / 2.08 | 0.4 / 2.5
Replay-Mobile | 28.7 / 26.1 | 19.2 / 24.8 | 22.7 / 26.8 | 25.2 / 17
ATVS-FIr | 0 / 0 | 0.33 / 2.3 | 11.8 / 12.5 | 0.8 / 12.33
Warsaw 2013 | 0.6 / 0.8 | 0 / 0 | 43.9 / 65.5 | 7.4 / 9.1
MobBioFake (RGB) | 52.5 / 39.2 | 45.5 / 42 | 2 / 1.25 | 2.5 / 2.5
MobBioFake (Grayscale) | 45.3 / 38.5 | 3.45 / 4.3 | 2.2 / 4.25 | 1.75 / 2

Face train set: Replay-Mobile
Training epochs | 2 / 3 | 1 / 3 | 4 / 7 | 5 / 4
Replay-Attack | 24.2 / 42.6 | 31 / 35.7 | 41.6 / 64.3 | 33.7 / 54.6
MSU-MFSD | 32.5 / 30.4 | 32.5 / 42.5 | 60 / 52.5 | 40 / 44.6
Replay-Mobile | 0 / 0 | 0 / 0 | 0 / 0.26 | 0 / 0.46
ATVS-FIr | 0 / 0 | 0.83 / 6 | 15.5 / 22.5 | 6.5 / 28.2
Warsaw 2013 | 2.1 / 4 | 0 / 0 | 9.6 / 28.9 | 22.1 / 26.1
MobBioFake (RGB) | 46.5 / 53 | 49.3 / 54.3 | 1 / 0.75 | 8.8 / 3.25
MobBioFake (Grayscale) | 3.35 / 4.33 | 4.17 / 4.58 | 5.3 / 6.5 | 2 / 1.75

In the original table, bold-underlined marks the best cross-dataset error for each test dataset overall, and bold marks the best cross-dataset error for each test dataset within a row (i.e. given a certain face training set).

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER (%) is reported when testing on Replay-Attack; otherwise Test EER (%). Each entry gives the cross-dataset error, the corresponding intra-dataset error in parentheses, and the method or model (with its fine-tuning approach or co-training set). Underlined in the original is the best cross-dataset error for each test dataset.

Train Database | Test Database | SOTA (intra) | Ours, single biometric (intra) | Ours, face + iris (intra)
Replay-Attack | MSU-MFSD | 28.5 (8.25), GuidedScale-LBP [127], LGBP | 17.5 (0), InceptionV3 (B) | 15 (0), InceptionV3 (Warsaw 2013)
Replay-Attack | Replay-Mobile | 49.2 (3.13), GuidedScale-LBP [127], LBP + GS-LBP | 35.8 (6.9), InceptionV3 (A) | 35.4 (0), MobileNetV2 (Warsaw 2013)
MSU-MFSD | Replay-Attack | 21.40 (1.5), Color-texture [66] | 13.7 (2.08), MobileNetV2 (B) | 11.9 (2.08), MobileNetV2 (Warsaw 2013)
MSU-MFSD | Replay-Mobile | 32.1 (8.5), GuidedScale-LBP [127], LBP + GS-LBP | 23.7 (2.08), MobileNetV2 (B) | 17 (2.5), MobileNetV2 (MobBioFake (Grayscale))
Replay-Mobile | Replay-Attack | 46.19 (0.52), GuidedScale-LBP [127], LGBP | 31.1 (0), InceptionV3 (B) | 24.2 (0), InceptionV3 (ATVS-FIr)
Replay-Mobile | MSU-MFSD | 33.56 (0.98), GuidedScale-LBP [127], LBP + GS-LBP | 30 (3.2), MobileNetV2 (A) | 30.4 (0), MobileNetV2 (ATVS-FIr)
ATVS-FIr | Warsaw 2013 | None available | 0.16 (0), InceptionV3 (B) | 0.16 (0), any model (Replay-Attack)
ATVS-FIr | MobBioFake (Grayscale) | None available | 3.65 (0), InceptionV3 (B) | 3.35 (0), InceptionV3 (Replay-Mobile)
Warsaw 2013 | ATVS-FIr | None available | 0 (0), InceptionV3 (A) | 0.33 (0), InceptionV3 (MSU-MFSD)
Warsaw 2013 | MobBioFake (Grayscale) | None available | 3.28 (0), InceptionV3 (B) | 3.45 (0), InceptionV3 (MSU-MFSD)
MobBioFake (Grayscale) | ATVS-FIr | None available | 2.5 (10.5), InceptionV3 (A) | 0.8 (1.75), InceptionV3 (MSU-MFSD)
MobBioFake (Grayscale) | Warsaw 2013 | None available | 6.2 (10.5), InceptionV3 (A) | 5.7 (1), MobileNetV2 (Replay-Attack)

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. Not only this, but it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network to focus on color features when the test image is a face, since all training face images were in color while all iris training images were grayscale.


Fig. 6: High-frequency components of attack iris images from the Warsaw dataset. Left to right: input image, frequency components (Fourier), high-pass filter, inverse Fourier of the filtered image, high-frequency components overlaid on the image.

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High-frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left to right: input image, high-frequency components overlaid on the image, network activation response of the 'real' filter, network activation response of the 'attack' filter. Top: 0050_R_2; bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation errors with the state of the art (SOTA). The table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel that it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activation and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and texture that are present in real presentations in general but absent in attack samples, and vice-versa. An approach to demonstrate this is to visualize the response of certain feature maps from the network given sample input images. For this task we chose a MobilenetV2 network trained with the Iris-Warsaw and Face Replay-Attack datasets, from which we selected to visualize the response of some filters from the final convolutional block before classification.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left-to-right: input image; high frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; Bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left-to-right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left-to-right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

In addition to visualizing the network activation response, we were also interested in viewing the distribution of high frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high frequency components of an image, then we overlay these high frequency regions


on the input image, for a better understanding of where high frequency content is concentrated in bona-fide images vs. attack images.

The steps for obtaining the regions with high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to get the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, an inverse Fourier transform is used to obtain an image with only the high frequency regions visible, and finally these regions are overlaid on the original image to highlight them on the input.
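These steps can be sketched with NumPy's FFT routines (a minimal illustration rather than the authors' code; the function name, cutoff radius and blending weight are our own choices):

```python
import numpy as np

def high_frequency_overlay(img, cutoff=0.1, strength=0.7):
    """Sketch of the pipeline above: FFT -> remove low frequencies ->
    inverse FFT -> overlay the high-frequency regions on the input."""
    f = np.fft.fftshift(np.fft.fft2(img))             # frequency components
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    f[dist < cutoff * min(h, w)] = 0                  # high-pass filter
    high = np.abs(np.fft.ifft2(np.fft.ifftshift(f)))  # high-frequency image
    high /= high.max() + 1e-8                         # normalize to [0, 1]
    return (1 - strength) * img + strength * high     # overlay on input

# toy usage: a vertical step edge has strong high-frequency content at the edge
img = np.zeros((64, 64))
img[:, 32:] = 1.0
overlay = high_frequency_overlay(img)
```

On such a toy step-edge image, the retained energy concentrates around the edge, mirroring how the figures localize edges and noise patterns in the biometric samples.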

Regarding the chosen filters to visualize: the final layer before classification consists of 1280 filters, and when an input image is of size 224×224, the feature maps resulting from these final 1280 filters have size 7×7. From these 1280 filters we selected two filters for each biometric: one with the highest average activation response to real images, which we call realF, and another that was highly activated by attack images, called attackF. We then visualize the 7×7 activation response of these filters for some bona-fide and attack samples belonging to the same subject.
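The selection of realF and attackF can be sketched as follows (random arrays stand in for the feature maps that, in the paper, come from the final convolutional block over real and attack samples):

```python
import numpy as np

def select_filters(real_maps, attack_maps):
    """Pick the filter index with the highest mean activation over real
    samples (realF) and over attack samples (attackF).
    *_maps: arrays of shape (num_samples, 7, 7, num_filters)."""
    real_mean = real_maps.mean(axis=(0, 1, 2))     # per-filter average response
    attack_mean = attack_maps.mean(axis=(0, 1, 2))
    return int(np.argmax(real_mean)), int(np.argmax(attack_mean))

# stand-in feature maps: 10 samples, 7x7 maps, 1280 filters
rng = np.random.default_rng(0)
real_maps = rng.random((10, 7, 7, 1280))
attack_maps = rng.random((10, 7, 7, 1280))
realF, attackF = select_filters(real_maps, attack_maps)
```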

Figure 7 shows samples from the Warsaw iris dataset with their high frequency regions highlighted, in addition to the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples, high frequency areas are concentrated at the bright regions around the inner eye corner and the lash-line; for attack presentations, however, high frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features highly represented at the eye corner through realF, giving a high response at those regions in the real image while failing to detect them in the attack samples. AttackF likewise manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images of the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is achieved using a handheld mobile device showing a photo of the subject. We can notice the absence of high frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the attackF filter of the network.

By visualizing the high frequency regions in the images, it was shown that high frequency variations in real images focus on certain edges and changes of the real 3D biometric presented, while in attack presentations the high frequency components represent noise: either spread over the paper in the case of a paper attack medium, or noise patterns resulting from the display monitor of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network learned is to use gradient-weighted class activation maps (Grad-CAM) [10], which highlight the areas that contributed most to the network's classification decision for an input image.
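The core Grad-CAM computation [10] reduces to a few array operations, sketched here on stand-in tensors (in practice the activations and gradients are extracted from the trained network):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Gradient-weighted class activation map [10].
    feature_maps: (H, W, K) activations of the last conv block.
    gradients:    (H, W, K) gradient of the class score w.r.t. those maps.
    Returns an (H, W) heatmap normalized to [0, 1]."""
    weights = gradients.mean(axis=(0, 1))          # global-average-pooled grads
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))
    cam = np.maximum(cam, 0)                       # ReLU: keep positive evidence
    return cam / (cam.max() + 1e-8)

# stand-in tensors for a 7x7x1280 final block
rng = np.random.default_rng(1)
cam = grad_cam(rng.random((7, 7, 1280)), rng.random((7, 7, 1280)))
```

Upsampling the 7×7 heatmap to the input resolution then gives the overlays shown in Figures 9 and 10.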

We chose the MobilenetV2 network trained on the Replay-Attack face dataset and the gray MobBioFake iris dataset to analyze the activation maps for some bona-fide and attack test images from the 6 datasets. Figure 9 shows the Grad-CAM output for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation attack images.

The visualizations in these figures clearly show that the network focuses on features of the face and iris centers to make its decision for a bona-fide presentation, and in cases where these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could perfectly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights that were trained on the ImageNet dataset. Experimental results show that InceptionV3 achieved state-of-the-art 0% intra-dataset error on 6 publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, in addition to starting the network training with non-random weights already pretrained on the ImageNet dataset. This was done to avoid the weights overfitting the small training datasets; for future work, we would like to check the effect of training these deep networks from scratch using the available PAD datasets.
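The augmentation step can be sketched as follows (a simplified illustration with random flips and shifts only; the slight rotation used in the paper would additionally need an interpolation routine, e.g. scipy.ndimage.rotate):

```python
import numpy as np

def augment(img, rng, max_shift=5):
    """Random horizontal flip and slight translation of an (H, W, C) image,
    zero-padding the uncovered border."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                          # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    ys = slice(max(dy, 0), min(h, h + dy))          # source rows
    yd = slice(max(-dy, 0), min(h, h - dy))         # destination rows
    xs = slice(max(dx, 0), min(w, w + dx))          # source columns
    xd = slice(max(-dx, 0), min(w, w - dx))         # destination columns
    out[yd, xd] = img[ys, xs]
    return out

rng = np.random.default_rng(0)
patch = augment(np.ones((224, 224, 3)), rng)        # shifted copy, zero border
```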

The generalization ability of these networks has been tested by performing cross-dataset evaluation, and better-than-SOTA results were reported for the face datasets. As far as we know, this is the first paper to include cross-dataset Equal-Error-Rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This commonly trained network achieved lower cross-dataset error rates than networks trained with a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we provided an analysis of the features learned by the networks and showed their correlation with the frequency components present in input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important part causing the network to select a bona-fide classification decision. For future work, we would like to apply patch-based CNNs and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach by using a region-based CNN method.


8 References

1 Jourabloo, A., Liu, Y., Liu, X.: 'Face de-spoofing: anti-spoofing via noise modeling', CoRR, 2018

2 Nagpal, C., Dubey, S.R.: 'A performance evaluation of convolutional neural networks for face anti spoofing', CoRR, 2018

3 Nguyen, D., Baek, N.R., Pham, T., Park, K.R.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, pp. 1315

4 Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)

5 Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In: 'Deep learning in biometrics' (CRC Press, 2018), p. 49

6 Czajka, A., Bowyer, K.W.: 'Presentation attack detection for iris recognition: an assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35

7 Ramachandra, R., Busch, C.: 'Presentation attack detection methods for face recognition systems: a comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37

8 Abdul Ghaffar, I., Norzali Haji Mohd, M.N.: 'Presentation attack detection for face recognition on smartphones: a comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9

9 Li, L., Correia, P.L., Hadid, A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, pp. 3–14

10 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: 'Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391

11 Daugman, J.: 'Iris recognition and anti-spoofing countermeasures'. In: 7th International Biometrics Conference, 2004

12 Galbally, J., Ortiz-Lopez, J., Fierrez, J., Ortega-Garcia, J.: 'Iris liveness detection based on quality related features'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276

13 Pacut, A., Czajka, A.: 'Aliveness detection for iris biometrics', 40th Annual IEEE International Carnahan Conferences Security Technology, 2006, pp. 122–129

14 Czajka, A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735

15 Czajka, A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3

16 Bodade, R., Talbar, S.: 'Dynamic iris localisation: a novel approach suitable for fake iris detection'. In: 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5

17 Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R.: 'An experimental study of pupil constriction for liveness detection'. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258

18 Kanematsu, M., Takano, H., Nakamura, K.: 'Highly reliable liveness detection method for iris recognition'. In: SICE Annual Conference 2007, 2007, pp. 361–364

19 Lee, E.C., Park, K.R.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166

20 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371

21 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015, 10, (18), pp. 684–697

22 Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225

23 Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745

24 Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6

25 Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11

26 Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690

27 Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110

28 Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8

29 Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You, Z., Zhou, J., Wang, Y., Sun, Z., Shan, S., Zheng, W., et al. (Eds.): Biometric Recognition (Cham, Springer International Publishing, 2016), pp. 611–619

30 Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3D projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78

31 Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In: Delac, K., Grgic, M., Bartlett, M.S. (Eds.): Recent Advances in Face Recognition (Rijeka, IntechOpen, 2008)

32 Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558

33 Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

34 Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725

35 Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection', IEEE International Joint Conference on Biometrics, 2014, pp. 1–8

36 Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326

37 Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14

38 Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Daniilidis, K., Maragos, P., Paragios, N. (Eds.): Computer Vision – ECCV 2010 (Berlin, Heidelberg, Springer Berlin Heidelberg, 2010), pp. 504–517

39 Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244

40 Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. In: Image Analysis and Signal Processing, 2009, pp. 233–236

41 Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation', 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040

42 d. S. Pinto, A., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228

43 Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing', 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8

44 Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing', ICB, 2013, pp. 1–6

45 Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3D structure recovered from a single camera', ICB, 2013, pp. 1–6

46 Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In: 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6

47 Erdogmus, N., Marcel, S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with kinect'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6

48 Erdogmus, N., Marcel, S.: 'Spoofing 2D face recognition systems with 3D masks'. In: 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8

49 Li, J., Wang, Y., Tan, T., Jain, A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Jain, A.K., Ratha, N.K. (Eds.): Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303

50 Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: 'A face antispoofing database with diverse attacks'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31

51 Lee, T., Ju, G., Liu, H., Wu, Y.: 'Liveness detection using frequency entropy of image sequences'. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370

52 Czajka, A.: 'Database of iris printouts and its application: development of liveness detection method for iris recognition'. In: 2013 18th International Conference on Methods, Models in Automation, Robotics (MMAR), 2013, pp. 28–33

53 Wei, Z., Qiu, X., Sun, Z., Tan, T.: 'Counterfeit iris detection based on texture analysis'. In: 2008 19th International Conference on Pattern Recognition (ICPR), 2009, pp. 1–4

54 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in mobile applications'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33

55 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. In: Proceedings of International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008

56 Unnikrishnan, S., Eshack, A.: 'Face spoof detection using image distortion analysis and image quality assessment'. In: 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5

57 Wen, D., Han, H., Jain, A.K.: 'Face spoof detection with image distortion analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761

58 Galbally, J., Marcel, S.: 'Face anti-spoofing based on general image quality assessment'. In: 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178

59 Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing', IET Biometrics, 2018, 7, (4), pp. 314–324

60 Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment for biometric presentation attack detection', 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6

61 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face anti-spoofing based on color texture analysis'. In: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830

63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145

64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA, ACM, 2016), pp. 9–15

66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9

67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863

70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP–TOP based countermeasure against face spoofing attacks'. In: Park, J.I., Kim, J. (Eds.): Computer Vision – ACCV 2012 Workshops (Berlin, Heidelberg, Springer Berlin Heidelberg, 2013), pp. 121–132

71 de Freitas Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG - Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10

76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and fisher score'. In: 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5

77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283

78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715

82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683

83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'Mobio_livdet: mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors'. In: Sensors, 2018

88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision', CoRR, 2018

90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018

91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014

92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34

94 Czajka, A., Bowyer, K.W., Krumdick, M., Vidal-Mata, R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196

95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018

97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879

99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using haralick features', 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724

101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786

102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli, M., Nixon, M.S. (Eds.): Advances in Biometrics (Berlin, Heidelberg, Springer Berlin Heidelberg, 2009), pp. 1080–1090

103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The replay-mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87

107 Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: face and iris recognition for mobile engagement', Image Vision Comput., 2014, 32, pp. 1161–1172

108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'Imagenet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252

109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face, Gesture Recognition (FG 2017), 2017, pp. 612–618

110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet iris 2017 - iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics, IJCB 2017, Denver, CO, USA, October 1-4, 2017, 2017, pp. 733–741

111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116 Yambay, D., Doyle, J.S., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-iris 2013 - iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics, IJCB 2014, Clearwater, FL, USA, September 29 - October 2, 2014, 2014, pp. 1–8

117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-iris 2015 - iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis, ISBA 2017, New Delhi, India, February 22-24, 2017, 2017, pp. 1–6

118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 - mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6

© This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

119. Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (Eds.): Biometrics and Identity Management (Berlin, Heidelberg: Springer, 2008), pp. 181–190

120. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision'. CoRR, 2015, abs/1512.00567

121. He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition'. CoRR, 2015, abs/1512.03385

122. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation'. CoRR, 2018, abs/1801.04381

123. Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition'. CoRR, 2014, abs/1409.1556

124. Kingma, D.P., Ba, J.: 'Adam: A method for stochastic optimization'. CoRR, 2014, abs/1412.6980

125. Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126. Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127. Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture'. Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



3.3.4 Drawback of hand-crafted texture-based methods

Results of the above methods show that manually engineered features are suitable for solving the PAD problem for face and iris recognition systems. However, their drawback is that the design and selection of hand-crafted feature extractors is based mainly on the researchers' expert knowledge of the problem. Consequently, these features often reflect only limited aspects of the problem and tend to be sensitive to varying acquisition conditions, such as camera devices, lighting conditions and presentation attack instruments (PAIs). This causes their detection accuracy to vary significantly among different databases, indicating that the hand-crafted features have poor generalizability and so do not completely solve the PAD problem. The available cross-database tests in the literature suggest that the performance of hand-engineered texture-based techniques can degrade dramatically when operating in unknown conditions. This leads to the need for automatically extracting meaningful visual features directly from the data, using deep representations, to assist in the task of presentation attack detection.

3.4 Fusion

The presentation attack detection accuracy can be enhanced by using feature-level or score-level fusion methods to obtain a more general countermeasure that is effective against various types of presentation attack. For face PAD, Tronci et al. [84] propose a linear fusion, at frame and video level, combining static and video analysis. In [78], Schwartz et al. introduce feature-level fusion using Partial Least Squares (PLS) regression based on a set of low-level feature descriptors. Some other works [68, 71] obtain an effective fusion scheme by measuring the level of independence of two anti-counterfeiting systems, while Feng et al. [85] integrate both quality measures and motion cues, then use neural networks for classification.

In the case of iris anti-spoofing, Akhtar et al. [83] proposed to use decision-level fusion on several local descriptors in addition to global features. More recently, Nguyen et al. [3, 4] explored both feature-level and score-level fusion of hand-crafted features with CNN-extracted features, and obtained the lowest error using their proposed score-level fusion methodology.
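The score-level fusion idea can be sketched as a weighted sum of normalized subsystem scores. This is a minimal illustrative sketch, not the exact scheme of [3, 4]; `fuse_scores` and the min-max normalization bounds are assumptions for the example:

```python
import numpy as np

def minmax_norm(scores, lo, hi):
    # Map raw scores to [0, 1] using bounds estimated on the training set.
    return np.clip((scores - lo) / (hi - lo), 0.0, 1.0)

def fuse_scores(handcrafted, cnn, hc_bounds, cnn_bounds, w=0.5):
    # Weighted-sum score-level fusion of a hand-crafted and a CNN-based
    # PAD score; w balances the two subsystems.
    s1 = minmax_norm(handcrafted, *hc_bounds)
    s2 = minmax_norm(cnn, *cnn_bounds)
    return w * s1 + (1.0 - w) * s2
```

In practice the weight `w` and the normalization bounds would be tuned on a development set.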

3.5 Smartphones

In the last few years, research on smartphone and mobile-device PAD has been gaining popularity. Many different algorithms have been developed, and datasets were released to test their performance. These databases became publicly available, and in this section we list some of these datasets together with the algorithms proposed by their authors.

One of the first databases dedicated to face PAD on smartphones is MSU-MFSD [57]; its authors, Wen et al., used a PAD method based on image distortion analysis. Later, the same group released MSU-USSA [77] and used a combination of texture analysis (LBP and color) and image analysis techniques. In 2016, the IDIAP Research Institute released the Replay-Mobile [104] database, and again a combination of image quality analysis and texture analysis (Gabor jets) was used as the proposed PAD.

In the field of iris spoofing, the MobBioFake dataset [54, 105], presented in 2014, was the first to use the visible spectrum instead of near-infrared for iris spoofing, with a mobile device as the acquisition sensor. Sequeira et al. [54] used a texture-based iris PAD algorithm: they combined several image quality features at the feature level and used an SVM for classification. Gragnaniello et al. [106] proposed a solution based on local descriptors but with very low complexity and low required CPU power, suitable for mobile applications. They evaluated on the MobBioFake and MICHE [107] datasets and achieved good performance.

3.6 Learned features

In recent years, deep learning has evolved, and the use of deep neural networks or convolutional neural networks (CNNs) has proved to be effective in many computer vision tasks, especially with the availability of new advanced hardware and large data. CNNs have been successfully used for vision problems like image classification and object detection. This has encouraged many researchers to incorporate deep learning into the PAD problem: instead of using hand-crafted features, they rely on a deep network or CNN to learn features that discriminate between bona-fide and attack face [1, 2, 29, 85, 86, 91-93] or iris [3, 4, 94, 95] images, or both [5, 98]. Some proposed to use hybrid features that combine information from both hand-crafted and deeply-learnt features.

Several works use a pretrained CNN as a feature extractor, then use a conventional classifier such as an SVM for classification, as in [86]. Li et al. [86] fine-tune a VGG CNN model pretrained on ImageNet [108], use it to extract features, reduce dimensionality by principal component analysis (PCA), then classify with an SVM. Combining deep features with motion cues was proposed in several works on face video spoof detection. In [29], Patel et al. fine-tuned a pretrained VGG for texture feature extraction, in addition to using frame difference as a motion cue. Similarly, Nguyen et al. [87] used VGG for deep feature extraction, then fused it with multi-level LBP features and used an SVM for final classification. Feng et al. [85] combine image quality information together with motion information from optical flow as input to a neural network for classification of bona-fide or attack face. Xu et al. propose an LSTM-CNN architecture to utilize temporal information for binary classification in [79].
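The dimensionality-reduction step in the pipeline of [86] (deep features → PCA → classifier) can be sketched with numpy as below. This is an illustrative sketch, not the code of [86]; `pca_fit` and `pca_transform` are hypothetical helpers, and in [86] the projected features are classified with an SVM:

```python
import numpy as np

def pca_fit(features, k):
    # features: (n_samples, d) matrix of deep CNN features, one row per image.
    mean = features.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k]

def pca_transform(features, mean, axes):
    # Project the centred features onto the top-k principal axes.
    return (features - mean) @ axes.T
```

The reduced k-dimensional vectors then replace the raw deep features as the classifier input.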

Atoum et al. [88] propose a two-stream CNN-based face PAD method using texture and depth; they utilized both the full face and patches from the face, with different color spaces instead of RGB. They evaluated on three benchmark databases but did not perform cross-dataset evaluation. Later, Liu et al. [89] extended and refined this work by learning additional auxiliary information, the rPPG signal, from the face video, and evaluated their depth-supervised learning approach on more recent datasets, OULU-NPU [109] and SiW. Wang et al. [90] did not consider depth as an auxiliary supervision as in [89]; instead, they used multiple RGB frames to estimate face depth information and used two modules to extract short- and long-term motion. Four benchmark datasets were used to evaluate their approach.

The use of a CNN by itself to classify attack vs. bona-fide iris images has also been investigated in recent years. Andrey K. et al. [96] proposed to use a lightweight CNN on binarized statistical image features extracted from the iris image; they evaluated their proposed algorithm on LivDet-Warsaw 2017 [110] and three other benchmark datasets. In [97], Hoffman et al. proposed to use 25 patches of the iris image as input to the CNN instead of the full iris image. A patch-level fusion method is then used to fuse the scores of the input patches, based on the percentage of iris and pupil pixels inside each patch. They performed cross-dataset evaluation on the LivDet-Iris 2015 Warsaw, CASIA-Iris-Fake and BERC-Iris-Fake datasets, and used the true detection rate (TDR) at 0.2% false detection rate (FDR) as their evaluation metric.
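The TDR-at-fixed-FDR metric used in [97] can be computed from raw detector scores as in the following numpy sketch. The helper `tdr_at_fdr` is illustrative (not code from [97]), and it assumes the convention that higher scores indicate an attack:

```python
import numpy as np

def tdr_at_fdr(attack_scores, bonafide_scores, fdr=0.002):
    # Pick the threshold at which at most `fdr` of bona-fide samples are
    # falsely flagged as attacks, then report the fraction of attacks
    # detected at that same threshold (the TDR).
    threshold = np.quantile(bonafide_scores, 1.0 - fdr)
    return float(np.mean(attack_scores > threshold))
```

With `fdr=0.002` this reproduces the "TDR at 0.2% FDR" operating point.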

4 Competitions and Databases

In this section, we summarize the face and iris PAD competitions held, in addition to the benchmark datasets used to evaluate the performance of face or iris presentation attack detection solutions. All these databases are publicly available upon request from their authors. Table 2 shows a summary of information about all six datasets used.

4.1 Competitions

Since 2011, several competitions have been held to assess the status of presentation attack detection for either face or iris. Such competitions were very useful for collecting new benchmark datasets and creating a common framework for evaluating the different PAD methods.

For face presentation attack detection, three competitions have been carried out since 2011. The first competition was held during 2011 [111] on the IDIAP Print-Attack dataset [67] with six competing teams; the winning methodology was using hand-engineered


Table 2 Properties of used face and iris databases

Database             | Samples (Attack + Bona-fide) | Resolution (W×H) | FPS | Video duration (face videos)
Replay-Attack (Face) | 1000 + 200 (videos)          | 320×240          | 25  | 9-15 seconds
MSU-MFSD (Face)      | 210 + 70 (videos)            | 720×480          | 30  | 9-15 seconds
Replay-Mobile (Face) | 640 + 390 (videos)           | 720×1280         | 30  | 10-15 seconds
Warsaw 2013 (Iris)   | 815 + 825 (images)           | 640×480          | -   | -
ATVS-FIr (Iris)      | 800 + 800 (images)           | 640×480          | -   | -
MobBioFake (Iris)    | 800 + 800 (images)           | 250×200          | -   | -

texture-based approaches, which proved to be very effective in detecting print attacks. Two years later, the second face PAD competition was carried out in 2013 [112], with 8 participating teams, on the Replay-Attack dataset [73]. Most of the teams relied only on texture-based methods, while 3 teams used hybrid approaches combining both texture- and motion-based countermeasures. Two of these hybrid approaches reached 0% HTER on the test set of the Replay-Attack database.

Later, in 2017, a competition was held on generalized software-based face presentation attack detection in mobile scenarios [64]. Thirteen teams participated, and four protocols were used for evaluation. The winning team, which showed the best generalization to unseen cameras and attack types, used a combination of color, texture and motion-based features.

More recently, after the release of the large-scale multi-modal face anti-spoofing dataset CASIA-SURF [113], the Chalearn LAP Multi-Modal Face Anti-spoofing Attack Detection Challenge [114] was held in 2019. Thirteen teams qualified for the final round, with all submitted face PAD solutions relying on CNN-based feature extractors; the top 3 teams utilized ensembles of feature extractors and classifiers. For future work, the challenge paper suggested taking full advantage of the multi-modal dataset by defining a set of cross-modal testing protocols, as well as introducing 3D mask attacks in addition to the 2D attacks. These recommendations were fulfilled in the following year at the Chalearn Multi-modal Cross-ethnicity Face Anti-spoofing Recognition Challenge, using the CASIA-SURF CeFA [115] dataset.

For iris presentation attack detection, the first competition was LivDet-Iris 2013 [116]; two more sequels of this competition were held in 2015 [117] and 2017 [110]. Images in LivDet-Iris included iris printouts and textured contact lenses, the latter being more difficult to detect than printouts. For LivDet-Iris 2017, the test images were split into known and unknown parts: the known part contained images taken under similar conditions and with the same sensor as the training images, while the unknown partition consisted of images taken in a different environment with different properties. Results on the known partition were much better than those on the unknown part. In 2014, another competition was held specifically for mobile authentication, the Mobile Iris Liveness Detection Competition (MobILive 2014) [118]. A mobile device was used to capture images of iris printouts in visible light (the MobBioFake dataset), not in near-infrared light as in the datasets used in the LivDet-Iris competitions. Six teams participated, and the winning team achieved perfect detection of presentation attacks. For a more thorough assessment of these iris competitions, refer to Czajka and Bowyer [6].

4.2 Face Spoofing Databases

In Table 3, we add details on the presentation media and presentation attack instruments (PAI) of the used face video datasets.

4.2.1 Replay-Attack: One of the earliest datasets presented in the literature, in 2012, for the problem of face spoofing is the Replay-Attack dataset [73], a sequel to its Print-Attack version introduced in 2011 [67]. The database contains a total of 1200 short videos of between 9 and 15 seconds. The videos are taken at resolution 320×240 from 50 different subjects. Each subject has 4 bona-fide videos and 20 spoofed ones, so the full dataset has 200 bona-fide videos and 1000 spoofed videos. The dataset is divided into 3 subject-disjoint subsets: one for training, one for development and one for testing. The training as well as the development subset each has 15 subjects, while the test subset has videos from the remaining 20 subjects. Results on this dataset are reported as the HTER (Half Total Error Rate) on the test subset when the detector threshold is set at the EER (Equal Error Rate) of the development set. For both the real and spoof videos, each subject was recorded twice with a regular webcam in two different lighting settings: controlled and adverse.
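The evaluation protocol just described (threshold fixed at the development-set EER, HTER reported on the test set) can be computed from raw scores as in the following numpy sketch. The helpers are illustrative, and the sketch assumes higher scores mean bona-fide:

```python
import numpy as np

def far_frr(bonafide, attack, thr):
    # FAR: attacks accepted as bona-fide; FRR: bona-fide samples rejected.
    return np.mean(attack >= thr), np.mean(bonafide < thr)

def eer_threshold(dev_bonafide, dev_attack):
    # Pick the threshold where FAR and FRR are closest (the EER point).
    candidates = np.unique(np.concatenate([dev_bonafide, dev_attack]))
    gaps = [abs(np.subtract(*far_frr(dev_bonafide, dev_attack, t)))
            for t in candidates]
    return candidates[int(np.argmin(gaps))]

def hter(test_bonafide, test_attack, thr):
    # Half Total Error Rate: mean of FAR and FRR at a fixed threshold.
    far, frr = far_frr(test_bonafide, test_attack, thr)
    return (far + frr) / 2.0
```

The threshold is chosen once on the development set and then applied unchanged to the test set.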

As for the spoofed videos, first high-resolution videos were taken of the subject with a Canon PowerShot SX150 IS camera, then three techniques were used to generate 5 spoofing scenarios: (1) hard-copy print attack, where high-resolution digital photographs are presented to the camera; the photographs are printed with a Triumph-Adler DCC 2520 color laser printer on A4 paper; (2) mobile attack, using a photo once and a replay of the recorded video on an iPhone 3GS screen (with resolution 480×320 pixels) once; (3) high-definition attack, displaying a photo or replaying the video on a high-resolution screen using an iPad (first generation, with a screen resolution of 1024×768 pixels). Each of the 5 scenarios was recorded in the two different lighting conditions stated above, then presented to the sensor in 2 different ways: either with a fixed tripod, or by an attacker holding the presenting device (printed paper or replay device) with his/her hand, leading to a total of 20 attack videos per subject. More details of the dataset can be found at the IDIAP Research Institute website*.

4.2.2 MSU Mobile Face Spoofing Database (MSU-MFSD): The MSU-MFSD dataset [57] was introduced in 2014 by Michigan State University (MSU). Its main aim was to tackle the problem of face spoofing on smartphones, where some mobile phone applications use the face for authentication and unlocking the phone. The dataset includes real and spoofed videos from 35 subjects, with durations between 9 and 15 seconds. Unlike the Replay-Attack dataset, where only a webcam was used for authentication, in MSU-MFSD two devices were used: the built-in webcam of a MacBook Air 13 with resolution 640×480, and the front-facing camera of the Google Nexus 5 smartphone with 720×480 resolution. So for each subject there are 2 bona-fide videos, one from each device, and six attack videos, 3 from each device, leading to a total of 280 videos: 210 attack and 70 bona-fide. The attack scenarios are all presented to the authentication sensor with a fixed support, unlike Replay-Attack, which has videos with fixed support and others using hand-held devices. Three attack scenarios were used: (1) print attack on A3 paper, (2) video replay attack on the screen of an iPad Air, and (3) video replay attack on a Google Nexus 5 smartphone. The dataset is divided into two subject-disjoint subsets: one for training with 15 subjects, and the other for testing with 20 subjects. The EER on the test set is used as the evaluation metric on the MSU-MFSD database.

4.2.3 Replay-Mobile: In 2016, the IDIAP research institute that released Replay-Attack released a new dataset, Replay-Mobile [104]. This came after the widespread adoption of face authentication on mobile devices and the increased need for face PAD evaluation sets that capture the characteristics of mobile devices. The dataset has 1200 short recordings from 40 subjects, captured by two mobile devices at resolution 720×1280. Ten bona-fide accesses were recorded for each subject, in addition to 16 attack videos taken under different attack modes. Like Replay-Attack, the dataset was divided into training, development and testing subsets of 12, 16 and 12 subjects respectively, and results are reported as HTER on the test set.

Five lighting conditions were used in recording real accesses (controlled, adverse, direct, lateral and diffuse). For presentation attacks, high-resolution photos and videos of each client were taken under two different lighting conditions (light on, light off).

Attacks were performed in two ways: (1) photo-print attacks, where the printed high-resolution photo was presented to the capturing device using either fixed or hand support; (2) matte-screen attacks, displaying a digital photo or video with fixed support.

*http://www.idiap.ch/dataset/replayattack


Table 3 Face databases details

Replay-Attack (2012)
- Sensor used for authentication: (1) W in MacBook laptop (320×240)
- PA artifact: 1) PH; 2) 720p high-def V (Canon PowerShot SX150 IS)
- PAI: 1) PR (A4); 2) VR on M (iPhone); 3) VR on T (iPad)
- Lighting conditions (bona-fide, attack): 2, 2
- Subjects (train/dev/test): 50 (15/15/20)
- Videos per subject (bona-fide + attack): 4 + 20
- Videos per subset (bona-fide + attack): Train 60+300; Dev 60+300; Test 80+400

CASIA-FASD (2012)
- Sensor: 1) Low Q long-time-used USB C (680×480); 2) Normal Q new USB C (680×480); 3) High Q Sony NEX-5 (1920×1080)
- PA artifact: 1) PH; 2) 720p high-def V (Sony NEX-5)
- PAI: 1) PR (copper A4) - warped; 2) PR (copper A4) - cut; 3) VR on T (iPad)
- Lighting conditions: 1, 1
- Subjects (train/test): 50 (20/-/30)
- Videos per subject: 3 + 9
- Videos per subset: Train 60+180; Test 90+270

MSU-MFSD (2014)
- Sensor: 1) W in MacBook Air (640×480); 2) FC of Google Nexus 5 M (720×480)
- PA artifact: 1) PH (Canon PowerShot 550D SLR); 2) 1080p high-def V (Canon PowerShot 550D SLR); 3) 1080p M V (BC of iPhone S5)
- PAI: 1) PR (A3); 2) high-def VR on T (iPad); 3) M VR on M (iPhone)
- Lighting conditions: 1, 1
- Subjects (train/test): 35 (15/-/20)
- Videos per subject: 2 + 6
- Videos per subset: Train 30+90; Test 40+120

Replay-Mobile (2016)
- Sensor: 1) FC of iPad Mini 2 T (720×1280); 2) FC of LG-G4 M (720×1280)
- PA artifact: 1) PH (Nikon Coolpix P520); 2) 1080p M V (BC of LG-G4 M)
- PAI: 1) PR (A4); 2) VR on matte screen (Philips 227ELH)
- Lighting conditions: 5, 2
- Subjects (train/dev/test): 40 (12/16/12)
- Videos per subject: 10 + 16
- Videos per subset: Train 120+192; Dev 160+256; Test 110+192

OULU-NPU (2017)
- Sensor: FC of 6 M (1920×1080)
- PA artifact: 1) PH; 2) high-res M V (BC of Samsung Galaxy S6 Edge M)
- PAI: 1-2) PR (glossy A3) - 2 printers; 3) high-res VR on 19-inch Dell display; 4) high-res VR on 13-inch MacBook
- Lighting conditions: 3, 3
- Subjects (train/dev/test): 55 (20/15/20)
- Videos per subject: 36 + 72
- Videos per subset: 4 protocols; total 1980+3960

SiW (2018)
- Sensor: 1) High Q C Canon EOS T6 (1920×1080); 2) High Q Logitech C920 W (1920×1080)
- PA artifact: 1) PH (5184×3456); 2) same live videos
- PAI: 1) PR; 2) PR of frontal-view frame from a live V; 3-6) VR on 4 screens (Samsung S8, iPhone 7, iPad Pro, PC)
- Lighting conditions: 1, 1
- Subjects (train/test): 165 (90/-/75)
- Videos per subject: 8 + 20
- Videos per subset: 3 protocols; total 1320+3300

CASIA-SURF (2018)
- Sensor: 1) RealSense SR300: RGB (1280×720), Depth/IR (640×480)
- PA artifact: 1) PH
- PAI: 1) PR (A4)
- Lighting conditions: 1, 1
- Subjects (train/dev/test): 1000 (300/100/600)
- Videos per subject: 1 + 6
- Videos per subset: Train 900+5400; Dev 300+1800; Test 1800+10800

CASIA-SURF CeFA (2019)
- Sensor: 1) Intel RealSense: RGB/Depth/IR, all (1280×720)
- PA artifact: 1) PH; 2) same live videos; 3) 3D print face mask; 4) 3D silica gel face mask
- PAI: 1) PR; 2) VR
- Lighting conditions: 1, 2 (2D); 0, 60, 4
- Subjects (train/dev/test): 1500 (600/300/600); 99 (0/0/99); 8 (0/0/8)
- Videos per subject: 1 + 3; 0 + 180 + 8
- Videos per subset: 4 protocols; total 4500+135000+53460+196

C: camera; BC: back camera; FC: front camera; W: built-in webcam; T: tablet; M: mobile/smartphone; Q: quality; PH: high-res photo; V: video; PR: hard-copy print of high-res photo; VR: video replay.

4.2.4 Others: Some other datasets for face PAD are also available but were not used in this paper.

Fig. 1 Sample images from the Replay-Attack dataset (adverse lighting, hand support); client 007. Left to right, top: bona-fide, print, highdef photo, mobile photo; bottom: highdef photo, mobile video, highdef video.

(a) Laptop; (b) Android
Fig. 2 Sample images from the MSU-MFSD dataset; client 001. Left to right: bona-fide, printed photo, iPhone video, iPad video.

CASIA-FASD [50] was released in 2012, with 50 subjects. Three different imaging qualities were used (low, normal and high), and three types of attacks are generated from the high-quality recordings of the genuine faces, namely cut-photo, warped-photo and video (on iPad) attacks. The dataset is split into a training set of 20 subjects and a testing set containing the other 30 subjects.

OULU-NPU [109]: A mobile face PAD dataset released in 2017, with 55 subjects captured in three different lighting conditions and backgrounds with six different mobile devices. Both print and video-replay attacks are included, using two different PAIs each. Four protocols were proposed in the paper to evaluate the generalization of PAD algorithms against different conditions.

Spoof in the Wild (SiW) [89]: Released in 2018, it contains 8 real and 20 attack videos for each of 165 subjects, more subjects than any previous dataset. Bona-fide videos are captured with two high-quality cameras and collected with different orientations, facial expressions and lighting conditions. Attack samples are either high-quality print attacks or videos replayed on 4 different PA devices.

CASIA-SURF [113]: Later in the same year, 2018, a large-scale multi-modal dataset, CASIA-SURF, was released, containing 21000 videos of 1000 subjects. Each video has three modalities, namely RGB, Depth and IR, captured by an Intel RealSense camera. Only 2D print attacks were performed, with 6 different combinations of the state of the used photo (either full or cut) and operations like bending the printed paper or moving it at different distances from the camera.

CASIA-SURF CeFA [115]: Although the CASIA-SURF dataset [113] is large-scale, it includes only one ethnicity (Chinese) and one attack type (2D print). So, in 2019, CASIA-SURF cross-ethnicity Face Anti-spoofing (CeFA) was released to provide a wider range of ethnicities and attack types, in addition to multiple modalities as in CASIA-SURF. The dataset includes videos of 1607 subjects covering three ethnicities (African, East Asian and Central Asian), three modalities and 4 attack types. The attack types are 2D print attacks and replay attacks, in addition to 3D mask and silica-gel attacks.

4.3 Iris Spoofing Databases

Many publicly available benchmark datasets have been published for the iris presentation attack problem. Some are captured with commercial

(a) Captured by tablet; (b) captured by mobile
Fig. 3 Sample images from the Replay-Mobile dataset (hand support); client 001. Left to right: bona-fide, print photo, mattescreen photo, mattescreen video.

(a) Bona-fide; (b) print attack
Fig. 4 Sample images from the iris datasets. Left to right: ATVS-FIr (id u0001s0001_ir_r_0002), Warsaw 2013 (id 0039_R_1), MobBioFake (id 007_R2).

iris recognition sensors operating with near-infrared illumination, like BioSec ATVS-FIr [12] or the LivDet datasets [52, 110, 117], while more recent datasets were captured under visible-light illumination using smartphones, e.g. MobBioFake [54, 105]. Below are details of the databases used in this paper.

4.3.1 BioSec ATVS-FIr: The ATVS-FIr dataset was introduced in 2012 [12], derived from the BioSec dataset [119]. It contains 800 bona-fide and 800 attack photos of 50 different subjects. The attack is performed by enhancing the quality of the bona-fide iris photo, printing it on high-quality paper using HP printers, then presenting it to the same LG IrisAccess EOU3000 sensor used to capture the bona-fide irises. All images are grayscale with 640×480 resolution. For each user, 16 different bona-fide photos were captured, from which 16 attack photos were generated: four photos were taken of each of the two eyes in each of two sessions, so each user has a total of 4 images × 2 eyes × 2 sessions = 16 bona-fide images, and the same number of fake attacks. Each eye of the 50 subjects was considered a subject on its own; the resulting 100 subjects were then divided into a training set of 25 subjects (200 bona-fide + 200 attack) and a test set of 75 subjects (600 bona-fide + 600 attack).

4.3.2 LivDet Warsaw 2013: Warsaw University of Technology published the first LivDet Warsaw iris dataset in 2013 [52], as part of the first international iris liveness competition in 2013 [116]. Two more versions of this dataset were later published, in 2015 [117] and 2017 [110]. LivDet-Iris-2013-Warsaw contains 852 bona-fide and 815 attack (printout) images from 237 subjects. For the attack images, the real iris was printed using two different printers, then captured by the same IrisGuard AD100 sensor as the bona-fide photos.

4.3.3 MobBioFake: The first competition for iris PAD in mobile scenarios was the MobILive competition in 2014 [118]. The MobBioFake iris spoofing dataset [54] was generated from the iris images of the MobBIO Multimodal Database [105]. It comprises 800 bona-fide and 800 attack images captured by the back camera (8MP resolution) of an Android mobile smartphone (Asus Transformer Pad TF 300T) in visible light, not in near-infrared like previous iris datasets. The images are color RGB with a resolution of 250×200. The MobBioFake database has 100 subjects, each with 8 bona-fide + 8 attack images captured under different illumination and occlusion settings. The attack samples were generated from printed images of the original ones, under similar conditions and with the same device. The images are split evenly between training and test subsets.

5 Proposed PAD Approach

Although several previous studies use CNNs for presentation attack detection, they either used custom-designed networks, e.g.


Fig 5 Overview of the PAD algorithm

Spoofnet [95, 98], or used a pre-trained CNN to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation attack images. We choose to assess state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet [108]. This PAD approach is a passive, software, texture-based method that requires neither additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches, starting from network weights trained on the ImageNet classification dataset. A decision can be made to either retrain the whole network weights given the new datasets of the problem, or to freeze the weights of most of the network layers and fine-tune only a selected set of last layers of the network, including the final classification layer.

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then input to the CNN, passing through a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is altered to output a binary bona-fide/attack classification using sigmoid activation, instead of the 1000 classes of ImageNet.

5.1 Preprocessing

5.1.1 Face videos: Each of the face benchmark datasets used in this paper provides face coordinates for each frame of each video. These coordinates are used to crop a square to be used as input to the network. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to be equal to twice the minimum of the width and height of the face ground-truth box. The cropped bounding box containing the face is then resized to each network's default size, to be used as input to the first convolutional layer of the CNN.
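The cropping rule above can be sketched in a few lines of numpy; `crop_face_square` is an illustrative helper (the box format `(x, y, w, h)` is an assumption), and the crop is clipped to the frame boundaries:

```python
import numpy as np

def crop_face_square(frame, box):
    # box = (x, y, w, h): ground-truth face coordinates for the frame.
    # Crop a square of side 2 * min(w, h), centred on the face, so the
    # ROI keeps some background for contextual information.
    x, y, w, h = box
    side = 2 * min(w, h)
    cx, cy = x + w // 2, y + h // 2
    frame_h, frame_w = frame.shape[:2]
    x0 = max(0, cx - side // 2)
    y0 = max(0, cy - side // 2)
    x1 = min(frame_w, x0 + side)
    y1 = min(frame_h, y0 + side)
    return frame[y0:y1, x0:x1]
```

The returned crop is subsequently resized to the network's default input size (e.g. 224×224).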

5.1.2 Iris images: When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. However, in this paper we directly use the full iris image captured by the NIR sensor or smartphone camera, in order to keep the approach as simple as possible. This lets the network learn, from the full periocular area, the regions that most affect the classification problem. The presence of some contextual information, like paper borders, can also be useful for attack detection.

Table 4 Performance of selected deep CNN architectures on ImageNet, in terms of single-model error reported in each model's paper

Model                                 | Top-1 error % (single / multi-crop) | Top-5 error % (single / multi-crop) | Parameters | Layers
InceptionV3 [120]                     | 21.2 / 19.47 (12-crop)              | 5.6 / 4.48 (12-crop)                | 23.9M      | 159
MobileNetV2 (alpha=1.4) [122]         | 25.3 / -                            | 7.58* / -                           | 6.2M       | 88
MobileNetV2 (default alpha=1.0) [122] | 28.0 / -                            | 9.8* / -                            | 3.5M       | 88
VGG16 [123]                           | - / 24.4                            | 9.95* / 7.1                         | 138.4M     | 23

* Not available in the paper; value as reported by Keras (https://keras.io)

5.2 Generalization: Data augmentation

It is known that deep networks have many parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. During training, for each mini-batch pulled from the training set, the images in the batch are randomly cropped, rotated slightly by a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially larger training dataset and a more generalized network that is robust to changes in pose and illumination.
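One of these per-mini-batch transforms, random cropping, can be sketched as below; this is a minimal illustrative helper (rotation, scaling and translation are applied analogously, e.g. via Keras' ImageDataGenerator in practice):

```python
import numpy as np

def random_crop(image, out_h, out_w, rng):
    # Take a randomly positioned crop each time the image is drawn, so the
    # network rarely sees exactly the same pixels twice across epochs.
    h, w = image.shape[:2]
    y = int(rng.integers(0, h - out_h + 1))
    x = int(rng.integers(0, w - out_w + 1))
    return image[y:y + out_h, x:x + out_w]
```

Because the offset is re-sampled per draw, each epoch presents a slightly different view of every training image.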

5.3 Modern deep CNN architectures

Based on official results published for recent CNN networks on the ImageNet dataset [108], three of the networks that achieved less than 10% top-5 error are InceptionV3 [120], ResNet50 [121] and MobileNetV2 [122]. Table 4 states the single-model top-1 and top-5 error for each of these networks on the ImageNet validation dataset, together with its depth and number of parameters, ordered by ascending top-5 error values. One thing that can be noticed about the higher-accuracy networks is the increased number of layers: these networks are much deeper than, for example, the well-known VGG [123].

For the purpose of this paper, we chose networks with fewer than 30 million parameters to finetune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 [120] (about 24M parameters, with 5.6% top-5 error), and a more recent, less deep network with far fewer parameters, MobilenetV2 [122] (about 3.5M parameters, with 9.8% top-5 error), which makes it very suitable for mobile applications. Even though all three networks have significantly fewer parameters than VGG, they are much deeper and achieve higher accuracy.

In each architecture, the final prediction softmax layer is removed and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate of 0.5), then a sigmoid activation layer for the final binary prediction.
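The replacement head can be sketched in plain Python as below. This is only an illustration of the dropout-then-sigmoid computation; in the actual Keras implementation it would be a `Dropout(0.5)` layer followed by `Dense(1, activation='sigmoid')` on top of the pooled output, and the weights `w` and bias `b` here are hypothetical placeholders.

```python
import math
import random

def prediction_head(pooled_features, dropout_rate=0.5, training=False):
    """Sketch of the binary classification head: dropout on the
    global-average-pooled feature vector, then one sigmoid unit.
    Weights are illustrative placeholders, not trained values."""
    x = list(pooled_features)
    if training:  # inverted dropout, active only during training
        keep = 1.0 - dropout_rate
        x = [v / keep if random.random() < keep else 0.0 for v in x]
    w = [0.1] * len(x)   # hypothetical dense-layer weights
    b = 0.0              # hypothetical bias
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid score in (0, 1)
```

At inference time (`training=False`) dropout is disabled, as in Keras.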

5.4 Fine-tuning approaches

As stated previously, the available datasets are small, and the networks used have millions of parameters to learn. This will probably lead the model to overfit these small sets if the network weights were trained from scratch, i.e. starting from random weights; training would get stuck in a local minimum, leading to poor generalization. A better approach is initializing the network with weights pre-trained on ImageNet. We then have three training options: (1) use transfer learning: feed the images to the network to extract features from the last layer

© This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

Table 5 Number of parameters in each model, and the number of parameters to learn when finetuning only the final block

Model | Total parameters | Last block to train | First trainable layer | Parameters to learn
InceptionV3 | 21,804,833 | block after mixed9 (mixed9_1 until mixed10) | conv2d_90 | 6,075,585
MobileNetV2 | 2,259,265 | last _inverted_res_block | block_16_expand | 887,361

before prediction, then use these features instead of raw image pixels for classification with an SVM or NN; (2) finetune the weights of only the last several layers of the network, while fixing the weights of the previous layers to their ImageNet-trained values; or (3) finetune the whole network's weights using the new input dataset.

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the approach of fine-tuning only the last convolutional block, we state in Table 5, for each architecture, the name of the last block's first layer and the number of parameters in this block to be finetuned.
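Fine-tuning only the last block amounts to freezing every layer before the block's first trainable layer (e.g. conv2d_90 for InceptionV3, per Table 5). A minimal sketch of that bookkeeping, mirroring the Keras idiom of setting `layer.trainable` per layer (the helper itself is illustrative, not the paper's code):

```python
def freeze_until(layer_names, first_trainable):
    """Given layer names in network order, mark every layer before
    `first_trainable` as frozen (False) and the rest as trainable (True),
    as in fine-tuning approach (A)."""
    trainable = False
    flags = {}
    for name in layer_names:
        if name == first_trainable:
            trainable = True  # from here on, weights are updated
        flags[name] = trainable
    return flags
```

In Keras this would translate to looping over `model.layers` and assigning `layer.trainable` before compiling.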

5.5 Single model with multiple biometrics

In addition to training/testing with datasets that belong to the same biometric source, i.e. face only or iris only, we experiment with training a single generic presentation attack detection network that detects an attack whether the presented biometric is a face or an iris. We train with different combinations of face+iris sets, then cross-evaluate the trained models on the other face and iris datasets to see if this approach is comparable to training on face images only or iris images only.

6 Experiments and Results

In this section, we describe the preprocessing method used for images and videos, and explain the training strategy and experimental setup followed.

6.1 Preprocessing

6.1.1 Preparing images: For iris images, only the MobBioFake database consists of color images captured using a smartphone. The other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the colored images of ImageNet, their input layer only accepts color images with three channels; so for grayscale iris images, the single gray channel is replicated two times to form a 3-channel image passed as the network input. The whole iris image is used, without cropping or iris detection and segmentation.
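The channel replication step described above can be sketched as follows (an illustrative pure-Python version operating on nested lists; in practice one would use an image library or array stacking):

```python
def gray_to_rgb(gray):
    """Replicate a single-channel image (H x W) into three identical
    channels (H x W x 3) so it fits an ImageNet-pretrained input layer."""
    return [[[v, v, v] for v in row] for row in gray]
```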

On the other hand, all face PAD datasets consist of color videos. The duration of the videos ranges from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects; instead we use frame skipping to sample one frame every 20 frames. This frame dropping leads to sampling around 10 to 19 frames per video, depending on its length. For each sampled frame, instead of using the whole frame as the network input, it is first cropped to either include only the face region, or a box with approximately twice the face area so as to include some background context. Annotated face bounding boxes have a side length ranging from 70px to 100px in Replay-Attack videos, and from 170px to 250px in MSU-MFSD.
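The frame-skipping rule can be sketched as below. As a sanity check on the figures above: a 9 s Replay-Attack video at 25 fps (225 frames) yields 12 sampled frames, and a 15 s one (375 frames) yields 19, consistent with the quoted 10-19 range.

```python
def sample_frames(num_frames, skip=20):
    """Indices of the frames kept when sampling one frame
    every `skip` frames of a video."""
    return list(range(0, num_frames, skip))
```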

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments (after frame dropping in videos). Emphasized is the total number of images available for training the CNN for each database.

The images are then resized according to the default input size of each of the used networks, that is, 224 x 224 for MobilenetV2 and 299 x 299 for InceptionV3. The RGB values of input images to the InceptionV3 and MobilenetV2 networks are first scaled to a value between -1 and 1.
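The [-1, 1] scaling matches the standard Inception-style preprocessing for these pretrained weights; a one-line sketch:

```python
def preprocess_pixel(v):
    """Scale an 8-bit channel value (0..255) to the [-1, 1] range
    expected by InceptionV3 / MobileNetV2 ImageNet weights."""
    return v / 127.5 - 1.0
```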

Table 6 Number of images used in training and testing. For face datasets, the number of videos is shown followed by the number of sampled frames between parentheses. For iris datasets, only the number of images is given.

Database | Subset | Number of videos (frames): attack + bona-fide
Replay-Attack | train | 300 (3414) + 60 (1139) = 4553
Replay-Attack | devel | 300 (3411) + 60 (1140)
Replay-Attack | test | 400 (4516) + 80 (1507)
MSU-MFSD | train | 90 (1257) + 30 (419) = 1676
MSU-MFSD | test | 120 (1655) + 84 (551)
Replay-Mobile | train | 192 (2851) + 120 (1762) = 4613
Replay-Mobile | devel | 256 (3804) + 160 (2356)
Replay-Mobile | test | 192 (2842) + 110 (1615)
ATVS-FIr | train | 200 + 200 = 400
ATVS-FIr | test | 600 + 600
Warsaw 2013 | train | 203 + 228 = 431
Warsaw 2013 | test | 612 + 624
MobBioFake | train | 400 + 400 = 800
MobBioFake | test | 400 + 400

(Frames obtained by sampling one frame every 20 frames.)

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize these small sets of samples instead of learning useful features, and so fail to generalize. In order to reduce overfitting, data augmentation is used during training: images are randomly transformed before being fed to the network with one or more of the following transformations: horizontal flip, rotation between 0 and 10 degrees, horizontal translation by 0 to 20 percent of the image width, vertical translation by 0 to 20 percent of the image height, or zooming by 0 to 20 percent.
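A toy sketch of two of these random transforms (flip and translation) on a nested-list image is shown below; rotation and zoom are omitted for brevity. In practice the full set of transforms would come from a library such as Keras's ImageDataGenerator, and the parameter values here simply mirror the ranges listed above.

```python
import random

def augment(image, max_shift=0.2):
    """Randomly flip and translate a 2-D image (list of rows),
    padding vacated pixels with zeros. Illustrative only."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    if random.random() < 0.5:                    # horizontal flip
        out = [row[::-1] for row in out]
    dx = random.randint(0, int(max_shift * w))   # shift right by dx pixels
    out = [([0] * dx + row)[:w] for row in out]
    dy = random.randint(0, int(max_shift * h))   # shift down by dy pixels
    out = [[0] * w for _ in range(dy)] + out
    return out[:h]
```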

6.2 Training strategy and hyperparameters

For training the CNN networks we used the Adam optimizer [124] with beta1 = 0.9, beta2 = 0.999 and learning rate 1 x 10^-4. We trained the network for a maximum of 50 epochs, but added a stopping criterion: training stops if the training or validation accuracy reaches 99.95%, or if the validation accuracy does not improve (with delta = 0.0005) for 40 epochs.
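The stopping criterion can be sketched as a small helper over the per-epoch validation-accuracy history. This is an illustration of the rule as described (in Keras it would typically be expressed via EarlyStopping-style callbacks), not the paper's exact implementation:

```python
def should_stop(val_acc_history, patience=40, min_delta=0.0005, target=0.9995):
    """Stop when accuracy reaches 99.95%, or when validation accuracy
    has not improved by at least min_delta for `patience` epochs."""
    if val_acc_history and max(val_acc_history) >= target:
        return True
    if len(val_acc_history) <= patience:
        return False  # not enough epochs to judge stagnation yet
    best_before = max(val_acc_history[:-patience])
    best_recent = max(val_acc_history[-patience:])
    return best_recent < best_before + min_delta
```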

Intermediate models with high validation accuracy were saved, and finally the model with the lowest validation error was chosen for inference. Because of the randomness introduced by the data augmentation and the dropout, several training sessions may produce slightly different results, especially in cross-validation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treated each frame of a face video as a separate image. Then, when reporting the final score for each video, we perform score-level fusion by averaging the scores of the samples from the same video. Accuracy and errors can then be calculated based on these video-level scores, not frame-level scores as in some results reported in the literature.
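The score-level fusion step described above is simply the mean of the per-frame scores of a video:

```python
def video_score(frame_scores):
    """Score-level fusion: a video's score is the mean of the
    scores of its sampled frames."""
    return sum(frame_scores) / len(frame_scores)
```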


Table 7 Intra- and cross-dataset results. Test HTER is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER is reported.

Train Database | Test Database | (A) InceptionV3 | (A) MobilenetV2 | (B) InceptionV3 | (B) MobilenetV2
Replay-Attack | Replay-Attack | 6.9 | 1.6 | 0 | 0.63
Replay-Attack | MSU-MFSD | 33 | 30 | 17.5 | 22.5
Replay-Attack | Replay-Mobile | 35.8 | 43.4 | 64.7 | 56.7
MSU-MFSD | Replay-Attack | 34.7 | 22.9 | 23.8 | 13.7
MSU-MFSD | MSU-MFSD | 7.5 | 20.5 | 0 | 2.08
MSU-MFSD | Replay-Mobile | 32.6 | 26.8 | 24.6 | 23.7
Replay-Mobile | Replay-Attack | 33.2 | 45.1 | 31.1 | 45.5
Replay-Mobile | MSU-MFSD | 35.4 | 30 | 32.9 | 37.5
Replay-Mobile | Replay-Mobile | 3.6 | 3.2 | 0 | 0
ATVS-FIr | ATVS-FIr | 0 | 0 | 0 | 0
ATVS-FIr | Warsaw 2013 | 1.9 | 5.4 | 0.16 | 1.6
ATVS-FIr | MobBioFake (Grayscale) | 44 | 39 | 36.5 | 39
Warsaw 2013 | ATVS-FIr | 0 | 1 | 0.17 | 0.5
Warsaw 2013 | Warsaw 2013 | 0 | 0.32 | 0 | 0
Warsaw 2013 | MobBioFake (Grayscale) | 43 | 39 | 32.8 | 47
MobBioFake | ATVS-FIr | 10 | 30 | 18.7 | 20.2
MobBioFake | Warsaw 2013 | 15.7 | 32 | 23.4 | 18.8
MobBioFake | MobBioFake | 8.75 | 7 | 0.5 | 0.75
MobBioFake (Grayscale) | ATVS-FIr | 2.5 | 23.7 | 12 | 16.2
MobBioFake (Grayscale) | Warsaw 2013 | 6.2 | 17.3 | 10.1 | 17.2
MobBioFake (Grayscale) | MobBioFake (Grayscale) | 1.05 | 7.3 | 0.25 | 1.25

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the error on the test set. EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attacks, equal to FR / total bona-fide attempts, while FAR is the probability of the algorithm accepting attack samples as bona-fide, equal to FA / total attack attempts.

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold achieving the EER is first calculated on the development set. This threshold is then used to calculate the error on the test set, known as the half-total error rate (HTER), which is the sum of the FRR and FAR at that threshold, divided by two: HTER = (FRR + FAR) / 2.
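Both metrics can be sketched as follows, assuming a higher score means "bona-fide". The EER threshold search here is a simplified scan over observed scores, not the exact procedure used by the evaluation toolkit:

```python
def far_frr(attack_scores, bonafide_scores, threshold):
    """FAR and FRR at a threshold (scores >= threshold are accepted)."""
    fa = sum(s >= threshold for s in attack_scores)
    fr = sum(s < threshold for s in bonafide_scores)
    return fa / len(attack_scores), fr / len(bonafide_scores)

def eer_threshold(attack_scores, bonafide_scores):
    """Threshold where |FAR - FRR| is minimal (approximate EER point),
    searched on the development set."""
    candidates = sorted(attack_scores + bonafide_scores)
    def gap(t):
        far, frr = far_frr(attack_scores, bonafide_scores, t)
        return abs(far - frr)
    return min(candidates, key=gap)

def hter(attack_scores, bonafide_scores, threshold):
    """Half-total error rate at a fixed (devel-derived) threshold."""
    far, frr = far_frr(attack_scores, bonafide_scores, threshold)
    return (far + frr) / 2
```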

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are to be evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER, to be able to compare with state-of-the-art published results. The EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that has performed EER cross-dataset evaluation on iris PAD databases.

6.4 Experimental setup

For implementation, we used Keras† with a TensorFlow‡ backend for the deep CNN models, and for dataset management and experiment pipelines we used the Bob package [125, 126]. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

* https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
† https://github.com/keras-team/keras
‡ https://www.tensorflow.org

6.5 Experiment 1 results

In the first experiment, we compare the two finetuning approaches explained in Section 5.4. For this experiment, training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state-of-the-art. For each finetuning approach we performed three runs and reported the minimum error obtained. In Table 7 we refer to the first finetuning approach as (A), where only the weights of the last block are finetuned, while approach (B) is when all the layers' weights are finetuned using the given dataset.

It can be noticed from Table 7 that finetuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and finetuning just the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error for all datasets using InceptionV3, while MobilenetV2 achieved 0% for most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset was used for training instead of the dataset's original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 results

For the second experiment, we only use finetuning approach (B), adjusting the weights of all the network's layers using the training images. Here we use both face and iris images as training input to a single model for classifying bona-fide from attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate between bona-fide and presentation attack images achieved very good results on both face and iris test sets. For some cases on face datasets, this combination boosted cross-dataset performance: for example, on the Replay-Attack test dataset, 11.9% HTER was achieved when training a MobilenetV2 using MSU-MFSD plus the Warsaw iris dataset, vs. 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8 Training with face+iris images: intra- and cross-dataset results. Test HTER is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER is reported.

Each cell gives InceptionV3 / MobilenetV2 error, per train iris database.

Train face: Replay-Attack
Test Database | ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)
Number of training epochs | 2 / 6 | 3 / 6 | 4 / 9 | 2 / 8
Replay-Attack | 0 / 0.63 | 0 / 0 | 0.25 / 1.5 | 0 / 1.4
MSU-MFSD | 29.6 / 20.4 | 1.5 / 20 | 19.6 / 25 | 30 / 17.5
Replay-Mobile | 58.1 / 48.7 | 57.3 / 35.4 | 60.6 / 58.1 | 39 / 40
ATVS-FIr | 0 / 0 | 10.5 / 2.7 | 51.7 / 8.8 | 3.5 / 12.3
Warsaw 2013 | 0.16 / 0.16 | 0 / 0 | 41.3 / 16.2 | 15.2 / 5.7
MobBioFake (RGB) | 58.5 / 71.3 | 56 / 45.5 | 0.75 / 0.25 | 17.5 / 2
MobBioFake (Grayscale) | 50 / 69 | 56.3 / 38.5 | 1.5 / 0.75 | 1.25 / 1

Train face: MSU-MFSD
Test Database | ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)
Number of training epochs | 4 / 6 | 3 / 5 | 10 / 9 | 7 / 9
Replay-Attack | 22.8 / 14.5 | 25.3 / 11.9 | 22.8 / 27.7 | 16.6 / 21.6
MSU-MFSD | 0 / 2.08 | 0.4 / 2.08 | 2.5 / 2.08 | 0.4 / 2.5
Replay-Mobile | 28.7 / 26.1 | 19.2 / 24.8 | 22.7 / 26.8 | 25.2 / 17
ATVS-FIr | 0 / 0 | 0.33 / 2.3 | 11.8 / 12.5 | 0.8 / 12.33
Warsaw 2013 | 0.6 / 0.8 | 0 / 0 | 43.9 / 65.5 | 7.4 / 9.1
MobBioFake (RGB) | 52.5 / 39.2 | 45.5 / 42 | 2 / 1.25 | 2.5 / 2.5
MobBioFake (Grayscale) | 45.3 / 38.5 | 34.5 / 43 | 2.2 / 4.25 | 1.75 / 2

Train face: Replay-Mobile
Test Database | ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)
Number of training epochs | 2 / 3 | 1 / 3 | 4 / 7 | 5 / 4
Replay-Attack | 24.2 / 42.6 | 31 / 35.7 | 41.6 / 64.3 | 33.7 / 54.6
MSU-MFSD | 32.5 / 30.4 | 32.5 / 42.5 | 60 / 52.5 | 40 / 44.6
Replay-Mobile | 0 / 0 | 0 / 0 | 0 / 0.26 | 0 / 0.46
ATVS-FIr | 0 / 0 | 0.83 / 6 | 15.5 / 22.5 | 6.5 / 28.2
Warsaw 2013 | 2.1 / 4 | 0 / 0 | 9.6 / 28.9 | 22.1 / 26.1
MobBioFake (RGB) | 46.5 / 53 | 49.3 / 54.3 | 1 / 0.75 | 8.8 / 32.5
MobBioFake (Grayscale) | 33.5 / 43.3 | 41.7 / 45.8 | 5.3 / 6.5 | 2 / 1.75

(In the original layout, bold-underlined marks the best cross-dataset error for each test dataset, and bold marks the best cross-dataset error in each row, i.e. given a certain face train set.)

Table 9 Best cross-dataset results and comparison with SOTA. Test HTER is reported when testing on Replay-Attack; otherwise test EER is reported. Underlined (in the original layout) is the best cross-dataset error for each test dataset.

Each result is followed by its method/model and, in brackets, the corresponding intra-dataset error.

Train Database | Test Database | SOTA | Ours, single biometric | Ours, face + iris
Replay-Attack | MSU-MFSD | 28.5, GuidedScale-LBP [127] [8.25, LGBP] | 17.5, InceptionV3 (B) [0] | 1.5, InceptionV3 (Warsaw 2013) [0]
Replay-Attack | Replay-Mobile | 49.2, GuidedScale-LBP [127] [31.3, LBP+GS-LBP] | 35.8, InceptionV3 (A) [6.9] | 35.4, MobilenetV2 (Warsaw 2013) [0]
MSU-MFSD | Replay-Attack | 21.4, Color-texture [66] [1.5] | 13.7, MobilenetV2 (B) [2.08] | 11.9, MobilenetV2 (Warsaw 2013) [2.08]
MSU-MFSD | Replay-Mobile | 32.1, GuidedScale-LBP [127] [8.5, LBP+GS-LBP] | 23.7, MobilenetV2 (B) [2.08] | 17, MobilenetV2 (MobBioFake (Gray)) [2.5]
Replay-Mobile | Replay-Attack | 46.19, GuidedScale-LBP [127] [0.52, LGBP] | 31.1, InceptionV3 (B) [0] | 24.2, InceptionV3 (ATVS-FIr) [0]
Replay-Mobile | MSU-MFSD | 33.56, GuidedScale-LBP [127] [0.98, LBP+GS-LBP] | 30, MobilenetV2 (A) [3.2] | 30.4, MobilenetV2 (ATVS-FIr) [0]
ATVS-FIr | Warsaw 2013 | None available | 0.16, InceptionV3 (B) [0] | 0.16, any model (Replay-Attack) [0]
ATVS-FIr | MobBioFake (Gray) | None available | 36.5, InceptionV3 (B) [0] | 33.5, InceptionV3 (Replay-Mobile) [0]
Warsaw 2013 | ATVS-FIr | None available | 0, InceptionV3 (A) [0] | 0.33, InceptionV3 (MSU-MFSD) [0]
Warsaw 2013 | MobBioFake (Gray) | None available | 32.8, InceptionV3 (B) [0] | 34.5, InceptionV3 (MSU-MFSD) [0]
MobBioFake (Gray) | ATVS-FIr | None available | 2.5, InceptionV3 (A) [1.05] | 0.8, InceptionV3 (MSU-MFSD) [1.75]
MobBioFake (Gray) | Warsaw 2013 | None available | 6.2, InceptionV3 (A) [1.05] | 5.7, MobilenetV2 (Replay-Attack) [1]

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. Not only this, but it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network focus on color features when testing is performed on a face image, since all training face images were colored but all iris training images were grayscale.


Fig. 6: High frequency components. Attack iris images from the Warsaw dataset. Left-to-right: input image; frequency components (Fourier); high-pass filter; inverse Fourier of the filtered image; high frequency components overlaid on the image.

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left-to-right: input image; high frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation errors with the state-of-the-art (SOTA). This table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activations and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and textures that are present in real presentations in general and absent from attack samples, and vice versa. One approach to demonstrate this is to visualize the response of certain feature maps of the network given sample input images. For this task we chose a MobilenetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected to visualize the responses of some filters from the final convolutional block before classification.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left-to-right: input image; high frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left-to-right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left-to-right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high frequency components of an image, then we overlay these high frequency regions


on the input image for a better understanding of where high frequency content is concentrated in bona-fide versus attack images.

The steps for obtaining the high frequency regions of an image are depicted in Figure 6. A Fourier transform is first applied to get the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, the inverse Fourier transform is used to obtain an image with only the high frequency regions visible, and finally these regions are overlaid on the original image to highlight them.
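The Figure 6 pipeline can be sketched with NumPy's FFT routines. The circular cutoff radius below is an illustrative parameter, not necessarily the value used in the paper:

```python
import numpy as np

def high_freq_image(img, cutoff=0.1):
    """FFT -> zero out low frequencies -> inverse FFT, returning the
    magnitude of the high-frequency-only reconstruction.
    `cutoff` is the removed-lows radius as a fraction of image size."""
    f = np.fft.fftshift(np.fft.fft2(img))        # centre the spectrum
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist2 = (yy - h // 2) ** 2 + (xx - w // 2) ** 2
    f[dist2 <= (cutoff * min(h, w)) ** 2] = 0    # high-pass mask
    return np.abs(np.fft.ifft2(np.fft.ifftshift(f)))
```

The result can then be overlaid on the original image to highlight the high-frequency regions.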

Regarding the chosen filters to visualize: the final layer before classification consists of 1280 filters; for an input image of size 224 x 224, the feature maps produced by these final 1280 filters have size 7 x 7. From these 1280 filters we selected two filters per biometric: one with the highest average activation response to real images, which we call realF, and another that was highly activated by attack images, called attackF. We then visualize the 7 x 7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.
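Selecting realF and attackF by average activation can be sketched as below. The shapes follow the 7 x 7 feature maps described above; the helper itself is a hypothetical illustration of the selection rule:

```python
import numpy as np

def select_filters(feature_maps, labels):
    """Pick the filter indices most activated on average by real samples
    (realF) and by attack samples (attackF).
    feature_maps: (num_samples, H, W, num_filters); labels: 1 real, 0 attack."""
    mean_act = feature_maps.mean(axis=(1, 2))        # (samples, filters)
    labels = np.asarray(labels, dtype=bool)
    real_f = int(mean_act[labels].mean(axis=0).argmax())
    attack_f = int(mean_act[~labels].mean(axis=0).argmax())
    return real_f, attack_f
```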

Figure 7 shows samples from the Warsaw iris dataset with their high frequency regions highlighted, in addition to the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples, high frequency areas are concentrated at the bright regions around the inner eye corner and the lash line; for attack presentations, however, high frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features highly represented at the eye corner through realF, giving a high response at those regions in the real image while failing to detect them in attack samples. attackF also manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images of the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is performed using a handheld mobile device showing a photo of the subject. We can notice the absence of high frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

By visualizing the high frequency regions in the images, it was shown that high frequency variations in real images concentrate on certain edges and depth changes of the 3D biometric presented, while in attack presentations the high frequency components represent noise: either spread over the paper, in case of a paper attack medium, or noise patterns resulting from the display of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyze what the network has learned is using gradient-weighted class activation maps (Grad-CAM) [10], which highlight the areas that contributed most to the network's classification decision for an input image.
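Grad-CAM reduces to a gradient-weighted sum of the last convolutional feature maps followed by a ReLU [10]. Given precomputed feature maps and the gradients of the class score with respect to them (which a framework like TensorFlow would supply), a sketch:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from last-conv feature maps (H, W, C) and the
    gradients of the class score w.r.t. those maps (H, W, C)."""
    weights = gradients.mean(axis=(0, 1))               # GAP over spatial dims
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))
    cam = np.maximum(cam, 0)                            # ReLU: keep positive evidence
    return cam / cam.max() if cam.max() > 0 else cam    # normalize to [0, 1]
```

The normalized map is then upsampled to the input size and overlaid on the image, as in Figures 9 and 10.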

We chose the MobilenetV2 network trained on the Replay-Attack face dataset and the grayscale MobBioFake iris dataset to analyze the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation attack images.

The visualizations in these figures show clearly that the network focuses on features of the face and iris centers to make its decision for a bona-fide presentation; in cases where these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could perfectly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two finetuning approaches, starting from network weights trained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% SOTA intra-dataset error on six publicly available benchmark datasets when finetuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, in addition to starting the network training from non-random weights already pretrained on ImageNet. This was to avoid the weights overfitting the small training datasets; for future work, however, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-SOTA results were reported for the face datasets. As far as we know, this is the first paper to include cross-dataset equal-error-rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained with a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we provided an analysis of the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important region causing the network to select a bona-fide classification decision. For future work, we would like to apply patch-based CNNs and identify the regions that contribute most to attack detection, in hope of improving the generalization of the approach with a region-based CNN method.


8 References1 Jourabloo A Liu Y Liu X lsquoFace de-spoofing Anti-spoofing via noise

modelingrsquo CoRR 20182 Nagpal C Dubey SR lsquoA performance evaluation of convolutional neural

networks for face anti spoofingrsquo CoRR 20183 Nguyen D RaeBaek N Pham T RyoungPark K lsquoPresentation attack detec-

tion for iris recognition system using nir camera sensorrsquo Sensors 2018 18pp 1315

4 Nguyen DT Pham TD Lee YW Park KR lsquoDeep learning-based enhancedpresentation attack detection for iris recognition by combining features from localand global regions based on nir camera sensorrsquo Sensors 2018 18 (8)

5 Pinto A Pedrini H Krumdick M Becker B Czajka A Bowyer KW et alCounteracting Presentation Attacks in Face Fingerprint and Iris Recognition InlsquoDeep learning in biometricsrsquo (CRC Press 2018 p 49

6 Czajka A Bowyer KW lsquoPresentation attack detection for iris recognitionAn assessment of the state-of-the-artrsquo ACM Computing Surveys 2018 51 (4)pp 861ndash8635

7 Ramachandra R Busch C lsquoPresentation attack detection methods for facerecognition systems A comprehensive surveyrsquo ACM Computing Surveys 201750 (1) pp 81ndash837

8 AbdulGhaffar I Norzali M HajiMohd MN lsquoPresentation attack detec-tion for face recognition on smartphones A comprehensive reviewrsquo Journal ofTelecommunication Electronic and Computer Engineering 2017 9

9 Li L Correia PL Hadid A lsquoFace recognition under spoofing attackscountermeasures and research directionsrsquo IET Biometrics 2018 7 pp 3ndash14(11)

10 Selvaraju RR Das A Vedantam R Cogswell M Parikh D Batra DlsquoGrad-cam Why did you say that visual explanations from deep networks viagradient-based localizationrsquo CoRR 2016 Available from httparxivorgabs161002391

11 Daugman J lsquoIris recognition and anti-spoofing countermeasuresrsquo In 7thInternational Biometrics Conference ( 2004

12 Galbally J OrtizLopez J Fierrez J OrtegaGarcia J lsquoIris liveness detectionbased on quality related featuresrsquo In 2012 5th IAPR International Conferenceon Biometrics (ICB) ( 2012 pp 271ndash276

13 Pacut A Czajka A lsquoAliveness detection for iris biometricsrsquo 40th Annual IEEEInternational Carnahan Conferences Security Technology 2006 pp 122 ndash 129

14 Czajka A lsquoPupil dynamics for iris liveness detectionrsquo IEEE Transactions onInformation Forensics and Security 2015 10 (4) pp 726ndash735

15 Czajka A lsquoPupil dynamics for presentation attack detection in iris recognitionrsquoIn International Biometric Performance Conference (IBPC) NIST Gaithersburg( 2014 pp 1ndash3

16 Bodade R Talbar S lsquoDynamic iris localisation A novel approach suitablefor fake iris detectionrsquo In 2009 International Conference on Ultra ModernTelecommunications Workshops ( 2009 pp 1ndash5

17 Huang X Ti C Hou Q Tokuta A Yang R lsquoAn experimental study of pupilconstriction for liveness detectionrsquo In 2013 IEEE Workshop on Applications ofComputer Vision (WACV) ( 2013 pp 252ndash258

18 Kanematsu M Takano H Nakamura K lsquoHighly reliable liveness detectionmethod for iris recognitionrsquo In SICE Annual Conference 2007 ( 2007 pp 361ndash364

19 Lee E.C., Park K.R.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166

20 Mohd M.N.H., Kashima M., Sato K., Watanabe M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371

21 Mohd M.N.H., Kashima M., Sato K., Watanabe M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015, 10, (18), pp. 684–697

22 Pan G., Sun L., Wu Z., Wang Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225

23 Smith D.F., Wiliem A., Lovell B.C.: 'Face recognition on consumer devices: reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745

24 Kollreider K., Fronthaler H., Bigun J.: 'Verifying liveness by multiple experts in face biometrics'. 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6

25 Ali A., Deravi F., Hoque S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11

26 Wang L., Ding X., Fang C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690

27 Bharadwaj S., Dhamecha T.I., Vatsa M., Singh R.: 'Computationally efficient face spoofing detection with motion magnification'. 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110

28 Sun L., Wu Z., Lao S., Pan G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8

29 Patel K., Han H., Jain A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You Z., Zhou J., Wang Y., Sun Z., Shan S., Zheng W., et al. (eds.) Biometric Recognition (Cham: Springer International Publishing, 2016), pp. 611–619

30 Marsico M.D., Nappi M., Riccio D., Dugelay J.: 'Moving face spoofing detection via 3D projective invariants'. 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78

31 Pan G., Wu Z., Sun L.: 'Liveness detection for face recognition'. In: Delac K., Grgic M., Bartlett M.S. (eds.) Recent Advances in Face Recognition (Rijeka: IntechOpen, 2008)

32 Kollreider K., Fronthaler H., Faraj M.I., Bigun J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558

33 Komogortsev O.V., Karpov A.: 'Liveness detection via oculomotor plant characteristics: attack of mechanical replicas'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

34 Komogortsev O.V., Karpov A., Holland C.D.: 'Attack of mechanical replicas: liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725

35 Rigas I., Komogortsev O.V.: 'Gaze estimation as a framework for iris liveness detection'. IEEE International Joint Conference on Biometrics, 2014, pp. 1–8

36 Rigas I., Komogortsev O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326

37 Akhtar Z., Foresti G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14

38 Tan X., Li Y., Liu J., Jiang L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Daniilidis K., Maragos P., Paragios N. (eds.) Computer Vision – ECCV 2010 (Berlin, Heidelberg: Springer, 2010), pp. 504–517

39 Kollreider K., Fronthaler H., Bigun J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244

40 Bao W., Li H., Li N., Jiang W.: 'A liveness detection method for face recognition based on optical flow field'. Image Analysis and Signal Processing, 2009, pp. 233–236

41 Siddiqui T.A., Bharadwaj S., Dhamecha T.I., Agarwal A., Vatsa M., Singh R., et al.: 'Face anti-spoofing with multifeature videolet aggregation'. 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040

42 Pinto A.da S., Pedrini H., Schwartz W., Rocha A.: 'Video-based face spoofing detection through visual rhythm analysis'. 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228

43 Komulainen J., Hadid A., Pietikäinen M.: 'Context based face anti-spoofing'. 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8

44 Kim S., Yu S., Kim K., Ban Y., Lee S.: 'Face liveness detection using variable focusing'. ICB, 2013, pp. 1–6

45 Wang T., Yang J., Lei Z., Liao S., Li S.Z.: 'Face liveness detection using 3D structure recovered from a single camera'. ICB, 2013, pp. 1–6

46 Kose N., Dugelay J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6

47 Erdogmus N., Marcel S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect'. 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6

48 Erdogmus N., Marcel S.: 'Spoofing 2D face recognition systems with 3D masks'. 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8

49 Li J., Wang Y., Tan T., Jain A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Jain A.K., Ratha N.K. (eds.) Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303

50 Zhang Z., Yan J., Liu S., Lei Z., Yi D., Li S.Z.: 'A face antispoofing database with diverse attacks'. 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31

51 Lee T., Ju G., Liu H., Wu Y.: 'Liveness detection using frequency entropy of image sequences'. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370

52 Czajka A.: 'Database of iris printouts and its application: development of liveness detection method for iris recognition'. 2013 18th International Conference on Methods and Models in Automation and Robotics (MMAR), 2013, pp. 28–33

53 Wei Z., Qiu X., Sun Z., Tan T.: 'Counterfeit iris detection based on texture analysis'. ICPR 2008: 19th International Conference on Pattern Recognition, 2009, pp. 1–4

54 Sequeira A.F., Murari J., Cardoso J.S.: 'Iris liveness detection methods in mobile applications'. 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33

55 Sequeira A.F., Murari J., Cardoso J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008

56 Unnikrishnan S., Eshack A.: 'Face spoof detection using image distortion analysis and image quality assessment'. 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5

57 Wen D., Han H., Jain A.K.: 'Face spoof detection with image distortion analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761

58 Galbally J., Marcel S.: 'Face anti-spoofing based on general image quality assessment'. 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178

59 Söllinger D., Trung P., Uhl A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing', IET Biometrics, 2018, 7, (4), pp. 314–324

60 Bhogal A.P.S., Söllinger D., Trung P., Uhl A.: 'Non-reference image quality assessment for biometric presentation attack detection'. 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6

61 Boulkenafet Z., Komulainen J., Hadid A.: 'Face anti-spoofing based on color texture analysis'. 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet Z., Komulainen J., Hadid A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830

63 Boulkenafet Z., Komulainen J., Hadid A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145

64 Boulkenafet Z., Komulainen J., Akhtar Z., Benlamoudi A., Samai D., Bekhouche S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65 Raja K.B., Raghavendra R., Busch C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA: ACM, 2016), pp. 9–15

66 Boulkenafet Z., Komulainen J., Hadid A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9

67 Anjos A., Marcel S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68 Komulainen J., Hadid A., Pietikäinen M., Anjos A., Marcel S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69 Gragnaniello D., Poggi G., Sansone C., Verdoliva L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863

70 de Freitas Pereira T., Anjos A., De Martino J.M., Marcel S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park J.I., Kim J. (eds.) Computer Vision - ACCV 2012 Workshops (Berlin, Heidelberg: Springer, 2013), pp. 121–132

71 de Freitas Pereira T., Anjos A., Martino J.M.D., Marcel S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72 Määttä J., Hadid A., Pietikäinen M.: 'Face spoofing detection from single images using micro-texture analysis'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska I., Anjos A., Marcel S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. 2012 BIOSIG - Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74 Yang J., Lei Z., Liao S., Li S.Z.: 'Face liveness detection with component dependent descriptor'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75 Määttä J., Hadid A., Pietikäinen M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10

76 Benlamoudi A., Samai D., Ouafi A., Bekhouche S.E., Taleb-Ahmed A., Hadid A.: 'Face spoofing detection using local binary patterns and Fisher score'. 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5

77 Patel K., Han H., Jain A.K.: 'Secure face unlock: spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283

78 Schwartz W.R., Rocha A., Pedrini H.: 'Face spoofing detection through partial least squares and low-level descriptors'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79 Xu Z., Li S., Deng W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80 Peixoto B., Michelassi C., Rocha A.: 'Face liveness detection under bad illumination conditions'. 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81 Raghavendra R., Busch C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715

82 Doyle J.S., Bowyer K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683

83 Akhtar Z., Micheloni C., Piciarelli C., Foresti G.L.: 'MoBio_LivDet: mobile biometric liveness detection'. 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci R., Muntoni D., Fadda G., Pili M., Sirena N., Murgia G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85 Feng L., Po L.M., Li Y., Xu X., Yuan F., Cheung T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86 Li L., Feng X., Boulkenafet Z., Xia Z., Li M., Hadid A.: 'An original face anti-spoofing approach using partial convolutional neural network'. 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87 Nguyen T.D., Pham T.D., Baek N.R., Park K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors', Sensors, 2018

88 Atoum Y., Liu Y., Jourabloo A., Liu X.: 'Face anti-spoofing using patch and depth-based CNNs'. 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89 Liu Y., Jourabloo A., Liu X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision', CoRR, 2018

90 Wang Z., Zhao C., Qin Y., Zhou Q., Lei Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018

91 Yang J., Lei Z., Li S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014

92 Gan J., Li S., Zhai Y., Liu C.: '3D convolutional neural network based on face anti-spoofing'. 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93 Lucena O., Junior A., Moia V., Souza R., Valle E., de Alencar Lotufo R.: 'Transfer learning using convolutional neural networks for face anti-spoofing'. 2017, pp. 27–34

94 Czajka A., Bowyer K.W., Krumdick M., VidalMata R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196

95 Silva P., Luz E., Baeta R., Pedrini H., Falcão A., Menotti D.: 'An approach to iris contact lens detection based on deep image representations'. 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96 Kuehlkamp A., da Silva Pinto A., Rocha A., Bowyer K.W., Czajka A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018

97 Hoffman S., Sharma R., Ross A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98 Menotti D., Chiachia G., Pinto A., Schwartz W.R., Pedrini H., Falcão A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879

99 Agarwal A., Singh R., Vatsa M.: 'Face anti-spoofing using Haralick features'. 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100 Galbally J., Marcel S., Fierrez J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724

101 Garcia D.C., de Queiroz R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786

102 He Z., Sun Z., Tan T., Wei Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli M., Nixon M.S. (eds.) Advances in Biometrics (Berlin, Heidelberg: Springer, 2009), pp. 1080–1090

103 Zhang H., Sun Z., Tan T.: 'Contact lens detection based on weighted LBP'. 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104 Costa-Pazo A., Bhattacharjee S., Vazquez-Fernandez E., Marcel S.: 'The Replay-Mobile face presentation-attack database'. 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105 Sequeira A.F., Monteiro J.C., Rebelo A., Oliveira H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello D., Sansone C., Verdoliva L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87

107 Marsico M.D., Galdi C., Nappi M., Riccio D.: 'FIRME: face and iris recognition for mobile engagement', Image Vision Comput., 2014, 32, pp. 1161–1172

108 Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., et al.: 'ImageNet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252

109 Boulkenafet Z., Komulainen J., Li L., Feng X., Hadid A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 612–618

110 Yambay D., Becker B., Kohli N., Yadav D., Czajka A., Bowyer K.W., et al.: 'LivDet Iris 2017 - iris liveness detection competition 2017'. 2017 IEEE International Joint Conference on Biometrics (IJCB 2017), Denver, CO, USA, October 1-4, 2017, pp. 733–741

111 Chakka M.M., Anjos A., Marcel S., Tronci R., Muntoni D., Fadda G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112 Chingovska I., Yang J., Lei Z., Yi D., Li S.Z., Kahm O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113 Zhang S., Wang X., Liu A., Zhao C., Wan J., Escalera S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu A., Wan J., Escalera S., Jair Escalante H., Tan Z., Yuan Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR 2019'. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu A., Tan Z., Li X., Wan J., Escalera S., Guo G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116 Yambay D., Doyle J.S. Jr., Bowyer K.W., Czajka A., Schuckers S.: 'LivDet-Iris 2013 - iris liveness detection competition 2013'. IEEE International Joint Conference on Biometrics (IJCB 2014), Clearwater, FL, USA, September 29 - October 2, 2014, pp. 1–8

117 Yambay D., Walczak B., Schuckers S., Czajka A.: 'LivDet-Iris 2015 - iris liveness detection competition 2015'. IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22-24, 2017, pp. 1–6

118 Sequeira A.F., Oliveira H.P., Monteiro J.C., Monteiro J.P., Cardoso J.S.: 'MobILive 2014 - mobile iris liveness detection competition'. Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete V., Tome-Gonzalez P., Alonso-Fernandez F., Galbally J., Fierrez J., Ortega-Garcia J.: 'Direct attacks using fake images in iris verification'. In: Schouten B., Juul N.C., Drygajlo A., Tistarelli M. (eds.) Biometrics and Identity Management (Berlin, Heidelberg: Springer, 2008), pp. 181–190

120 Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z.: 'Rethinking the inception architecture for computer vision', CoRR, 2015, abs/1512.00567

121 He K., Zhang X., Ren S., Sun J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385

122 Sandler M., Howard A.G., Zhu M., Zhmoginov A., Chen L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381

123 Simonyan K., Zisserman A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556

124 Kingma D.P., Ba J.: 'Adam: a method for stochastic optimization', CoRR, 2014, abs/1412.6980

125 Anjos A., Shafey L.E., Wallace R., Günther M., McCool C., Marcel S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126 Anjos A., Günther M., de Freitas Pereira T., Korshunov P., Mohammadi A., Marcel S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. International Conference on Machine Learning (ICML), 2017

127 Peng F., Qin L., Long M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



Table 2 Properties of used face and iris databases

Database               Samples (Attack + Bona-fide)   Resolution (W×H)   FPS   Video duration (face videos)
Replay-Attack (Face)   1000 + 200 (videos)            320×240            25    9-15 seconds
MSU-MFSD (Face)        210 + 70 (videos)              720×480            30    9-15 seconds
Replay-Mobile (Face)   640 + 390 (videos)             720×1280           30    10-15 seconds
Warsaw 2013 (Iris)     815 + 825 (images)             640×480            -     -
ATVS-FIr (Iris)        800 + 800 (images)             640×480            -     -
MobBioFake (Iris)      800 + 800 (images)             250×200            -     -

texture-based approaches, which proved to be very effective in detecting print attacks. Two years later, in 2013, the second face PAD competition was carried out [112] with 8 participating teams on the Replay-Attack dataset [73]. Most of the teams relied only on texture-based methods, while 3 teams used hybrid approaches combining both texture- and motion-based countermeasures. Two of these hybrid approaches reached 0% HTER on the test set of the Replay-Attack database.

Later, in 2017, a competition was held on generalized software-based face presentation attack detection in mobile scenarios [64]. Thirteen teams participated, and four protocols were used for evaluation. The winning team, which showed the best generalization to unseen cameras and attack types, used a combination of color, texture, and motion-based features.

More recently, after the release of the large-scale multi-modal face anti-spoofing dataset CASIA-SURF [113], the ChaLearn LAP Multi-Modal Face Anti-spoofing Attack Detection Challenge [114] was held in 2019. Thirteen teams qualified for the final round, with all submitted face PAD solutions relying on CNN-based feature extractors; the top 3 teams utilized ensembles of feature extractors and classifiers. For future work, the challenge paper suggested taking full advantage of the multi-modal dataset by defining a set of cross-modal testing protocols, as well as introducing 3D mask attacks in addition to the 2D attacks. These recommendations were fulfilled the following year in the ChaLearn Multi-modal Cross-ethnicity Face Anti-spoofing Recognition Challenge using the CASIA-SURF CeFA dataset [115].

For iris presentation attack detection, the first competition was LivDet-Iris 2013 [116]; two sequels followed in 2015 [117] and 2017 [110]. Images in LivDet-Iris included iris printouts and textured contact lenses, the latter being more difficult to detect than printouts. For LivDet-Iris 2017, the test images were split into known and unknown partitions: the known partition contained images taken under similar conditions and with the same sensor as the training images, while the unknown partition consisted of images taken in a different environment with different properties. Results on the known partition were much better than those on the unknown one. In 2014, another competition was held specifically for mobile authentication, the Mobile Iris Liveness Detection Competition (MobILive 2014) [118]. A mobile device was used to capture images of iris printouts (the MobBioFake dataset) in visible light, rather than in infra-red light as in the datasets used in the LivDet-Iris competitions. Six teams participated, and the winning team achieved perfect detection of presentation attacks. For a more thorough assessment of these iris competitions, refer to Czajka and Bowyer [6].

4.2 Face spoofing Databases

In Table 3, we add details on the presentation media and presentation attack instruments (PAI) of the used face video datasets.

4.2.1 Replay-Attack: One of the earliest datasets presented in the literature, in 2012, for the problem of face spoofing is the Replay-Attack dataset [73], a sequel to its Print-Attack version introduced in 2011 [67]. The database contains a total of 1200 short videos between 9 and 15 seconds long. The videos are taken at resolution 320×240 from 50 different subjects. Each subject has 4 bona-fide videos and 20 spoofed ones, so the full dataset has 200 bona-fide videos and 1000 spoofed videos. The dataset is divided into 3 subject-disjoint subsets: one for training, one for development, and one for testing.

The training and development subsets each have 15 subjects, while the test subset has videos from the remaining 20 subjects. Results on this dataset are reported as the HTER (Half Total Error Rate) on the test subset, with the detector threshold set at the EER (Equal Error Rate) point of the development set. For both the real and spoof videos, each subject was recorded twice with a regular webcam in two different lighting settings: controlled and adverse.
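This evaluation protocol (pick the threshold where FAR and FRR meet on the development scores, then report the mean of FAR and FRR on the test scores at that fixed threshold) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code, and it assumes the convention that higher classifier scores indicate bona-fide presentations:

```python
import numpy as np

def far_frr(attack_scores, bonafide_scores, threshold):
    # Convention assumed for this sketch: higher score = more likely bona-fide.
    far = float(np.mean(attack_scores >= threshold))   # attacks wrongly accepted
    frr = float(np.mean(bonafide_scores < threshold))  # bona-fide wrongly rejected
    return far, frr

def eer_threshold(dev_attack, dev_bonafide):
    # Threshold on the development set where FAR and FRR are closest (the EER point).
    candidates = np.sort(np.concatenate([dev_attack, dev_bonafide]))
    gaps = [abs(f - r) for f, r in
            (far_frr(dev_attack, dev_bonafide, t) for t in candidates)]
    return float(candidates[int(np.argmin(gaps))])

def hter(test_attack, test_bonafide, threshold):
    # Half Total Error Rate: mean of FAR and FRR at the fixed dev-set threshold.
    far, frr = far_frr(test_attack, test_bonafide, threshold)
    return (far + frr) / 2.0
```

Fixing the threshold on the development set, as done here, is what distinguishes HTER reporting on Replay-Attack from the EER-on-test reporting used for datasets without a development split, such as MSU-MFSD.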

As for the spoofed videos, high-resolution videos were first taken of the subject with a Canon PowerShot SX150 IS camera; then three techniques were used to generate 5 spoofing scenarios: (1) hard-copy print attack, where high-resolution digital photographs, printed with a Triumph-Adler DCC 2520 color laser printer on A4 paper, are presented to the camera; (2) mobile attack, displaying a photo once and replaying the recorded video once on an iPhone 3GS screen (resolution 480×320 pixels); (3) high-definition attack, displaying a photo or replaying the video on a high-resolution screen using an iPad (first generation, with a screen resolution of 1024×768 pixels). Each of the 5 scenarios was recorded in the two lighting conditions stated above, then presented to the sensor in 2 different ways: either with a fixed tripod, or with an attacker holding the presenting device (printed paper or replay device) in his/her hand, leading to a total of 20 attack videos per subject. More details on the dataset can be found at the IDIAP Research Institute website*.

4.2.2 MSU Mobile Face Spoofing Database (MSU-MFSD): The MSU-MFSD dataset [57] was introduced in 2014 by Michigan State University (MSU). Its main aim was to tackle the problem of face spoofing on smartphones, where some mobile applications use the face for authentication and unlocking the phone. The dataset includes real and spoofed videos from 35 subjects, with durations between 9 and 15 seconds. Unlike the Replay-Attack dataset, where only a webcam was used for authentication, in MSU-MFSD two devices were used: the built-in webcam of a MacBook Air 13" with resolution 640×480, and the front-facing camera of the Google Nexus 5 smartphone with 720×480 resolution. So for each subject there are 2 bona-fide videos, one from each device, and six attack videos, 3 from each device, leading to a total of 280 videos: 210 attack and 70 bona-fide. The attack scenarios are all presented to the authentication sensor with a fixed support, unlike Replay-Attack, which has videos with fixed supports and others using hand-held devices. Three attack scenarios were used: (1) print attack on A3 paper, (2) video replay attack on the screen of an iPad Air, and (3) video replay attack on a Google Nexus 5 smartphone. The dataset is divided into two subject-disjoint subsets: one for training with 15 subjects, and the other for testing with 20 subjects. The EER on the test set is used as the evaluation metric on the MSU-MFSD database.

4.2.3 Replay-Mobile: In 2016, the IDIAP research institute that released Replay-Attack released a new dataset, Replay-Mobile [104]. This followed the widespread adoption of face authentication on mobile devices and the increased need for face PAD evaluation sets that capture the characteristics of mobile devices. The dataset has 1200 short recordings from 40 subjects, captured by two mobile devices at resolution 720×1280. Ten bona-fide accesses were recorded for each subject, in addition to 16 attack videos taken under different attack modes. Like Replay-Attack, the dataset was divided into training, development, and testing subsets of 12, 16, and 12 subjects, respectively, and results are reported as HTER on the test set.

Five lighting conditions were used in recording real accesses (controlled, adverse, direct, lateral, and diffuse). For presentation attacks, high-resolution photos and videos of each client were taken under two different lighting conditions (light-on, light-off).

Attacks were performed in two ways: (1) photo-print attacks, where the printed high-resolution photo was presented to the capturing device using either a fixed or a hand support; (2) matte-screen attacks, displaying a digital photo or video with fixed supports.

*https://www.idiap.ch/dataset/replayattack


Table 3 Face databases details

Dataset: Replay-Attack (2012)
- Sensor used for authentication: 1) W in MacBook laptop (320×240)
- PA artifact: 1) PH; 2) 720p high-def V (Canon PowerShot SX150 IS)
- PAI (Presentation Attack Instrument): 1) PR (A4); 2) VR on M (iPhone); 3) VR on T (iPad)
- Lighting conditions (bona-fide, attack): 2, 2
- Subjects (train/dev/test): 50 (15/15/20)
- Videos per subject (bona-fide + attack): 4 + 20
- Videos per subset (bona-fide + attack): Train 60+300, Dev 60+300, Test 80+400

Dataset: CASIA-FASD (2012)
- Sensor: 1) Low Q long-time-used USB C (680×480); 2) Normal Q new USB C (680×480); 3) High Q Sony NEX-5 (1920×1080)
- PA artifact: 1) PH; 2) 720p high-def V (Sony NEX-5)
- PAI: 1) PR (copper A4) - warped; 2) PR (copper A4) - cut; 3) VR on T (iPad)
- Lighting conditions: 1, 1
- Subjects (train/test): 50 (20/30)
- Videos per subject: 3 + 9
- Videos per subset: Train 60+180, Test 90+270

Dataset: MSU-MFSD (2014)
- Sensor: 1) W in MacBook Air (640×480); 2) FC of Google Nexus 5 M (720×480)
- PA artifact: 1) PH (Canon PowerShot 550D SLR); 2) 1080p high-def V (Canon PowerShot 550D SLR); 3) 1080p M V (BC of iPhone 5S)
- PAI: 1) PR (A3); 2) high-def VR on T (iPad); 3) M VR on M (iPhone)
- Lighting conditions: 1, 1
- Subjects (train/test): 35 (15/20)
- Videos per subject: 2 + 6
- Videos per subset: Train 30+90, Test 40+120

Dataset: Replay-Mobile (2016)
- Sensor: 1) FC of iPad Mini 2 T (720×1280); 2) FC of LG-G4 M (720×1280)
- PA artifact: 1) PH (Nikon Coolpix P520); 2) 1080p M V (BC of LG-G4 M)
- PAI: 1) PR (A4); 2) VR on matte screen (Philips 227ELH)
- Lighting conditions: 5, 2
- Subjects (train/dev/test): 40 (12/16/12)
- Videos per subject: 10 + 16
- Videos per subset: Train 120+192, Dev 160+256, Test 110+192

Dataset: OULU-NPU (2017)
- Sensor: FC of 6 M (1920×1080)
- PA artifact: 1) PH; 2) high-res M V (BC of Samsung Galaxy S6 Edge M)
- PAI: 1-2) PR (glossy A3), 2 printers; 3) high-res VR on 19-inch Dell display; 4) high-res VR on 13-inch MacBook
- Lighting conditions: 3, 3
- Subjects (train/dev/test): 55 (20/15/20)
- Videos per subject: 36 + 72
- Videos per subset: 4 protocols; total 1980+3960

Dataset: SiW (2018)
- Sensor: 1) High Q C, Canon EOS T6 (1920×1080); 2) High Q C, Logitech C920 W (1920×1080)
- PA artifact: 1) PH (5184×3456); 2) same live videos
- PAI: 1) PR; 2) PR of frontal-view frame from a live V; 3-6) VR on 4 screens (Samsung S8, iPhone 7, iPad Pro, PC)
- Lighting conditions: 1, 1
- Subjects (train/test): 165 (90/75)
- Videos per subject: 8 + 20
- Videos per subset: 3 protocols; total 1320+3300

Dataset: CASIA-SURF (2018)
- Sensor: 1) RealSense SR300: RGB (1280×720), Depth/IR (640×480)
- PA artifact: 1) PH
- PAI: 1) PR (A4)
- Lighting conditions: 1, 1
- Subjects (train/dev/test): 1000 (300/100/600)
- Videos per subject: 1 + 6
- Videos per subset: Train 900+5400, Dev 300+1800, Test 1800+10800

Dataset: CASIA-SURF CeFA (2019)
- Sensor: 1) Intel RealSense: RGB/Depth/IR, all (1280×720)
- PA artifact: 1) PH; 2) same live videos; 3) 3D print face mask; 4) 3D silica gel face mask
- PAI: 1) PR; 2) VR
- Lighting conditions: 1, 2 (2D); 0, 6; 0, 4
- Subjects (train/dev/test): 1500 (600/300/600); 99 (0/0/99); 8 (0/0/8)
- Videos per subject: 1 + 3; 0 + 18; 0 + 8
- Videos per subset: 4 protocols; total 4500+13500, 0+5346, 0+196

C: Camera, BC: Back-Camera, FC: Front-Camera, W: built-in Webcam, T: Tablet, M: Mobile/Smartphone, Q: Quality, PH: High-res photo, V: Video, PR: Hard-copy print of high-res photo, VR: Video replay.

4.2.4 Others: Some other datasets for face PAD are also available but were not used in this paper.

Fig. 1: Sample images from the Replay-Attack dataset (adverse lighting, hand support), client 007. Left to right, top: bona-fide, print, high-def photo, mobile photo; bottom: high-def photo, mobile video, high-def video.

Fig. 2: Sample images from the MSU-MFSD dataset, client 001: (a) laptop, (b) Android. Left to right: bona-fide, printed photo, iPhone video, iPad video.

CASIA-FASD [50] was released in 2012 with 50 subjects. Three different imaging qualities were used (low, normal, and high), and three types of attacks were generated from the high-quality recordings of the genuine faces, namely cut-photo attack, warped-photo attack,


and video attack on an iPad. The dataset is split into a training set of 20 subjects and a testing set including the other 30 subjects.

OULU-NPU [109]: A mobile face PAD dataset released in 2017, with 55 subjects captured in three different lighting and background conditions using six different mobile devices. Both print and video-replay attacks are included, using two different PAIs each. Four protocols were proposed in the paper to evaluate the generalization of PAD algorithms across different conditions.

Spoof in the Wild (SiW) [89]: Released in 2018, it contains 8 real and 20 attack videos for each of 165 subjects, more subjects than any previous dataset. Bona-fide videos are captured with two high-quality cameras and collected with different orientations, facial expressions and lighting conditions. Attack samples are either high-quality print attacks or replay videos on 4 different PA devices.

CASIA-SURF [113]: Later in the same year, 2018, the large-scale multi-modal CASIA-SURF dataset was released, containing 21000 videos of 1000 subjects. Each video has three modalities, namely RGB, Depth and IR, captured by an Intel RealSense camera. Only 2D print attacks were performed, with 6 different combinations of the used photo state (either full or cut) and 3 different operations, such as bending the printed paper or moving it at different distances from the camera.

CASIA-SURF CeFA [115]: Although the CASIA-SURF dataset [113] is large-scale, it includes only one ethnicity (Chinese) and one attack type (2D print). So in 2019, CASIA-SURF cross-ethnicity Face Anti-spoofing (CeFA) was released to provide a wider range of ethnicities and attack types, in addition to the same multiple modalities as CASIA-SURF. The dataset includes videos of 1607 subjects covering three ethnicities (African, East Asian and Central Asian), three modalities and 4 attack types: 2D print attacks and replay attacks, in addition to 3D mask and silica gel attacks.

4.3 Iris Spoofing Databases

(a) Captured by Tablet

(b) Captured by Mobile

Fig. 3: Sample images from the Replay-Mobile dataset (hand-support). Left to right: bona-fide, print photo, matte-screen photo, matte-screen video. Client 001.

(a) Bona-fide

(b) Print attack

Fig. 4: Sample images from the iris datasets. Left to right: ATVS-FIr, Warsaw 2013, MobBioFake (ATVS-FIr id u0001s0001_ir_r_0002, Warsaw id 0039_R_1, MobBioFake id 007_R2).

Many publicly available benchmark datasets have been published for the iris presentation attack problem. Some are captured with commercial iris recognition sensors operating under near-infrared illumination, like BioSec ATVS-FIr [12] or the LivDet datasets [52, 110, 117], while more recent datasets are captured under visible-light illumination using smartphones, e.g. MobBioFake [54, 105]. Below are details of the databases used in this paper.

4.3.1 BioSec ATVS-FIr: The ATVS-FIr dataset was introduced in 2012 [12], derived from the BioSec dataset [119]. It contains 800 bona-fide and 800 attack photos of 50 different subjects. The attack is performed by enhancing the quality of a bona-fide iris photo, printing it on high-quality paper using HP printers, then presenting it to the same LG IrisAccess EOU3000 sensor used to capture the bona-fide irises. All images are grayscale with 640×480 resolution. For each user, 16 different bona-fide photos were captured, from which 16 attack photos were generated: four photos were taken from each of the two eyes in two sessions, so each user has a total of 4 images × 2 eyes × 2 sessions = 16 bona-fide photos, and the same number of fake ones. Each eye of the 50 subjects was considered a subject on its own; the resulting 100 subjects were then divided into a training set of 25 subjects (200 bona-fide + 200 attack) and a test set of 75 subjects (600 bona-fide + 600 attack).

4.3.2 LivDet Warsaw 2013: Warsaw University of Technology published the first LivDet Warsaw iris dataset in 2013 [52] as part of the first international iris liveness competition that year [116]. Two more versions of this dataset were later published, in 2015 [117] and 2017 [110]. LivDet-Iris-2013-Warsaw contains 852 bona-fide and 815 attack printout images from 237 subjects. For the attack images, the real iris was printed using two different printers, then captured by the same IrisGuard AD100 sensor as the bona-fide photos.

4.3.3 MobBioFake: The first competition for iris PAD in mobile scenarios was the MobILive competition in 2014 [118]. The MobBioFake iris spoofing dataset [54] was generated from the iris images of the MobBIO Multimodal Database [105]. It comprises 800 bona-fide and 800 attack images captured by the back camera (8 MP resolution) of an Android smartphone (Asus Transformer Pad TF 300T) in visible light, not in near-infrared like the previous iris datasets. The images are color RGB with a resolution of 250×200. The MobBioFake database has 100 subjects, each with 8 bona-fide + 8 attack images captured under different illumination and occlusion settings. The attack samples were generated from printed versions of the original images, under similar conditions and with the same device. The images are split evenly between training and test subsets.

5 Proposed PAD Approach

Fig. 5: Overview of the PAD algorithm.

Although several previous studies use CNNs for presentation attack detection, they either used custom-designed networks, e.g. Spoofnet [95, 98], or used pre-trained CNNs to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation attack images. We choose to assess state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet [108]. This PAD approach is a passive, software, texture-based method that requires neither additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches, starting from network weights trained on the ImageNet classification dataset. One can either retrain all the network weights on the new datasets of the problem, or freeze the weights of most of the network layers and fine-tune only a selected set of final layers, including the final classification layer.

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then input to the CNN, which applies a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is altered to output a binary bona-fide/attack classification using sigmoid activation, instead of the 1000 ImageNet classes.

5.1 Preprocessing

5.1.1 Face videos: Each of the face benchmark datasets used in this paper provides face coordinates for every frame of each video. These coordinates are used to crop a square region to be used as the network input. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to equal twice the minimum of the width and height of the face ground-truth box. The cropped bounding box containing the face is then resized to each network's default input size before being fed to the first convolutional layer of the CNN.
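This cropping rule can be sketched in a few lines of numpy; the helper name and the centre-then-clip strategy are ours, illustrating the "side = 2 × min(width, height)" choice described above:

```python
import numpy as np

def crop_face_square(frame, box):
    """Crop a square ROI around a face bounding box (x, y, w, h).

    The square's side is twice the minimum of the face width and height,
    so some background context is kept, then it is clipped to the frame.
    """
    x, y, w, h = box
    side = 2 * min(w, h)
    cx, cy = x + w // 2, y + h // 2          # face centre
    x0 = max(0, cx - side // 2)
    y0 = max(0, cy - side // 2)
    x1 = min(frame.shape[1], x0 + side)
    y1 = min(frame.shape[0], y0 + side)
    return frame[y0:y1, x0:x1]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
roi = crop_face_square(frame, (250, 150, 100, 120))
print(roi.shape)  # (200, 200, 3): side = 2 * min(100, 120)
```

The returned square would then be resized to the network's default input size (Section 5.3).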

5.1.2 Iris images: When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. In this paper, however, we directly use the full iris image captured by the NIR sensor or smartphone camera, to keep the approach as simple as possible. This lets the network learn which regions of the full periocular area best serve the classification problem. The presence of some contextual information, like paper borders, can also be useful for attack detection.

Table 4: Performance of selected deep CNN architectures on ImageNet in terms of single-model error, as reported in each model's paper.

Model | top-1 % (single-crop / multi-crop) | top-5 % (single-crop / multi-crop) | Parameters | Layers
InceptionV3 [120] | 21.2 / 19.47 (12-crop) | 5.6 / 4.48 (12-crop) | 23.9M | 159
MobileNetV2 (alpha=1.4) [122] | 25.3 / - | 7.58* / - | 6.2M | 88
MobileNetV2 (default alpha=1.0) [122] | 28.0 / - | 9.8* / - | 3.5M | 88
VGG16 [123] | - / 24.4 | 9.9* / 7.1 | 138.4M | 23

* Not available in the paper; reported by Keras (https://keras.io).

5.2 Generalization: Data augmentation

Deep networks have many parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. During training, the images in each mini-batch pulled from the training set are randomly cropped, rotated slightly by a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially larger training dataset and a more generalized network that is robust to changes in pose and illumination.

5.3 Modern deep CNN architectures

Based on the official results published for recent CNN networks on the ImageNet dataset [108], three of the networks that achieved less than 10% top-5 error are InceptionV3 [120], ResNet50 [121] and MobileNetV2 [122]. Table 4 states the single-model top-1 and top-5 error of each of these networks on the ImageNet validation set, together with its depth and number of parameters, ordered by ascending top-5 error. One thing that can be noticed about the higher-accuracy networks is the increased number of layers: these networks are much deeper than, for example, the well-known VGG [123].

For the purpose of this paper, we chose networks with fewer than 30 million parameters to fine-tune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 [120] (about 24M parameters, 5.6% top-5 error), and a more recent, less deep network with far fewer parameters, MobileNetV2 [122] (about 4M parameters, 9.8% top-5 error), which makes it very suitable for mobile applications. Even though all three networks have significantly fewer parameters than VGG, they are much deeper and more accurate.

In each architecture, the final softmax prediction layer is removed, and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate 0.5) and then a sigmoid activation layer for the final binary prediction.

5.4 Fine-tuning approaches

As stated previously, the available datasets are small, while the networks used have millions of parameters to learn. If the network weights were trained from scratch, i.e. starting from random weights, the model parameters would probably overfit these small sets, and training would get stuck in a local minimum, leading to poor generalization. A better approach is to initialize the network with weights pre-trained on ImageNet; we then have three training options: (1) use transfer learning: feed the images to the network to extract features from the last layer before prediction, then use these features instead of raw image pixels for classification with an SVM or NN; (2) fine-tune the weights of only the last several layers of the network, while fixing the weights of earlier layers to their ImageNet-trained values; or (3) fine-tune all the network weights using the new input dataset.

Table 5: Total number of parameters in each model, and the number of parameters to learn when fine-tuning only the final block.

Model | Total parameters | Last block to train | First trainable layer | Parameters to learn
InceptionV3 | 21,804,833 | block after mixed9 (mixed9_1 until mixed10) | conv2d_90 | 6,075,585
MobileNetV2 | 2,259,265 | last inverted_res_block | block_16_expand | 887,361

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the latter approach, Table 5 states, for each architecture, the name of the first layer of the last block and the number of parameters in this block to be fine-tuned.
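In a Keras model, approach (A) amounts to setting `layer.trainable = False` for every layer before the first trainable one. The selection logic can be sketched generically over a list of layer names (the layer names below are those from Table 5; the helper itself is illustrative, not the paper's code):

```python
def set_trainable_layers(layer_names, first_trainable):
    """Return a {layer_name: trainable} map implementing approach (A):
    freeze everything before `first_trainable`, and fine-tune that layer
    plus all layers after it (e.g. 'conv2d_90' for InceptionV3, or
    'block_16_expand' for MobileNetV2, per Table 5)."""
    flags, seen = {}, False
    for name in layer_names:
        if name == first_trainable:
            seen = True
        flags[name] = seen
    return flags

# Abbreviated InceptionV3-style layer list for illustration
layers = ["conv2d_1", "mixed9", "conv2d_90", "mixed10", "predictions"]
flags = set_trainable_layers(layers, "conv2d_90")
print(flags)
```

Approach (B) is the degenerate case where `first_trainable` is the very first layer, so everything is fine-tuned.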

5.5 Single model with multiple biometrics

In addition to training and testing with datasets from the same biometric source, i.e. face only or iris only, we experiment with training a single generic network to detect presentation attacks whether the presented biometric is a face or an iris. We train with different combinations of face+iris sets, then cross-evaluate the trained models on the other face and iris datasets to see whether this approach is comparable to training on face images or iris images alone.

6 Experiments and Results

In this section, we describe the image and video preprocessing used, explain the training strategy and experimental setup, and present the results.

6.1 Preprocessing

6.1.1 Preparing images: For iris images, only the MobBioFake database consists of color images captured using a smartphone; the other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the color images of ImageNet, their input layers only accept three-channel color images, so for grayscale iris images the single gray channel is replicated twice to form the 3-channel network input. The whole iris image is used, without cropping or iris detection and segmentation.
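The channel replication is a one-liner in numpy; a minimal sketch (the image here is synthetic, standing in for an NIR iris capture):

```python
import numpy as np

# Synthetic stand-in for a 640x480 grayscale NIR iris image
gray = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)

# Replicate the single gray channel into three identical channels
rgb = np.repeat(gray[:, :, np.newaxis], 3, axis=2)
print(rgb.shape)  # (480, 640, 3)
```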

All face PAD datasets, on the other hand, consist of color videos. Video durations range from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects; instead we sample one frame every 20 frames. This frame dropping yields around 10 to 19 sampled frames per video, depending on its length. Instead of using each sampled frame whole as the network input, it is first cropped to either include only the face region or a box with approximately twice the face area, to include some background context. Annotated face bounding boxes have a side length ranging from 70 px to 100 px in Replay-Attack videos and from 170 px to 250 px in MSU-MFSD.
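The frame-skipping rule reduces to choosing every 20th frame index; a minimal sketch (the helper name is ours):

```python
def sampled_frame_indices(n_frames, step=20):
    """Indices of the frames kept when sampling one frame every `step` frames."""
    return list(range(0, n_frames, step))

# A 10 s Replay-Attack video at 25 fps has 250 frames -> 13 sampled frames,
# consistent with the ~10-19 frames per video mentioned in the text.
print(len(sampled_frame_indices(250)))  # 13
```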

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments, after frame dropping for videos. Emphasized is the total number of images available for training the CNN on each database.

The images are then resized to the default input size of each network: 224×224 for MobileNetV2 and 299×299 for InceptionV3. The RGB values of the input images to both InceptionV3 and MobileNetV2 are first scaled to lie between -1 and 1.
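The [-1, 1] scaling follows the convention these models use for their ImageNet preprocessing; as a sketch:

```python
import numpy as np

def scale_input(x):
    """Scale 8-bit pixel values (0-255) to the [-1, 1] range expected by
    InceptionV3 and MobileNetV2."""
    return x.astype(np.float32) / 127.5 - 1.0

img = np.array([[0.0, 127.5, 255.0]])
print(scale_input(img))  # [[-1.  0.  1.]]
```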

Table 6: Number of images used in training and testing. For face datasets, the number of videos is shown, followed by the number of frames in parentheses (sampling one frame every 20 frames). For iris datasets, only the number of images is given. The total on each train row is the number of images available for training the CNN.

Database | Subset | Number of videos (frames), attack + bona-fide
Replay-Attack | train | 300 (3414) + 60 (1139) = 4553
Replay-Attack | devel | 300 (3411) + 60 (1140)
Replay-Attack | test | 400 (4516) + 80 (1507)
MSU-MFSD | train | 90 (1257) + 30 (419) = 1676
MSU-MFSD | test | 120 (1655) + 84 (551)
Replay-Mobile | train | 192 (2851) + 120 (1762) = 4613
Replay-Mobile | devel | 256 (3804) + 160 (2356)
Replay-Mobile | test | 192 (2842) + 110 (1615)
ATVS-FIr | train | 200 + 200 = 400
ATVS-FIr | test | 600 + 600
Warsaw 2013 | train | 203 + 228 = 431
Warsaw 2013 | test | 612 + 624
MobBioFake | train | 400 + 400 = 800
MobBioFake | test | 400 + 400

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize this small set of samples instead of learning useful features, and so fail to generalize. In order to reduce overfitting, data augmentation is used during training: images are randomly transformed before being fed to the network with one or more of the following transformations: horizontal flip; rotation between 0 and 10 degrees; horizontal translation by 0 to 20 percent of the image width; vertical translation by 0 to 20 percent of the image height; or zooming by 0 to 20 percent.
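A minimal numpy sketch of two of these transformations (flip and translation; rotation and zoom are omitted for brevity, and `np.roll` wraps around where a real pipeline would pad or fill):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, rng):
    """Randomly flip and translate an image, as in the text:
    flip with probability 0.5, then shift by up to 20% of each dimension."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                      # horizontal flip
    h, w = img.shape[:2]
    dy = int(rng.uniform(-0.2, 0.2) * h)        # vertical translation
    dx = int(rng.uniform(-0.2, 0.2) * w)        # horizontal translation
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

out = augment(np.zeros((100, 100, 3)), rng)
print(out.shape)  # (100, 100, 3)
```

In practice the paper's Keras setup would apply such transformations on the fly to every mini-batch.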

6.2 Training strategy and Hyperparameters

For training the CNNs we used the Adam optimizer [124] with beta1 = 0.9, beta2 = 0.999 and learning rate 1×10^-4. We trained each network for a maximum of 50 epochs, but added a stopping criterion: training stops if the training or validation accuracy reaches 99.95%, or if the validation accuracy does not improve (with delta = 0.0005) for 40 epochs.

Intermediate models with high validation accuracy were saved, and finally the model with the lowest validation error was chosen for inference. Because of the randomness introduced by data augmentation and dropout, several training sessions may produce slightly different results, especially in cross-validation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.
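One way to express the stopping criterion as code (our reading of the rule: an accuracy target of 99.95%, plus a patience of 40 epochs with min_delta = 0.0005; the function name is illustrative):

```python
def should_stop(val_accs, patience=40, min_delta=0.0005, target=0.9995):
    """Early-stopping rule from the text: stop when accuracy reaches the
    target, or when validation accuracy has not improved by at least
    `min_delta` over the best earlier value for `patience` epochs."""
    if val_accs and max(val_accs) >= target:
        return True
    if len(val_accs) <= patience:
        return False
    best_before = max(val_accs[:-patience])
    return max(val_accs[-patience:]) < best_before + min_delta

print(should_stop([0.8] * 45))  # True: 45 flat epochs exceed the patience of 40
```

In Keras this corresponds to combining an `EarlyStopping`-style callback with a checkpoint callback that keeps the best validation model.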

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treat each frame of a face video as a separate image. When reporting the final score for a video, we perform score-level fusion by averaging the scores of the samples from that video. Accuracy and errors are then calculated from these video-level scores, not the frame-level scores used in some results reported in the literature.
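The fusion step itself is just a mean over the per-frame CNN outputs; as a sketch:

```python
import numpy as np

def video_score(frame_scores):
    """Score-level fusion: average the per-frame sigmoid scores of one video."""
    return float(np.mean(frame_scores))

# A video whose sampled frames scored [0.9, 0.8, 0.95] gets one fused score
fused = video_score([0.9, 0.8, 0.95])
print(round(fused, 4))
```

Thresholding is then applied to this single fused score rather than to each frame.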


Table 7: Intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%).

Train Database | Test Database | (A) InceptionV3 | (A) MobileNetV2 | (B) InceptionV3 | (B) MobileNetV2
Replay-Attack | Replay-Attack | 6.9 | 1.6 | 0 | 0.63
Replay-Attack | MSU-MFSD | 33 | 30 | 17.5 | 22.5
Replay-Attack | Replay-Mobile | 35.8 | 43.4 | 64.7 | 56.7
MSU-MFSD | Replay-Attack | 34.7 | 22.9 | 23.8 | 13.7
MSU-MFSD | MSU-MFSD | 7.5 | 20.5 | 0 | 2.08
MSU-MFSD | Replay-Mobile | 32.6 | 26.8 | 24.6 | 23.7
Replay-Mobile | Replay-Attack | 33.2 | 45.1 | 31.1 | 45.5
Replay-Mobile | MSU-MFSD | 35.4 | 30 | 32.9 | 37.5
Replay-Mobile | Replay-Mobile | 3.6 | 3.2 | 0 | 0
ATVS-FIr | ATVS-FIr | 0 | 0 | 0 | 0
ATVS-FIr | Warsaw 2013 | 1.9 | 5.4 | 0.16 | 1.6
ATVS-FIr | MobBioFake (Grayscale) | 44 | 39 | 36.5 | 39
Warsaw 2013 | ATVS-FIr | 0 | 1 | 0.17 | 0.5
Warsaw 2013 | Warsaw 2013 | 0 | 0.32 | 0 | 0
Warsaw 2013 | MobBioFake (Grayscale) | 43 | 39 | 32.8 | 47
MobBioFake | ATVS-FIr | 10 | 30 | 18.7 | 20.2
MobBioFake | Warsaw 2013 | 15.7 | 32 | 23.4 | 18.8
MobBioFake | MobBioFake | 8.75 | 7 | 0.5 | 0.75
MobBioFake (Grayscale) | ATVS-FIr | 2.5 | 23.7 | 12 | 16.2
MobBioFake (Grayscale) | Warsaw 2013 | 6.2 | 17.3 | 10.1 | 17.2
MobBioFake (Grayscale) | MobBioFake (Grayscale) | 10.5 | 7.3 | 0.25 | 1.25

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the test-set error. The EER is the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attacks, FR / (total bona-fide attempts), while FAR is the probability of the algorithm accepting attack samples as bona-fide, FA / (total attack attempts).

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold is first calculated that achieves the EER on the devel set. This threshold is then used to calculate the error on the test set, known as the half-total error rate (HTER), which equals the sum of the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) at that threshold, divided by two: HTER = (FRR + FAR)/2.
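The devel-then-test protocol can be sketched directly from these definitions (helper names are ours; the EER threshold is taken as the candidate score at which |FAR - FRR| is smallest):

```python
import numpy as np

def far_frr(bona, attack, thr):
    """FAR: fraction of attack scores accepted (>= thr);
    FRR: fraction of bona-fide scores rejected (< thr)."""
    far = np.mean(np.asarray(attack) >= thr)
    frr = np.mean(np.asarray(bona) < thr)
    return far, frr

def eer_threshold(bona, attack):
    """Threshold on the devel set where FAR and FRR are closest (EER point)."""
    thrs = np.unique(np.concatenate([bona, attack]))
    gaps = [abs(np.subtract(*far_frr(bona, attack, t))) for t in thrs]
    return thrs[int(np.argmin(gaps))]

def hter(bona, attack, thr):
    """HTER = (FRR + FAR) / 2 at a fixed threshold (from the devel set)."""
    far, frr = far_frr(bona, attack, thr)
    return (far + frr) / 2

dev_bona, dev_attack = [0.8, 0.9, 0.7], [0.1, 0.2, 0.3]
thr = eer_threshold(dev_bona, dev_attack)
print(thr, hter(dev_bona, dev_attack, thr))  # 0.7 0.0
```

On real data, the threshold is fixed on the development scores and `hter` is then evaluated on the disjoint test scores.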

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER in order to compare with state-of-the-art published results; the EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that has performed EER cross-dataset evaluation on iris PAD databases.

6.4 Experimental setup

For implementation, we used Keras† with the TensorFlow‡ backend for the deep CNN models, and the Bob package [125, 126] for dataset management and experiment pipelines. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

*https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
†https://github.com/keras-team/keras
‡https://www.tensorflow.org

6.5 Experiment 1 Results

In the first experiment we compare the two fine-tuning approaches explained in Section 5.4. Training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state of the art. For each fine-tuning approach we performed three runs and report the minimum error obtained. In Table 7, we refer to the first fine-tuning approach as (A), where only the weights of the last block are fine-tuned, while approach (B) fine-tunes all the layers' weights on the given dataset.

Table 7 shows that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning only the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error on all datasets using InceptionV3, while MobileNetV2 achieved 0% on most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset is used for training instead of its original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 Results

For the second experiment we only use fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here we use both face and iris images as training input to a single model for classifying bona-fide versus attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate bona-fide from presentation attack images achieved very good results on both face and iris test sets. In some cases on face datasets, this combination boosted cross-dataset performance: for example, on the Replay-Attack test set, 11.9% HTER was achieved when training a MobileNetV2 with MSU-MFSD and the Warsaw iris dataset, versus 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8: Training with face+iris images: intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%). Each cell gives InceptionV3 / MobileNetV2 for the given training iris database.

Train face database: Replay-Attack
Test Database | ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)
Training epochs | 2 / 6 | 3 / 6 | 4 / 9 | 2 / 8
Replay-Attack | 0 / 0.63 | 0 / 0 | 0.25 / 1.5 | 0 / 1.4
MSU-MFSD | 29.6 / 20.4 | 15 / 20 | 19.6 / 25 | 30 / 17.5
Replay-Mobile | 58.1 / 48.7 | 57.3 / 35.4 | 60.6 / 58.1 | 39 / 40
ATVS-FIr | 0 / 0 | 10.5 / 2.7 | 51.7 / 8.8 | 3.5 / 12.3
Warsaw 2013 | 0.16 / 0.16 | 0 / 0 | 41.3 / 16.2 | 15.2 / 5.7
MobBioFake (RGB) | 58.5 / 71.3 | 56 / 45.5 | 0.75 / 0.25 | 1.75 / 2
MobBioFake (Grayscale) | 50 / 69 | 56.3 / 38.5 | 1.5 / 0.75 | 1.25 / 1

Train face database: MSU-MFSD
Training epochs | 4 / 6 | 3 / 5 | 10 / 9 | 7 / 9
Replay-Attack | 22.8 / 14.5 | 25.3 / 11.9 | 22.8 / 27.7 | 16.6 / 21.6
MSU-MFSD | 0 / 2.08 | 0.4 / 2.08 | 2.5 / 2.08 | 0.4 / 2.5
Replay-Mobile | 28.7 / 26.1 | 19.2 / 24.8 | 22.7 / 26.8 | 25.2 / 17
ATVS-FIr | 0 / 0 | 0.33 / 2.3 | 11.8 / 12.5 | 0.8 / 12.33
Warsaw 2013 | 0.6 / 0.8 | 0 / 0 | 43.9 / 65.5 | 7.4 / 9.1
MobBioFake (RGB) | 52.5 / 39.2 | 45.5 / 42 | 2 / 1.25 | 2.5 / 2.5
MobBioFake (Grayscale) | 45.3 / 38.5 | 34.5 / 43 | 2.2 / 4.25 | 1.75 / 2

Train face database: Replay-Mobile
Training epochs | 2 / 3 | 1 / 3 | 4 / 7 | 5 / 4
Replay-Attack | 24.2 / 42.6 | 31 / 35.7 | 41.6 / 64.3 | 33.7 / 54.6
MSU-MFSD | 32.5 / 30.4 | 32.5 / 42.5 | 60 / 52.5 | 40 / 44.6
Replay-Mobile | 0 / 0 | 0 / 0 | 0 / 0.26 | 0 / 0.46
ATVS-FIr | 0 / 0 | 0.83 / 6 | 15.5 / 22.5 | 6.5 / 28.2
Warsaw 2013 | 2.1 / 4 | 0 / 0 | 9.6 / 28.9 | 22.1 / 26.1
MobBioFake (RGB) | 46.5 / 53 | 49.3 / 54.3 | 1 / 0.75 | 8.8 / 32.5
MobBioFake (Grayscale) | 33.5 / 43.3 | 41.7 / 45.8 | 5.3 / 6.5 | 2 / 1.75

In the original, bold-underlined marks the best cross-dataset error for each test dataset, and bold marks the best cross-dataset error for each test dataset in each row (i.e. given a certain face training set).

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER (%) is reported when testing on Replay-Attack; otherwise test EER (%). Each entry gives the cross-dataset error, the corresponding intra-dataset error in parentheses, and the method or model. Underlined in the original is the best cross-dataset error for each test dataset.

Train: Replay-Attack, Test: MSU-MFSD
  SOTA: 28.5 (intra 8.25), GuidedScale-LBP [127] (LGBP)
  Ours, single biometric: 17.5 (intra 0), InceptionV3 (B)
  Ours, face+iris: 15 (intra 0), InceptionV3 (Warsaw 2013)

Train: Replay-Attack, Test: Replay-Mobile
  SOTA: 49.2 (intra 3.13), GuidedScale-LBP [127] (LBP + GS-LBP)
  Ours, single biometric: 35.8 (intra 6.9), InceptionV3 (A)
  Ours, face+iris: 35.4 (intra 0), MobileNetV2 (Warsaw 2013)

Train: MSU-MFSD, Test: Replay-Attack
  SOTA: 21.4 (intra 1.5), Color-texture [66]
  Ours, single biometric: 13.7 (intra 2.08), MobileNetV2 (B)
  Ours, face+iris: 11.9 (intra 2.08), MobileNetV2 (Warsaw 2013)

Train: MSU-MFSD, Test: Replay-Mobile
  SOTA: 32.1 (intra 8.5), GuidedScale-LBP [127] (LBP + GS-LBP)
  Ours, single biometric: 23.7 (intra 2.08), MobileNetV2 (B)
  Ours, face+iris: 17 (intra 2.5), MobileNetV2 (MobBioFake (Gray))

Train: Replay-Mobile, Test: Replay-Attack
  SOTA: 46.19 (intra 0.52), GuidedScale-LBP [127] (LGBP)
  Ours, single biometric: 31.1 (intra 0), InceptionV3 (B)
  Ours, face+iris: 24.2 (intra 0), InceptionV3 (ATVS-FIr)

Train: Replay-Mobile, Test: MSU-MFSD
  SOTA: 33.56 (intra 0.98), GuidedScale-LBP [127] (LBP + GS-LBP)
  Ours, single biometric: 30 (intra 3.2), MobileNetV2 (A)
  Ours, face+iris: 30.4 (intra 0), MobileNetV2 (ATVS-FIr)

Train: ATVS-FIr, Test: Warsaw 2013
  SOTA: none available
  Ours, single biometric: 0.16 (intra 0), InceptionV3 (B)
  Ours, face+iris: 0.16 (intra 0), any model (Replay-Attack)

Train: ATVS-FIr, Test: MobBioFake (Gray)
  SOTA: none available
  Ours, single biometric: 36.5 (intra 0), InceptionV3 (B)
  Ours, face+iris: 33.5 (intra 0), InceptionV3 (Replay-Mobile)

Train: Warsaw 2013, Test: ATVS-FIr
  SOTA: none available
  Ours, single biometric: 0 (intra 0), InceptionV3 (A)
  Ours, face+iris: 0.33 (intra 0), InceptionV3 (MSU-MFSD)

Train: Warsaw 2013, Test: MobBioFake (Gray)
  SOTA: none available
  Ours, single biometric: 32.8 (intra 0), InceptionV3 (B)
  Ours, face+iris: 34.5 (intra 0), InceptionV3 (MSU-MFSD)

Train: MobBioFake (Gray), Test: ATVS-FIr
  SOTA: none available
  Ours, single biometric: 2.5 (intra 10.5), InceptionV3 (A)
  Ours, face+iris: 0.8 (intra 1.75), InceptionV3 (MSU-MFSD)

Train: MobBioFake (Gray), Test: Warsaw 2013
  SOTA: none available
  Ours, single biometric: 6.2 (intra 10.5), InceptionV3 (A)
  Ours, face+iris: 5.7 (intra 1), MobileNetV2 (Replay-Attack)

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on MobBioFake itself. It also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network focus on color features when testing on a face image, since all training face images were colored while all iris training images were grayscale.


Fig. 6: High-frequency components of attack iris images from the Warsaw dataset. Left to right: input image; frequency components (Fourier transform); high-pass filter; inverse Fourier transform of the filtered image; high-frequency components overlaid on the image.

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High-frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3.

67 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation errors with the state of the art (SOTA). The table shows that our cross-dataset results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobileNetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we find it important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activations and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize patterns and texture specific to real presentations in general, which are not present in attack samples, and vice versa. One approach to verify this is to visualize the response of certain feature maps of the network for sample input images. For this task we chose a MobileNetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we visualize the response of some filters from the final convolutional block before classification.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High-frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client 002; bottom: client 103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high-frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high-frequency components of an image, then overlaid these high-frequency regions on the input image to better understand where high frequency is concentrated in bona-fide versus attack images.

The steps for obtaining the high-frequency regions of an image are depicted in Figure 6. A Fourier transform is first applied to obtain the frequency components of the image, and a high-pass filter then removes most of the low frequencies. After that, an inverse Fourier transform produces an image with only the high-frequency regions visible, and finally these regions are overlaid on the original image to highlight them.
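These steps can be sketched with numpy's FFT routines; the cutoff radius here is an illustrative choice, not a value from the paper:

```python
import numpy as np

def high_freq_image(img, radius=8):
    """Figure 6 pipeline: FFT -> zero-out low frequencies within `radius`
    of the spectrum centre (high-pass) -> inverse FFT -> magnitude image
    of the remaining high-frequency content, ready to overlay on the input."""
    f = np.fft.fftshift(np.fft.fft2(img))          # centre the DC component
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > radius ** 2
    high = np.fft.ifft2(np.fft.ifftshift(f * mask))
    return np.abs(high)

flat = np.ones((64, 64))                           # constant image: no high frequencies
print(high_freq_image(flat).max() < 1e-9)          # True
```

For the overlay, the resulting magnitude map would simply be blended on top of the original image.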

Regarding the choice of filters to visualize: the final layer before classification consists of 1280 filters and, for an input image of size 224×224, the feature maps resulting from these 1280 filters have size 7×7. From these 1280 filters we selected two filters for each biometric: one with the highest average activation response on real images, which we call realF, and another that is highly activated by attack images, called attackF. We then visualize the 7×7 activation response of these filters for some bona-fide and attack samples belonging to the same subject.
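The selection rule, as we read it, picks the filter index maximizing the mean response over each class; a numpy sketch over pre-extracted 7×7×N feature maps (the helper name is ours, and in practice the maps would come from the CNN's last convolutional block):

```python
import numpy as np

def pick_filters(feat_real, feat_attack):
    """Return (realF, attackF) filter indices: realF has the highest mean
    activation over real-image feature maps, attackF over attack-image maps."""
    mean_real = feat_real.mean(axis=(0, 1))        # per-filter average response
    mean_attack = feat_attack.mean(axis=(0, 1))
    return int(np.argmax(mean_real)), int(np.argmax(mean_attack))

# Toy 7x7x16 maps: filter 5 fires on real inputs, filter 9 on attacks
feat_real = np.zeros((7, 7, 16)); feat_real[:, :, 5] = 1.0
feat_attack = np.zeros((7, 7, 16)); feat_attack[:, :, 9] = 2.0
print(pick_filters(feat_real, feat_attack))  # (5, 9)
```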

Figure 7 shows samples from the Warsaw iris dataset with their high-frequency regions highlighted, together with the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples, the high-frequency areas are concentrated at the bright regions around the inner eye corner and the lash line, whereas for attack presentations, the high-frequency regions mostly correspond to noise patterns scattered over the attack medium (paper). The trained network successfully recognizes the real features, highly represented at the eye corner, through realF, giving a high response at those regions in the real image while failing to detect them in the attack samples. AttackF likewise detects the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images from the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples uses a handheld mobile device showing a photo of the subject. We can notice the absence of high-frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

Visualizing the high-frequency regions in the images showed that, for real images, high-frequency variations concentrate on certain edges and on changes in the 3D biometric being presented, while for attack presentations the high-frequency components represent noise: either spread over the paper in the case of a paper attack medium, or noise patterns produced by the display of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network learned is gradient-weighted class activation mapping (Grad-CAM) [10], which highlights the areas that contributed most to the network's classification decision for an input image.

We chose the MobileNetV2 network trained on the Replay-Attack face dataset and the MobbioFake gray iris dataset to analyze the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation-attack images.
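The core Grad-CAM computation [10] reduces to a gradient-weighted sum of the last convolutional layer's activation maps. A minimal sketch, assuming the activations and class-score gradients have already been extracted from the trained network:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations (H x W x C)
    and the gradient of the target class score w.r.t. those
    activations (same shape)."""
    # channel weights: global-average-pool the gradients
    weights = gradients.mean(axis=(0, 1))                     # (C,)
    # weighted sum of activation maps, then ReLU
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()                                 # normalise for display
    return cam                                                # (H, W) heatmap
```

The resulting low-resolution map is upsampled to the input size and overlaid on the image for figures such as Figures 9 and 10.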

The visualizations in these figures clearly show that the network focuses on features at the centre of the face or iris when deciding on a bona-fide presentation, and that when these central areas have low activations the network decides the image is an attack. Although the network was trained on two different biometrics, it could correctly focus on the important central parts of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights pretrained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% state-of-the-art intra-dataset error on six publicly available benchmark datasets when fine-tuning all of the network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, and by initializing training with non-random weights already pretrained on ImageNet, which helps avoid the weights overfitting the small training set. For future work, we would like to examine the effect of training these deep networks from scratch on the available PAD datasets.
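A minimal sketch of such an augmentation step, covering flip and small translation (wrap-around shifting via np.roll is a simplification; slight rotation, also used in the paper, would need e.g. scipy.ndimage.rotate, and the shift bound is illustrative):

```python
import numpy as np

def augment(img, rng, max_shift=10):
    """One random augmentation of an H x W (x C) image array:
    a coin-flip horizontal mirror plus a small random translation.
    np.roll wraps pixels around the border, a simplification of the
    zero-padded translation a real training pipeline would use."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                          # random horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(img, (int(dy), int(dx)), axis=(0, 1))
```

Applied on the fly during training, each epoch then sees a slightly different version of every sample, effectively enlarging the small PAD datasets.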

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-state-of-the-art results were reported for the face datasets. To the best of our knowledge, this is the first paper to include a cross-dataset equal-error-rate evaluation for the three iris PAD datasets used, on which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network on multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. The approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of training the PAD algorithm on several biometric traits.

Finally, we analyzed the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the centre of the face or iris is the most important region driving a bona-fide classification decision. For future work, we would like to apply a patch-based CNN and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach by using a region-based CNN method.


8 References
1 Jourabloo, A., Liu, Y., Liu, X.: 'Face de-spoofing: anti-spoofing via noise modeling', CoRR, 2018
2 Nagpal, C., Dubey, S.R.: 'A performance evaluation of convolutional neural networks for face anti spoofing', CoRR, 2018
3 Nguyen, D., Baek, N.R., Pham, T., Park, K.R.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, pp. 1315
4 Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)
5 Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In 'Deep learning in biometrics' (CRC Press, 2018), p. 49
6 Czajka, A., Bowyer, K.W.: 'Presentation attack detection for iris recognition: an assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35
7 Ramachandra, R., Busch, C.: 'Presentation attack detection methods for face recognition systems: a comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37
8 Abdul Ghaffar, I., Norzali Haji Mohd, M.N.: 'Presentation attack detection for face recognition on smartphones: a comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9
9 Li, L., Correia, P.L., Hadid, A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, pp. 3–14
10 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: 'Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391
11 Daugman, J.: 'Iris recognition and anti-spoofing countermeasures'. In 7th International Biometrics Conference, 2004
12 Galbally, J., Ortiz-Lopez, J., Fierrez, J., Ortega-Garcia, J.: 'Iris liveness detection based on quality related features'. In 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276
13 Pacut, A., Czajka, A.: 'Aliveness detection for iris biometrics'. In 40th Annual IEEE International Carnahan Conferences Security Technology, 2006, pp. 122–129
14 Czajka, A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735
15 Czajka, A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3
16 Bodade, R., Talbar, S.: 'Dynamic iris localisation: a novel approach suitable for fake iris detection'. In 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5
17 Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R.: 'An experimental study of pupil constriction for liveness detection'. In 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258
18 Kanematsu, M., Takano, H., Nakamura, K.: 'Highly reliable liveness detection method for iris recognition'. In SICE Annual Conference 2007, 2007, pp. 361–364
19 Lee, E.C., Park, K.R.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166
20 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371
21 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science, 2015, pp. 684–697
22 Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225
23 Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745
24 Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6
25 Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11
26 Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690
27 Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. In 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110
28 Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8
29 Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In You, Z., Zhou, J., Wang, Y., Sun, Z., Shan, S., Zheng, W., et al. (eds.): Biometric Recognition (Cham: Springer International Publishing, 2016), pp. 611–619
30 De Marsico, M., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3D projective invariants'. In 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78
31 Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In Delac, K., Grgic, M., Bartlett, M.S. (eds.): Recent Advances in Face Recognition (Rijeka: IntechOpen, 2008)
32 Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558
33 Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: attack of mechanical replicas'. In 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
34 Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725
35 Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection'. In IEEE International Joint Conference on Biometrics, 2014, pp. 1–8
36 Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326
37 Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14
38 Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In Daniilidis, K., Maragos, P., Paragios, N. (eds.): Computer Vision – ECCV 2010 (Berlin, Heidelberg: Springer, 2010), pp. 504–517
39 Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244
40 Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. In Image Analysis and Signal Processing, 2009, pp. 233–236
41 Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation'. In 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040
42 da Silva Pinto, A., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. In 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228
43 Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing'. In 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8
44 Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing'. In ICB, 2013, pp. 1–6
45 Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3D structure recovered from a single camera'. In ICB, 2013, pp. 1–6
46 Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6
47 Erdogmus, N., Marcel, S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect'. In 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6
48 Erdogmus, N., Marcel, S.: 'Spoofing 2D face recognition systems with 3D masks'. In 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8
49 Li, J., Wang, Y., Tan, T., Jain, A.K.: 'Live face detection based on the analysis of Fourier spectra'. In Jain, A.K., Ratha, N.K. (eds.): Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303
50 Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: 'A face antispoofing database with diverse attacks'. In 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31
51 Lee, T., Ju, G., Liu, H., Wu, Y.: 'Liveness detection using frequency entropy of image sequences'. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370
52 Czajka, A.: 'Database of iris printouts and its application: development of liveness detection method for iris recognition'. In 2013 18th International Conference on Methods, Models in Automation, Robotics (MMAR), 2013, pp. 28–33
53 Wei, Z., Qiu, X., Sun, Z., Tan, T.: 'Counterfeit iris detection based on texture analysis'. In 19th International Conference on Pattern Recognition (ICPR 2008), 2008, pp. 1–4
54 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in mobile applications'. In 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33
55 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008
56 Unnikrishnan, S., Eshack, A.: 'Face spoof detection using image distortion analysis and image quality assessment'. In 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5
57 Wen, D., Han, H., Jain, A.K.: 'Face spoof detection with image distortion analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761
58 Galbally, J., Marcel, S.: 'Face anti-spoofing based on general image quality assessment'. In 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178
59 Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing', IET Biometrics, 2018, 7, (4), pp. 314–324
60 Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment for biometric presentation attack detection'. In 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6
61 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face anti-spoofing based on color texture analysis'. In 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830
63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145
64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696
65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA: ACM, 2016), pp. 9–15
66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9
67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. In 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7
69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863
70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In Park, J.I., Kim, J. (eds.): Computer Vision – ACCV 2012 Workshops (Berlin, Heidelberg: Springer, 2013), pp. 121–132
71 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In 2012 BIOSIG – Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7
74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
75 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10
76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and Fisher score'. In 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5
77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283
78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8
79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145
80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560
81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715
82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683
83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'MoBio_LivDet: mobile biometric liveness detection'. In 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192
84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460
86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6
87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors', Sensors, 2018
88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. In 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328
89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision', CoRR, 2018
90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018
91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014
92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. In 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5
93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34
94 Czajka, A., Bowyer, K.W., Krumdick, M., Vidal-Mata, R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196
95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164
96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018
97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018
98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879
99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using Haralick features'. In 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6
100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724
101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786
102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In Tistarelli, M., Nixon, M.S. (eds.): Advances in Biometrics (Berlin, Heidelberg: Springer, 2009), pp. 1080–1090
103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. In 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282
104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The Replay-Mobile face presentation-attack database'. In 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7
105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. In 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139
106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87
107 De Marsico, M., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: face and iris recognition for mobile engagement', Image Vision Comput., 2014, 32, pp. 1161–1172
108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'ImageNet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252
109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. In 2017 12th IEEE International Conference on Automatic Face, Gesture Recognition (FG 2017), 2017, pp. 612–618
110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet Iris 2017 – iris liveness detection competition 2017'. In 2017 IEEE International Joint Conference on Biometrics (IJCB 2017), Denver, CO, USA, October 1–4, 2017, pp. 733–741
111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928
114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR 2019'. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019
115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019
116 Yambay, D., Doyle, J.S., Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-Iris 2013 – iris liveness detection competition 2013'. In IEEE International Joint Conference on Biometrics (IJCB 2014), Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8
117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-Iris 2015 – iris liveness detection competition 2015'. In IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22–24, 2017, pp. 1–6
118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 – mobile iris liveness detection competition'. In Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (eds.): Biometrics and Identity Management (Berlin, Heidelberg: Springer, 2008), pp. 181–190
120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision', CoRR, 2015, abs/1512.00567
121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385
122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381
123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556
124 Kingma, D.P., Ba, J.: 'Adam: a method for stochastic optimization', CoRR, 2014, abs/1412.6980
125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012
126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In International Conference on Machine Learning (ICML), 2017
127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



Table 3 Face databases details
Abbreviations: C = camera; BC = back camera; FC = front camera; W = built-in webcam; T = tablet; M = mobile smartphone; Q = quality; PH = high-res photo; V = video; PR = hard-copy print of a high-res photo; VR = video replay.

Replay-Attack (2012)
- Authentication sensor: (1) W in MacBook laptop (320×240)
- PA artifact: 1) PH; 2) 720p high-def V (Canon PowerShot SX150 IS)
- PAI (Presentation Attack Instrument): 1) PR (A4); 2) VR on M (iPhone); 3) VR on T (iPad)
- Lighting conditions (bona-fide / attack): 2 / 2
- Subjects (train/dev/test): 50 (15/15/20)
- Videos per subject (bona-fide + attack): 4 + 20
- Videos per subset: train 60+300; dev 60+300; test 80+400

CASIA-FASD (2012)
- Authentication sensor: 1) low-Q long-time-used USB C (680×480); 2) normal-Q new USB C (680×480); 3) high-Q Sony NEX-5 (1920×1080)
- PA artifact: 1) PH; 2) 720p high-def V (Sony NEX-5)
- PAI: 1) PR (copper A4), warped; 2) PR (copper A4), cut; 3) VR on T (iPad)
- Lighting conditions: 1 / 1
- Subjects: 50 (20/-/30)
- Videos per subject: 3 + 9
- Videos per subset: train 60+180; test 90+270

MSU-MFSD (2014)
- Authentication sensor: 1) W in MacBook Air (640×480); 2) FC of Google Nexus 5 M (720×480)
- PA artifact: 1) PH (Canon PowerShot 550D SLR); 2) 1080p high-def V (Canon PowerShot 550D SLR); 3) 1080p M V (BC of iPhone S5)
- PAI: 1) PR (A3); 2) high-def VR on T (iPad); 3) M VR on M (iPhone)
- Lighting conditions: 1 / 1
- Subjects: 35 (15/-/20)
- Videos per subject: 2 + 6
- Videos per subset: train 30+90; test 40+120

Replay-Mobile (2016)
- Authentication sensor: 1) FC of iPad Mini 2 T (720×1280); 2) FC of LG-G4 M (720×1280)
- PA artifact: 1) PH (Nikon Coolpix P520); 2) 1080p M V (BC of LG-G4 M)
- PAI: 1) PR (A4); 2) VR on matte screen (Philips 227ELH)
- Lighting conditions: 5 / 2
- Subjects: 40 (12/16/12)
- Videos per subject: 10 + 16
- Videos per subset: train 120+192; dev 160+256; test 110+192

OULU-NPU (2017)
- Authentication sensor: FC of 6 M (1920×1080)
- PA artifact: 1) PH; 2) high-res M V (BC of Samsung Galaxy S6 Edge M)
- PAI: 1-2) PR (glossy A3), 2 printers; 3) high-res VR on 19-inch Dell display; 4) high-res VR on 13-inch MacBook
- Lighting conditions: 3 / 3
- Subjects: 55 (20/15/20)
- Videos per subject: 36 + 72
- Videos per subset: 4 protocols; total 1980+3960

SiW (2018)
- Authentication sensor: 1) high-Q C Canon EOS T6 (1920×1080); 2) high-Q Logitech C920 W (1920×1080)
- PA artifact: 1) PH (5184×3456); 2) same live videos
- PAI: 1) PR; 2) PR of a frontal-view frame from a live V; 3-6) VR on 4 screens (Samsung S8, iPhone 7, iPad Pro, PC)
- Lighting conditions: 1 / 1
- Subjects: 165 (90/-/75)
- Videos per subject: 8 + 20
- Videos per subset: 3 protocols; total 1320+3300

CASIA-SURF (2018)
- Authentication sensor: 1) RealSense SR300: RGB (1280×720), depth/IR (640×480)
- PA artifact: 1) PH
- PAI: 1) PR (A4)
- Lighting conditions: 1 / 1
- Subjects: 1000 (300/100/600)
- Videos per subject: 1 + 6
- Videos per subset: train 900+5400; dev 300+1800; test 1800+10800

CASIA-SURF CeFA (2019)
- Authentication sensor: 1) Intel RealSense: RGB/depth/IR, all 1280×720
- PA artifact: 1) PH; 2) same live videos; 3) 3D-printed face mask; 4) 3D silica-gel face mask
- PAI: 1) PR; 2) VR
- Lighting conditions: 1 / 2 (2D attacks)
- Subjects: 1500 (600/300/600) for 2D attacks; 99 (0/0/99) for 3D-printed masks; 8 (0/0/8) for silica-gel masks
- Videos per subject: 1 + 3 (2D); 0 + 1 (3D print); 80 + 8 (silica gel)
- Videos per subset: 4 protocols; total 4500+135000+53460+196

4.2.4 Others: Some other datasets for face PAD are also available but were not used in this paper.

Fig 1 Sample images from the Replay-Attack dataset (adverse lighting, hand support). Left to right, top: bona-fide, print, high-def photo, mobile photo; bottom: high-def photo, mobile video, high-def video. Client 007.

(a) Laptop

(b) Android

Fig 2 Sample images from the MSU-MFSD dataset. Left to right: bona-fide, printed photo, iPhone video, iPad video. Client 001.

CASIA-FASD [50] was released in 2012 with 50 subjects. Three different imaging qualities were used (low, normal and high), and three types of attacks are generated from the high-quality recordings of the genuine faces, namely cut-photo attack, warped-photo attack


and video attack on an iPad. The dataset is split into a training set of 20 subjects and a testing set containing the other 30 subjects.

OULU-NPU [109]: A mobile face PAD dataset released in 2017, with 55 subjects captured in three different lighting conditions and backgrounds using six different mobile devices. Both print and video-replay attacks are included, using two different PAIs each. Four protocols were proposed in the paper to evaluate the generalization of PAD algorithms against different conditions.

Spoof in the Wild (SiW) [89]: Contains 8 real and 20 attack videos for each of its 165 subjects, more subjects than any previous dataset, and was released in 2018. Bona-fide videos are captured with two high-quality cameras and collected with different orientations, facial expressions and lighting conditions. Attack samples are either high-quality print attacks or replay videos on 4 different PA devices.

CASIA-SURF [113]: Later in the same year, 2018, the large-scale multi-modal CASIA-SURF dataset was released, containing 21,000 videos for 1,000 subjects. Each video has three modalities, namely RGB, Depth and IR, captured by an Intel RealSense camera. Only a 2D print attack was performed, with 6 different combinations of the used photo state (either full or cut) and 3 different operations, such as bending the printed paper or moving it at different distances from the camera.

CASIA-SURF CeFA [115]: Although the CASIA-SURF dataset [113] is large-scale, it includes only one ethnicity (Chinese) and one attack type (2D print). So, in 2019, CASIA-SURF cross-ethnicity Face Anti-spoofing (CeFA) was released to provide a wider range of ethnicities and attack types, in addition to multiple modalities as in CASIA-SURF. The dataset includes videos for 1607 subjects covering three ethnicities (African, East Asian and Central Asian), three modalities and 4 attack types. The attack types are 2D print attacks and replay attacks, in addition to 3D mask and silica gel attacks.

4.3 Iris spoofing databases

Many publicly available benchmark datasets have been published for the iris presentation attack problem. Some are captured with commercial

(a) Captured by tablet

(b) Captured by mobile

Fig 3 Sample images from the Replay-Mobile dataset (hand support). Left to right: bona-fide, print photo, matte-screen photo, matte-screen video. Client 001.

(a) Bona-fide

(b) Print attack

Fig 4 Sample images from the iris datasets. Left to right: ATVS-FIr, Warsaw 2013, MobBioFake. ATVS-FIr id u0001s0001_ir_r_0002; Warsaw id 0039_R_1; MobBioFake id 007_R2.

iris recognition sensors operating with near-infrared illumination, like BioSec/ATVS-FIr [12] or the LivDet datasets [52, 110, 117], while more recent datasets were captured under visible-light illumination using smartphones, e.g. MobBioFake [54, 105]. Below are details of the databases used in this paper.

4.3.1 BioSec ATVS-FIr: The ATVS-FIr dataset was introduced in 2012 [12], derived from the BioSec dataset [119]. It contains 800 bona-fide and 800 attack photos of 50 different subjects. The attack is performed by enhancing the quality of the bona-fide iris photo, printing it on high-quality paper using HP printers, then presenting it to the same LG IrisAccess EOU3000 sensor used to capture the bona-fide irises. All images have 640×480 resolution and are grayscale. For each user, 16 different bona-fide photos were captured, from which 16 attack photos were generated. Four photos were taken from each of the two eyes in two sessions, so each user has a total of 4 images × 2 eyes × 2 sessions = 16 bona-fide photos, and the same for fake attacks. Each eye of the 50 subjects was considered a subject on its own; the resulting 100 subjects were then divided into a training set of 25 subjects (200 bona-fide + 200 attack) and a test set of 75 subjects (600 bona-fide + 600 attack).

4.3.2 LivDet Warsaw 2013: Warsaw University of Technology published the first LivDet Warsaw iris dataset in 2013 [52] as part of the first international iris liveness competition in 2013 [116]. Two more versions of this dataset were later published, in 2015 [117] and 2017 [110]. The LivDet-Iris-2013-Warsaw dataset contains 852 bona-fide and 815 attack printout images from 237 subjects. For the attack images, the real iris was printed using two different printers, then captured by the same IrisGuard AD100 sensor as the bona-fide photos.

4.3.3 MobBioFake: The first competition for iris PAD in mobile scenarios was the MobILive competition in 2014 [118]. The MobBioFake iris spoofing dataset [54] was generated from the iris images of the MobBIO multimodal database [105]. It comprises 800 bona-fide and 800 attack images captured by the back camera (8MP resolution) of an Android mobile smartphone (Asus Transformer Pad TF300T) under visible light, not near-infrared like the previous iris datasets. The images are color RGB and have a resolution of 250×200. The MobBioFake database has 100 subjects, each with 8 bona-fide + 8 attack images captured under different illumination and occlusion settings. The attack samples were generated from printed images of the original ones, under similar conditions and with the same device. The images are split evenly between training and test subsets.

5 Proposed PAD Approach

Although several previous studies use CNNs for presentation attack detection, they either used custom-designed networks, e.g.


Fig 5 Overview of the PAD algorithm

Spoofnet [95, 98], or used pre-trained CNNs to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation attack images. We choose to assess the state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet [108]. This PAD approach is a passive, software, texture-based method that requires neither additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches, starting from network weights trained on the ImageNet classification dataset. One can either retrain all the network weights on the new datasets of the problem, or freeze the weights of most of the network layers and finetune only a set of selected last layers, including the final classification layer.

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then input to the CNN, passing through a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is altered to output a binary bona-fide/attack classification using sigmoid activation instead of the 1000 classes of ImageNet.

5.1 Preprocessing

5.1.1 Face videos: Each of the face benchmark datasets used in this paper provides face coordinates for each frame of each video. These coordinates are used to crop a square region to be used as input to the network. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to be twice the minimum of the width and height of the face groundtruth box. The cropped bounding box containing the face is then resized to each network's default input size and passed to the first convolutional layer of the CNN.
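As a concrete illustration of this cropping rule, the sketch below (function and variable names are ours, not taken from the paper's code) crops a square of side 2·min(w, h) centered on the annotated face box:

```python
import numpy as np

def crop_face_with_context(frame, box):
    """Crop a square ROI around an annotated face box.

    The square side is twice the minimum of the box width and height
    (as described in the text), centered on the face and clipped to
    the frame borders. `frame` is HxWx3, `box` is (x, y, w, h).
    """
    x, y, w, h = box
    side = 2 * min(w, h)
    cx, cy = x + w // 2, y + h // 2          # face-box center
    x0 = max(0, cx - side // 2)
    y0 = max(0, cy - side // 2)
    x1 = min(frame.shape[1], x0 + side)
    y1 = min(frame.shape[0], y0 + side)
    return frame[y0:y1, x0:x1]
```

The returned crop is then resized to the network's default input size (Section 6.1).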

5.1.2 Iris images: When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. However, in this paper we directly use the full iris image captured by the NIR sensor or smartphone camera, in order to keep the approach as simple as possible. This lets the network learn which regions of the full periocular area best serve the classification problem. The presence of some contextual information, like paper borders, can also be useful for attack detection.

Table 4 Performance of selected deep CNN architectures on ImageNet, in terms of single-model error as reported in each model's paper.

Model | top-1 % (single-crop / multi-crop) | top-5 % (single-crop / multi-crop) | parameters | layers
InceptionV3 [120] | 21.2 / 19.47 (12-crop) | 5.6 / 4.48 (12-crop) | 23.9M | 159
MobileNetV2 (alpha=1.4) [122] | 25.3 / - | 7.58 / - | 6.2M | 88
MobileNetV2 (default, alpha=1.0) [122] | 28.0 / - | 9.8 / - | 3.5M | 88
VGG16 [123] | - / 24.4 | 9.95 / 7.1 | 138.4M | 23

Values not available in the model's paper are as reported by Keras (https://keras.io).

5.2 Generalization: Data augmentation

It is known that deep networks have many parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. During training, the images in each mini-batch pulled from the training set are randomly cropped, rotated slightly by a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially larger training dataset and a more generalized network that is robust to changes in pose and illumination.

53 Modern deep CNN architectures

Based on the official results published for recent CNN architectures on the ImageNet dataset [108], three of the networks that achieved less than 10% top-5 error are InceptionV3 [120], ResNet50 [121] and MobileNetV2 [122]. Table 4 states the single-model top-1 and top-5 error of each of these networks on the ImageNet validation set, along with its depth and number of parameters, ordered by ascending top-5 error. One thing that can be noticed about the higher-accuracy networks is the increased number of layers; these networks are much deeper than, for example, the well-known VGG [123].

For the purpose of this paper, we chose networks with fewer than 30 million parameters to finetune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 [120] (~24M parameters, 5.6% top-5 error), and a more recent, less deep network with far fewer parameters, MobilenetV2 [122] (~4M parameters, 9.9% top-5 error), which makes it very suitable for mobile applications. Even though all three networks have significantly fewer parameters than VGG, they are much deeper and more accurate.

In each architecture, the final softmax prediction layer is removed, and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate of 0.5), then into a sigmoid activation layer for the final binary prediction.
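A minimal tf.keras sketch of this head replacement, shown here for MobilenetV2 (the `weights` argument is left as a parameter; the experiments initialize from ImageNet weights):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import MobileNetV2

def build_pad_model(weights=None):
    """Backbone without the 1000-class softmax head; global average
    pooling yields one feature vector per image, which goes through
    dropout (rate 0.5) and a single sigmoid unit (bona-fide vs attack)."""
    base = MobileNetV2(include_top=False, pooling="avg",
                       input_shape=(224, 224, 3), weights=weights)
    x = layers.Dropout(0.5)(base.output)
    out = layers.Dense(1, activation="sigmoid")(x)
    return Model(base.input, out)
```

In the experiments `weights="imagenet"` would be passed; `weights=None` avoids the pretrained-weight download when just inspecting the architecture.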

5.4 Fine-tuning approaches

As stated previously, the available datasets are small, while the networks used have millions of parameters to learn. This would probably lead the model to overfit these small sets if the network weights were trained from scratch, i.e. starting from random weights; training would get stuck in a local minimum, leading to poor generalization. A better approach is to initialize the network with weights pre-trained on ImageNet. We then have three training options: (1) use transfer learning: feed the images to the network to extract features from the last layer


Table 5 Number of parameters in each model and the number of parameters to learn when finetuning only the final block.

Model | Total # of parameters | Last block to train | First trainable layer | # parameters to learn
InceptionV3 | 21,804,833 | block after mixed9 (mixed9_1 until mixed10) | conv2d_90 | 6,075,585
MobileNetV2 | 2,259,265 | last inverted_res_block | block_16_expand | 887,361

before the prediction layer, then use these features, instead of raw image pixels, for classification with an SVM or NN; (2) finetune the weights of only the last several layers of the network, while fixing the weights of the previous layers to their ImageNet-trained values; or (3) finetune all the network weights using the new input dataset.

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the latter approach, Table 5 states, for each architecture, the name of the last block's first layer and the number of parameters in this block to be finetuned.
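The two regimes can be sketched as a small helper, assuming a tf.keras model (the helper name is ours; the layer names follow Table 5):

```python
import tensorflow as tf

def set_finetune_mode(model, mode, first_trainable="block_16_expand"):
    """Approach (B): finetune all weights.  Approach (A): freeze every
    layer before `first_trainable` (block_16_expand for MobileNetV2,
    conv2d_90 for InceptionV3, per Table 5) and train the rest."""
    if mode == "B":
        for layer in model.layers:
            layer.trainable = True
    else:
        trainable = False
        for layer in model.layers:
            if layer.name == first_trainable:
                trainable = True       # everything from here on trains
            layer.trainable = trainable
```

Freezing must be applied before compiling the model, since Keras captures the trainable-weight list at compile time.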

5.5 Single model with multiple biometrics

In addition to training/testing with datasets belonging to the same biometric source, i.e. face only or iris only, we experiment with training a single generic presentation attack detection network to detect presentation attacks whether the presented biometric is a face or an iris. We train with different combinations of face + iris sets, then cross-evaluate the trained models on the other face and iris datasets to see whether this approach is comparable to training on face images or iris images alone.

6 Experiments and Results

In this section, we describe the image and video preprocessing used, and explain the training strategy and experimental setup followed.

6.1 Preprocessing

6.1.1 Preparing images: Among the iris datasets, only the MobBioFake database consists of color images captured using a smartphone. The other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the color images of ImageNet, their input layers only accept images with three channels, so for grayscale iris images the single gray channel is replicated twice to form the 3-channel image passed as the network input. The whole iris image is used, without cropping or iris detection and segmentation.
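The channel replication for the grayscale NIR images is a one-liner, sketched here in NumPy:

```python
import numpy as np

def to_three_channels(gray):
    """Replicate the single gray channel of an NIR iris image so it
    matches the 3-channel input expected by ImageNet-pretrained nets."""
    if gray.ndim == 2:
        gray = gray[..., np.newaxis]   # HxW -> HxWx1
    return np.repeat(gray, 3, axis=-1) # HxWx1 -> HxWx3
```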

On the other hand, all face PAD datasets consist of color videos. Video durations range from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects, so we use frame skipping to sample one frame every 20 frames. This frame dropping leads to sampling around 10 to 19 frames per video, depending on its length. Each sampled frame, instead of being used whole as the network input, is first cropped to either include only the face region or a box with approximately twice the face area, to include some background context. Annotated face bounding boxes have a side length ranging from 70px to 100px in Replay-Attack videos and from 170px to 250px in MSU-MFSD.

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments (after frame dropping in videos). The total number of images available for training the CNN is also given for each database.

The images are then resized according to the default input size of each network: 224×224 for MobilenetV2 and 299×299 for InceptionV3. The RGB values of the input images to the InceptionV3 and MobilenetV2 networks are first scaled to the range -1 to 1.

Table 6 Number of images used in training and testing. For the face datasets, the number of videos is shown, followed by the number of frames in parentheses (sampling one frame every 20). For the iris datasets, the number of images is given. Counts are attack + bona-fide; the total number of training images follows "=".

Database | Subset | Attack + bona-fide
Replay-Attack | train | 300 (3414) + 60 (1139) = 4553
Replay-Attack | devel | 300 (3411) + 60 (1140)
Replay-Attack | test | 400 (4516) + 80 (1507)
MSU-MFSD | train | 90 (1257) + 30 (419) = 1676
MSU-MFSD | test | 120 (1655) + 84 (551)
Replay-Mobile | train | 192 (2851) + 120 (1762) = 4613
Replay-Mobile | devel | 256 (3804) + 160 (2356)
Replay-Mobile | test | 192 (2842) + 110 (1615)
ATVS-FIr | train | 200 + 200 = 400
ATVS-FIr | test | 600 + 600
Warsaw 2013 | train | 203 + 228 = 431
Warsaw 2013 | test | 612 + 624
MobBioFake | train | 400 + 400 = 800
MobBioFake | test | 400 + 400

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize these small sets of samples instead of learning useful features, and thus fail to generalize. In order to reduce overfitting, data augmentation is used during training: images are randomly transformed before being fed to the network with one or more of the following transformations: horizontal flip, rotation between 0 and 10 degrees, horizontal translation by 0 to 20 percent of the image width, vertical translation by 0 to 20 percent of the image height, or zooming by 0 to 20 percent.
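Since the experiments use Keras, these transforms map directly onto an `ImageDataGenerator` configuration; the sketch below mirrors the listed ranges (the paper does not publish its exact generator settings, so this is our rendering of them):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random transforms matching the list above: rotation up to 10 degrees,
# horizontal/vertical shifts and zoom up to 20%, and horizontal flips.
augmenter = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)

# Each epoch then sees a differently distorted version of every image:
batch = next(augmenter.flow(np.random.rand(4, 64, 64, 3), batch_size=4))
```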

6.2 Training strategy and hyperparameters

For training the CNN networks, we used the Adam optimizer [124] with beta1 = 0.9, beta2 = 0.999 and learning rate 1×10^-4. We trained each network for a maximum of 50 epochs, but added a stopping criterion triggered if the training or validation accuracy reached 99.95%, or if the validation accuracy did not improve (with delta = 0.0005) for 40 epochs.
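In tf.keras these settings can be sketched as below; the 99.95% accuracy cutoff has no built-in callback, so it is sketched here as a small custom one (an assumption about how it could be wired, not the paper's exact code):

```python
import tensorflow as tf

# Adam with the stated hyperparameters.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4,
                                     beta_1=0.9, beta_2=0.999)

# Stop when validation accuracy fails to improve by at least 0.0005
# for 40 epochs (training itself is capped at 50 epochs in fit()).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", min_delta=0.0005, patience=40)

class StopAtAccuracy(tf.keras.callbacks.Callback):
    """Extra stopping criterion: halt once training or validation
    accuracy reaches 99.95%."""
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if max(logs.get("accuracy", 0.0),
               logs.get("val_accuracy", 0.0)) >= 0.9995:
            self.model.stop_training = True

# model.fit(..., epochs=50, callbacks=[early_stop, StopAtAccuracy()])
```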

Intermediate models with high validation accuracy were saved, and finally the model with the least validation error was chosen for inference. Because of the randomness introduced by the data augmentation and the dropout, separate training sessions may produce slightly different results, especially in cross-validation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treat each frame of a face video as a separate image. When reporting the final score for each video, we perform score-level fusion by averaging the scores of the samples from the same video. Accuracy and errors are then calculated based on these video-level scores, not on frame-level scores as in some results reported in the literature.
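The fusion rule is a simple average of the per-frame sigmoid outputs; a sketch (the 0.5 default threshold is illustrative only, since the operating threshold is tuned on the development set, as described below):

```python
import numpy as np

def video_score(frame_scores):
    """Score-level fusion: average the per-frame sigmoid outputs to get
    one bona-fide score for the whole video."""
    return float(np.mean(frame_scores))

def classify_video(frame_scores, threshold=0.5):
    """Hypothetical decision rule: bona-fide if the fused score passes
    the operating threshold."""
    return video_score(frame_scores) >= threshold
```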


Table 7 Intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%). Columns (A) and (B) are the two finetuning approaches of Section 5.4.

Train database | Test database | (A) InceptionV3 | (A) MobilenetV2 | (B) InceptionV3 | (B) MobilenetV2
Replay-Attack | Replay-Attack | 6.9 | 1.6 | 0 | 0.63
Replay-Attack | MSU-MFSD | 33.0 | 30.0 | 17.5 | 22.5
Replay-Attack | Replay-Mobile | 35.8 | 43.4 | 64.7 | 56.7
MSU-MFSD | Replay-Attack | 34.7 | 22.9 | 23.8 | 13.7
MSU-MFSD | MSU-MFSD | 7.5 | 20.5 | 0 | 2.08
MSU-MFSD | Replay-Mobile | 32.6 | 26.8 | 24.6 | 23.7
Replay-Mobile | Replay-Attack | 33.2 | 45.1 | 31.1 | 45.5
Replay-Mobile | MSU-MFSD | 35.4 | 30.0 | 32.9 | 37.5
Replay-Mobile | Replay-Mobile | 3.6 | 3.2 | 0 | 0
ATVS-FIr | ATVS-FIr | 0 | 0 | 0 | 0
ATVS-FIr | Warsaw 2013 | 1.9 | 5.4 | 0.16 | 1.6
ATVS-FIr | MobBioFake (Grayscale) | 4.4 | 3.9 | 3.65 | 3.9
Warsaw 2013 | ATVS-FIr | 0 | 1.0 | 0.17 | 0.5
Warsaw 2013 | Warsaw 2013 | 0 | 0.32 | 0 | 0
Warsaw 2013 | MobBioFake (Grayscale) | 4.3 | 3.9 | 3.28 | 4.7
MobBioFake | ATVS-FIr | 10.0 | 30.0 | 18.7 | 20.2
MobBioFake | Warsaw 2013 | 15.7 | 32.0 | 23.4 | 18.8
MobBioFake | MobBioFake | 8.75 | 7.0 | 0.5 | 0.75
MobBioFake (Grayscale) | ATVS-FIr | 2.5 | 23.7 | 12.0 | 16.2
MobBioFake (Grayscale) | Warsaw 2013 | 6.2 | 17.3 | 10.1 | 17.2
MobBioFake (Grayscale) | MobBioFake (Grayscale) | 10.5 | 7.3 | 0.25 | 1.25

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the error on the test set. The EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attacks, equal to FR/(total bona-fide attempts), while FAR is the probability of the algorithm accepting attack samples and considering them bona-fide, FA/(total attack attempts).

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold is first calculated that achieves the EER on the devel set. This threshold is then used to calculate the error on the test set, known as the half-total error rate (HTER), which is the sum of the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) at that threshold, divided by two: HTER = (FRR + FAR)/2.
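A sketch of this protocol, assuming higher scores mean "more bona-fide" (the threshold search here simply scans candidate scores; the paper's exact routine, provided by the Bob toolkit, may differ):

```python
import numpy as np

def eer_threshold(dev_bonafide, dev_attack):
    """Return the threshold at which FRR and FAR on the development
    set are closest (the EER operating point)."""
    cands = np.unique(np.concatenate([dev_bonafide, dev_attack]))
    best_t, best_gap = cands[0], float("inf")
    for t in cands:
        frr = np.mean(np.asarray(dev_bonafide) < t)   # bona-fide rejected
        far = np.mean(np.asarray(dev_attack) >= t)    # attacks accepted
        if abs(frr - far) < best_gap:
            best_gap, best_t = abs(frr - far), t
    return best_t

def hter(test_bonafide, test_attack, t):
    """HTER = (FRR + FAR) / 2 at the dev-set threshold t."""
    frr = np.mean(np.asarray(test_bonafide) < t)
    far = np.mean(np.asarray(test_attack) >= t)
    return (frr + far) / 2
```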

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER, to be able to compare with state-of-the-art published results. The EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that has performed EER cross-dataset evaluation on iris PAD databases.

6.4 Experimental setup

For implementation, we used Keras† with a TensorFlow‡ backend for the deep CNN models, and the Bob package [125, 126] for dataset management and experiment pipelines. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

* https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
† https://github.com/keras-team/keras
‡ https://www.tensorflow.org

6.5 Experiment 1 results

In the first experiment, we compare the two finetuning approaches explained in Section 5.4. For this experiment, training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state of the art. For each finetuning approach, we performed three runs and reported the minimum error obtained. In Table 7, we refer to the first finetuning approach as (A), where only the last block's weights are finetuned, while approach (B) is when all the layers' weights are finetuned using the given dataset.

It can be noticed from Table 7 that finetuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and finetuning just the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error for all datasets using InceptionV3, while MobilenetV2 achieved 0% for most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset was used for training instead of the dataset's original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 results

For the second experiment, we only use finetuning approach (B), adjusting the weights of all the network's layers using the training images. Here, we use both face and iris images as training input to a single model for classifying bona-fide versus attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate between bona-fide and presentation attack images achieved very good results on both face and iris test sets. For some cases with face datasets, this combination boosted cross-dataset performance: for example, on the Replay-Attack test dataset, 11.9% HTER was achieved when training a MobilenetV2 using MSU-MFSD and the Warsaw iris dataset, versus 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8 Training with face + iris images: intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%). Each column pair gives InceptionV3 / MobilenetV2 results for one train iris database.

Train face database: Replay-Attack (training epochs: 2/6, 3/6, 4/9, 2/8)
Test database | ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)
Replay-Attack | 0 / 0.63 | 0 / 0 | 0.25 / 1.5 | 0 / 1.4
MSU-MFSD | 29.6 / 20.4 | 15.0 / 20.0 | 19.6 / 25.0 | 30.0 / 17.5
Replay-Mobile | 58.1 / 48.7 | 57.3 / 35.4 | 60.6 / 58.1 | 39.0 / 40.0
ATVS-FIr | 0 / 0 | 10.5 / 2.7 | 51.7 / 8.8 | 35.0 / 12.3
Warsaw 2013 | 0.16 / 0.16 | 0 / 0 | 41.3 / 16.2 | 15.2 / 5.7
MobBioFake (RGB) | 58.5 / 71.3 | 56.0 / 45.5 | 0.75 / 0.25 | 1.75 / 2.0
MobBioFake (Grayscale) | 50.0 / 69.0 | 56.3 / 38.5 | 1.5 / 0.75 | 1.25 / 1.0

Train face database: MSU-MFSD (training epochs: 4/6, 3/5, 10/9, 7/9)
Test database | ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)
Replay-Attack | 22.8 / 14.5 | 25.3 / 11.9 | 22.8 / 27.7 | 16.6 / 21.6
MSU-MFSD | 0 / 2.08 | 0.4 / 2.08 | 2.5 / 2.08 | 0.4 / 2.5
Replay-Mobile | 28.7 / 26.1 | 19.2 / 24.8 | 22.7 / 26.8 | 25.2 / 17.0
ATVS-FIr | 0 / 0 | 0.33 / 2.3 | 11.8 / 12.5 | 0.8 / 12.33
Warsaw 2013 | 0.6 / 0.8 | 0 / 0 | 43.9 / 65.5 | 7.4 / 9.1
MobBioFake (RGB) | 52.5 / 39.2 | 45.5 / 42.0 | 2.0 / 1.25 | 2.5 / 2.5
MobBioFake (Grayscale) | 45.3 / 38.5 | 34.5 / 43.0 | 2.2 / 4.25 | 1.75 / 2.0

Train face database: Replay-Mobile (training epochs: 2/3, 1/3, 4/7, 5/4)
Test database | ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)
Replay-Attack | 24.2 / 42.6 | 31.0 / 35.7 | 41.6 / 64.3 | 33.7 / 54.6
MSU-MFSD | 32.5 / 30.4 | 32.5 / 42.5 | 60.0 / 52.5 | 40.0 / 44.6
Replay-Mobile | 0 / 0 | 0 / 0 | 0 / 0.26 | 0 / 0.46
ATVS-FIr | 0 / 0 | 0.83 / 6.0 | 15.5 / 22.5 | 6.5 / 28.2
Warsaw 2013 | 2.1 / 4.0 | 0 / 0 | 9.6 / 28.9 | 22.1 / 26.1
MobBioFake (RGB) | 46.5 / 53.0 | 49.3 / 54.3 | 1.0 / 0.75 | 8.8 / 32.5
MobBioFake (Grayscale) | 33.5 / 43.3 | 41.7 / 45.8 | 5.3 / 6.5 | 2.0 / 1.75

(bold-underlined) best cross-dataset error for each test dataset; (bold) best cross-dataset error for each test dataset in each row (i.e. given a certain face train set).

Table 9 Best cross-dataset results and comparison with SOTA. Test HTER is reported when testing on Replay-Attack; otherwise test EER. Values in parentheses are the corresponding intra-dataset errors. Underlined in the original is the best cross-dataset error for each test dataset.

Train database | Test database | SOTA (intra) | Ours, single biometric (intra) | Ours, face + iris (intra)
Replay-Attack | MSU-MFSD | 28.5 (8.25), GuidedScale-LBP [127] LGBP | 17.5 (0), InceptionV3 (B) | 15.0 (0), InceptionV3 (Warsaw 2013)
Replay-Attack | Replay-Mobile | 49.2 (3.13), GuidedScale-LBP [127] LBP + GS-LBP | 35.8 (6.9), InceptionV3 (A) | 35.4 (0), MobilenetV2 (Warsaw 2013)
MSU-MFSD | Replay-Attack | 21.40 (1.5), Color-texture [66] | 13.7 (2.08), MobilenetV2 (B) | 11.9 (2.08), MobilenetV2 (Warsaw 2013)
MSU-MFSD | Replay-Mobile | 32.1 (8.5), GuidedScale-LBP [127] LBP + GS-LBP | 23.7 (2.08), MobilenetV2 (B) | 17.0 (2.5), MobilenetV2 (MobBioFake (Gray))
Replay-Mobile | Replay-Attack | 46.19 (0.52), GuidedScale-LBP [127] LGBP | 31.1 (0), InceptionV3 (B) | 24.2 (0), InceptionV3 (ATVS-FIr)
Replay-Mobile | MSU-MFSD | 33.56 (0.98), GuidedScale-LBP [127] LBP + GS-LBP | 30.0 (3.2), MobilenetV2 (A) | 30.4 (0), MobilenetV2 (ATVS-FIr)
ATVS-FIr | Warsaw 2013 | None available | 0.16 (0), InceptionV3 (B) | 0.16 (0), any (Replay-Attack)
ATVS-FIr | MobBioFake (Gray) | None available | 3.65 (0), InceptionV3 (B) | 33.5 (0), InceptionV3 (Replay-Mobile)
Warsaw 2013 | ATVS-FIr | None available | 0 (0), InceptionV3 (A) | 0.33 (0), InceptionV3 (MSU-MFSD)
Warsaw 2013 | MobBioFake (Gray) | None available | 3.28 (0), InceptionV3 (B) | 34.5 (0), InceptionV3 (MSU-MFSD)
MobBioFake (Gray) | ATVS-FIr | None available | 2.5 (10.5), InceptionV3 (A) | 0.8 (1.75), InceptionV3 (MSU-MFSD)
MobBioFake (Gray) | Warsaw 2013 | None available | 6.2 (10.5), InceptionV3 (A) | 5.7 (1.0), MobilenetV2 (Replay-Attack)

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. Not only this, but it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network focus on color features when testing is performed on a face image, since all training face images were colored while all iris training images were grayscale.


Fig 6 High-frequency components of attack iris images from the Warsaw dataset. Left to right: input image; frequency components (Fourier); high-pass filter; inverse Fourier of the filtered image; high-frequency components overlaid on the image.

(a) Bona-fide iris samples

(b) Attack iris samples

Fig 7 High-frequency components in bona-fide vs attack iris images from the Warsaw dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9, we compare our best cross-dataset evaluation errors with the state of the art (SOTA). This table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile, while both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activations and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and textures that are present in real presentations in general but absent in attack samples, and vice-versa. One way to verify this is to visualize the responses of certain feature maps of the network given sample input images. For this task, we chose a MobilenetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected some filters of the final convolutional block before classification and visualized their responses.

(a) Bona-fide face samples

(b) Attack face samples

Fig 8 High-frequency components in bona-fide vs attack face images from the Replay-Attack dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig 9 Sample classified bona-fide images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIR, Warsaw, MobBioFake.

(a) Correctly classified attack images

(b) Incorrectly classified attack images

Fig 10 Sample classified attack images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIR, Warsaw, MobBioFake.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of the high-frequency components of the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high-frequency components of an image; then we overlay these high-frequency regions


on the input image, for a better understanding of where high frequency is concentrated in bona-fide versus attack images.

The steps for obtaining the regions with high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to get the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, an inverse Fourier transform is used to obtain an image with only the high-frequency regions visible, and finally these regions are overlaid on the original image to highlight them.
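These steps translate directly into a few lines of NumPy (the cutoff radius below is an illustrative choice, not a value from the paper):

```python
import numpy as np

def high_frequency_overlay(img, radius=20):
    """Keep only frequencies farther than `radius` from the DC
    component, invert back to the spatial domain, and return the
    magnitude image highlighting high-frequency regions."""
    f = np.fft.fftshift(np.fft.fft2(img))      # center the spectrum
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    # Circular high-pass mask: True outside the low-frequency disc.
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 > radius ** 2
    return np.abs(np.fft.ifft2(np.fft.ifftshift(f * mask)))
```

The returned magnitude map can then be thresholded and blended over the input image to produce overlays like those in Figures 6-8.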

Regarding the chosen filters to visualize: the final layer before classification consists of 1280 filters, and for an input image of size 224×224, the feature maps resulting from these 1280 filters have size 7×7. From these 1280 filters, we selected two filters for each biometric: one with the highest average activation response to real images, which we call realF, and another filter that was highly activated by attack images, called attackF. We then visualize the 7×7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.
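The selection rule described above can be sketched as follows (the paper does not detail the exact statistic, so the per-filter mean over all images and spatial positions is our assumption):

```python
import numpy as np

def select_class_filters(real_maps, attack_maps):
    """Pick, from the final conv block's feature maps, the filter with
    the highest mean activation over bona-fide samples (realF) and the
    one most activated by attack samples (attackF).

    real_maps / attack_maps: arrays of shape (n_images, 7, 7, n_filters),
    e.g. the outputs of the last conv block for a batch of samples.
    """
    real_mean = real_maps.mean(axis=(0, 1, 2))    # per-filter average
    attack_mean = attack_maps.mean(axis=(0, 1, 2))
    return int(np.argmax(real_mean)), int(np.argmax(attack_mean))
```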

Figure 7 shows samples from the Warsaw iris dataset with their high-frequency regions highlighted, together with the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples the high-frequency areas are concentrated at the bright regions around the inner eye corner and the lash line, whereas for attack presentations the high-frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network successfully recognizes the real features, strongly represented at the eye corner, through realF, giving a high response at those regions in real images while failing to detect them in attack samples. AttackF likewise manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images from the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is achieved using a handheld mobile device showing a photo of the subject. We can notice the absence of high-frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns on the clear background area, which are successfully detected by the network's attackF.

Visualizing the high-frequency regions in the images showed that, for real images, high-frequency variations concentrate on the edges and depth changes of the 3D biometric being presented, while for attack presentations the high-frequency components represent noise: either spread over the paper of a print-attack medium, or noise patterns produced by the display of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network learned is gradient-weighted class activation mapping (Grad-CAM) [10], which highlights the areas that contributed most to the network's classification decision for an input image.
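The core Grad-CAM computation [10] reduces to a gradient-weighted sum of the last convolutional feature maps. Below is a minimal NumPy sketch of that step; the gradients of the class score with respect to the feature maps are assumed to be supplied by the deep-learning framework's autodiff, which is outside this snippet.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Gradient-weighted class activation map (Grad-CAM).

    feature_maps: (H, W, C) activations of the last conv layer.
    gradients:    (H, W, C) gradient of the target class score with
                  respect to those activations (assumed precomputed).
    Returns an (H, W) heatmap normalised to [0, 1].
    """
    # Global-average-pool the gradients: one importance weight per channel
    weights = gradients.mean(axis=(0, 1))                     # shape (C,)
    # Weighted combination of feature maps, then ReLU
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)
    # Normalise for visualisation
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

The resulting low-resolution heatmap (7x7 for the MobileNetV2 maps used here) is upsampled to the input size and overlaid on the image to produce figures like Figures 9 and 10.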

We chose the MobileNetV2 network trained on the Replay-Attack face dataset and the MobBioFake gray iris dataset, and analyzed the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation-attack images.

The visualizations in these figures clearly show that the network focuses on features at the center of the face or iris to make its decision for a bona-fide presentation; when these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could correctly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights pretrained on the ImageNet dataset. Experimental results show that InceptionV3 achieved state-of-the-art 0% intra-dataset error on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, in addition to initializing training with non-random weights already pretrained on ImageNet; this was done to avoid the weights overfitting the small training datasets. For future work, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.
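The augmentation policy mentioned above (random flip, slight rotation, translation) can be sketched in pure NumPy. The helper names and the ranges (`max_angle` degrees, `max_shift` pixels) are our assumptions for illustration, not the paper's exact values; in practice a framework routine such as a Keras `ImageDataGenerator` would play this role.

```python
import numpy as np

def rotate_nn(image, angle_deg):
    """Nearest-neighbour rotation of a 2D image about its centre."""
    h, w = image.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find its source position
    sy = cy + (ys - cy) * np.cos(a) - (xs - cx) * np.sin(a)
    sx = cx + (ys - cy) * np.sin(a) + (xs - cx) * np.cos(a)
    sy = np.clip(np.rint(sy), 0, h - 1).astype(int)
    sx = np.clip(np.rint(sx), 0, w - 1).astype(int)
    return image[sy, sx]

def augment(image, rng, max_angle=10, max_shift=8):
    """Random horizontal flip, slight rotation and translation."""
    if rng.random() < 0.5:
        image = image[:, ::-1]                     # horizontal flip
    image = rotate_nn(image, rng.uniform(-max_angle, max_angle))
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(np.roll(image, dy, axis=0), dx, axis=1)
```

Applying `augment` several times per training image multiplies the effective size of the small PAD datasets while preserving the label.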

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-state-of-the-art results were reported on the face datasets. To the best of our knowledge, this is the first paper to include a cross-dataset Equal-Error-Rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.
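For reference, the Equal Error Rate is the operating point where the false-rejection rate of bona-fide samples equals the false-acceptance rate of attacks. A minimal sketch, assuming higher scores mean "more bona-fide" (the function name and threshold sweep are ours, not the paper's evaluation code):

```python
import numpy as np

def equal_error_rate(real_scores, attack_scores):
    """EER: sweep thresholds, return the point where FRR ~= FAR."""
    thresholds = np.sort(np.concatenate([real_scores, attack_scores]))
    best_eer, best_gap = 1.0, np.inf
    for t in thresholds:
        frr = np.mean(real_scores < t)     # bona-fide wrongly rejected
        far = np.mean(attack_scores >= t)  # attacks wrongly accepted
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer
```

In the cross-dataset protocol, `real_scores` and `attack_scores` come from a test dataset the network never saw during training.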

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we analyzed the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important region in driving the network toward a bona-fide classification decision. For future work, we would like to apply a patch-based CNN and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach through a region-based CNN method.


8 References

1 Jourabloo, A., Liu, Y., Liu, X.: 'Face de-spoofing: anti-spoofing via noise modeling', CoRR, 2018
2 Nagpal, C., Dubey, S.R.: 'A performance evaluation of convolutional neural networks for face anti spoofing', CoRR, 2018
3 Nguyen, D., Baek, N.R., Pham, T., Park, K.R.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, p. 1315
4 Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)
5 Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In: 'Deep Learning in Biometrics' (CRC Press, 2018), p. 49
6 Czajka, A., Bowyer, K.W.: 'Presentation attack detection for iris recognition: an assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35
7 Ramachandra, R., Busch, C.: 'Presentation attack detection methods for face recognition systems: a comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37
8 Abdul Ghaffar, I., Norzali Haji Mohd, M.N.: 'Presentation attack detection for face recognition on smartphones: a comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9
9 Li, L., Correia, P.L., Hadid, A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, pp. 3–14
10 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: 'Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391
11 Daugman, J.: 'Iris recognition and anti-spoofing countermeasures'. In: 7th International Biometrics Conference, 2004

12 Galbally, J., Ortiz-Lopez, J., Fierrez, J., Ortega-Garcia, J.: 'Iris liveness detection based on quality related features'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276
13 Pacut, A., Czajka, A.: 'Aliveness detection for iris biometrics', 40th Annual IEEE International Carnahan Conferences Security Technology, 2006, pp. 122–129
14 Czajka, A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735
15 Czajka, A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3
16 Bodade, R., Talbar, S.: 'Dynamic iris localisation: a novel approach suitable for fake iris detection'. In: 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5
17 Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R.: 'An experimental study of pupil constriction for liveness detection'. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258
18 Kanematsu, M., Takano, H., Nakamura, K.: 'Highly reliable liveness detection method for iris recognition'. In: SICE Annual Conference 2007, 2007, pp. 361–364
19 Lee, E.C., Park, K.R.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166
20 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371
21 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science, 2015, pp. 684–697
22 Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225
23 Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745
24 Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6
25 Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11
26 Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690
27 Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110
28 Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8
29 Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You, Z., Zhou, J., Wang, Y., Sun, Z., Shan, S., Zheng, W., et al. (Eds.): Biometric Recognition (Cham: Springer International Publishing, 2016), pp. 611–619
30 Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3D projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78
31 Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In: Delac, K., Grgic, M., Bartlett, M.S. (Eds.): Recent Advances in Face Recognition (Rijeka: IntechOpen, 2008)
32 Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558
33 Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
34 Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725
35 Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection', IEEE International Joint Conference on Biometrics, 2014, pp. 1–8
36 Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326
37 Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14
38 Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Daniilidis, K., Maragos, P., Paragios, N. (Eds.): Computer Vision – ECCV 2010 (Berlin, Heidelberg: Springer, 2010), pp. 504–517
39 Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244
40 Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. In: Image Analysis and Signal Processing, 2009, pp. 233–236
41 Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation', 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040
42 Pinto, A.d.S., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228
43 Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing', 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8
44 Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing', ICB, 2013, pp. 1–6
45 Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3D structure recovered from a single camera', ICB, 2013, pp. 1–6
46 Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In: 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6
47 Erdogmus, N., Marcel, S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6
48 Erdogmus, N., Marcel, S.: 'Spoofing 2D face recognition systems with 3D masks'. In: 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8
49 Li, J., Wang, Y., Tan, T., Jain, A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Jain, A.K., Ratha, N.K. (Eds.): Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303
50 Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: 'A face antispoofing database with diverse attacks'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31
51 Lee, T., Ju, G., Liu, H., Wu, Y.: 'Liveness detection using frequency entropy of image sequences'. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370
52 Czajka, A.: 'Database of iris printouts and its application: development of liveness detection method for iris recognition'. In: 2013 18th International Conference on Methods, Models in Automation, Robotics (MMAR), 2013, pp. 28–33
53 Wei, Z., Qiu, X., Sun, Z., Tan, T.: 'Counterfeit iris detection based on texture analysis'. In: 2008 19th International Conference on Pattern Recognition (ICPR), 2008, pp. 1–4
54 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in mobile applications'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33
55 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. In: Proceedings of International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008
56 Unnikrishnan, S., Eshack, A.: 'Face spoof detection using image distortion analysis and image quality assessment'. In: 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5
57 Wen, D., Han, H., Jain, A.K.: 'Face spoof detection with image distortion analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761
58 Galbally, J., Marcel, S.: 'Face anti-spoofing based on general image quality assessment'. In: 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178
59 Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing', IET Biometrics, 2018, 7, (4), pp. 314–324
60 Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment for biometric presentation attack detection', 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6
61 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face anti-spoofing based on color texture analysis'. In: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830
63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145
64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696
65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA: ACM, 2016), pp. 9–15
66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9
67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7
69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863
70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park, J.I., Kim, J. (Eds.): Computer Vision – ACCV 2012 Workshops (Berlin, Heidelberg: Springer, 2013), pp. 121–132
71 de Freitas Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG – Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7
74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
75 Maatta, J., Hadid, A., Pietikainen, M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10
76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5
77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283
78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8
79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145
80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560
81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715
82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683
83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'MoBio_LivDet: mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192
84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460
86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6
87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors', Sensors, 2018
88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328
89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision', CoRR, 2018
90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018
91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014
92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5
93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34
94 Czajka, A., Bowyer, K.W., Krumdick, M., Vidal-Mata, R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196
95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164
96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018
97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018
98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879
99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using Haralick features', 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6
100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724
101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786
102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli, M., Nixon, M.S. (Eds.): Advances in Biometrics (Berlin, Heidelberg: Springer, 2009), pp. 1080–1090
103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282
104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The Replay-Mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7
105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139
106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87
107 Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: face and iris recognition for mobile engagement', Image Vision Comput., 2014, 32, pp. 1161–1172
108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'ImageNet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252
109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 612–618
110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet Iris 2017 – iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), Denver, CO, USA, October 1–4, 2017, pp. 733–741
111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928
114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019
115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019
116 Yambay, D., Doyle, J.S., Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-Iris 2013 – iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics (IJCB), Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8
117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-Iris 2015 – iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22–24, 2017, pp. 1–6
118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 – mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (Eds.): Biometrics and Identity Management (Berlin, Heidelberg: Springer, 2008), pp. 181–190
120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision', CoRR, 2015, abs/1512.00567
121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385
122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381
123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556
124 Kingma, D.P., Ba, J.: 'Adam: a method for stochastic optimization', CoRR, 2014, abs/1412.6980
125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012
126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017
127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



and video attacks on an iPad were used. The dataset is split into a training set of 20 subjects and a testing set comprising the other 30 subjects.

OULU-NPU [109]: a mobile face PAD dataset released in 2017, with 55 subjects captured under three different lighting conditions and backgrounds using six different mobile devices. Both print and video-replay attacks are included, using two different PAIs each. Four protocols were proposed in the paper to evaluate the generalization of PAD algorithms under different conditions.

Spoof in the Wild (SiW) [89]: released in 2018, it contains 8 real and 20 attack videos for each of 165 subjects, more subjects than any previous dataset. Bona-fide videos are captured with two high-quality cameras and collected with different orientations, facial expressions and lighting conditions. Attack samples are either high-quality print attacks or replay videos on four different PA devices.

CASIA-SURF [113]: later in the same year, 2018, the large-scale multi-modal dataset CASIA-SURF was released, containing 21,000 videos of 1,000 subjects. Each video has three modalities, namely RGB, Depth and IR, captured by an Intel RealSense camera. Only 2D print attacks were performed, with six different combinations of the photo state used (either full or cut) and three different operations, such as bending the printed paper or moving it at different distances from the camera.

CASIA-SURF CeFA [115]: Although the CASIA-SURF dataset [113] is large-scale, it only includes one ethnicity (Chinese) and one attack type (2D print). So in 2019, CASIA-SURF cross-ethnicity Face Anti-spoofing (CeFA) was released to provide a wider range of ethnicities and attack types, in addition to multiple modalities as in CASIA-SURF. The dataset includes videos for 1607 subjects covering three ethnicities (African, East Asian and Central Asian), three modalities and four attack types: 2D print attacks and replay attacks, in addition to 3D mask and silica gel attacks.

4.3 Iris Spoofing Databases

Many publicly available benchmark datasets have been published for the iris presentation attack problem. Some are captured with commercial

(a) Captured by Tablet

(b) Captured by Mobile

Fig. 3: Sample images from the Replay-Mobile dataset (hand-support). Left to right: bona-fide, print photo, mattescreen photo, mattescreen video (client 001)

(a) bona-fide

(b) Print attack

Fig. 4: Sample images from the iris datasets. Left to right: ATVS-FIr (id u0001s0001_ir_r_0002), Warsaw 2013 (id 0039_R_1), MobBioFake (id 007_R2)

iris recognition sensors operating with near-infrared illumination, like BioSec ATVS-FIr [12] or the LivDet datasets [52, 110, 117], while more recent datasets were captured in visible-light illumination using smartphones, e.g. MobBioFake [54, 105]. Below are details of the databases used in this paper.

4.3.1 BioSec ATVS-FIr: The ATVS-FIr dataset was introduced in 2012 [12], derived from the BioSec dataset [119]. It contains 800 bona-fide and 800 attack photos of 50 different subjects. The attack is performed by enhancing the quality of a bona-fide iris photo, printing it on high-quality paper using HP printers, then presenting it to the same LG IrisAccess EOU3000 sensor used to capture the bona-fide irises. All images are grayscale with 640×480 resolution. For each user, 16 different bona-fide photos were captured, from which 16 attack photos were generated: four photos were taken from each of the two eyes in two sessions, so each user has a total of 4 images × 2 eyes × 2 sessions = 16 bona-fide images, and the same number of attacks. Each eye of the 50 subjects was considered a subject on its own; the resulting 100 subjects were then divided into a training set of 25 subjects (200 bona-fide + 200 attack) and a test set of 75 subjects (600 bona-fide + 600 attack).

4.3.2 LivDet Warsaw 2013: Warsaw University of Technology published the first LivDet Warsaw iris dataset in 2013 [52] as part of the first international iris liveness competition in 2013 [116]. Two more versions of this dataset were later published, in 2015 [117] and 2017 [110]. LivDet-Iris-2013-Warsaw contains 852 bona-fide and 815 attack printout images from 237 subjects. For the attack images, the real iris was printed using two different printers, then captured by the same IrisGuard AD100 sensor as the bona-fide photos.

4.3.3 MobBioFake: The first competition for iris PAD in mobile scenarios was the MobILive competition in 2014 [118]. The MobBioFake iris spoofing dataset [54] was generated from the iris images of the MobBIO Multimodal Database [105]. It comprises 800 bona-fide and 800 attack images captured by the back camera (8MP resolution) of an Android device (Asus Transformer Pad TF300T), in visible light, not in near-infrared like previous iris datasets. The images are color RGB with a resolution of 250×200. The MobBioFake database has 100 subjects, each with 8 bona-fide + 8 attack images captured under different illumination and occlusion settings. The attack samples were generated from printed images of the originals under similar conditions with the same device. The images are split evenly between training and test subsets.

5 Proposed PAD Approach

Although several previous studies use CNNs for presentation attack detection, they either used custom-designed networks, e.g.


Fig 5 Overview of the PAD algorithm

Spoofnet [95, 98], or used pre-trained CNNs to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation attack images. We choose to assess state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet [108]. This PAD approach is a passive, software, texture-based method that requires neither additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches, starting from network weights trained on the ImageNet classification dataset. One can either retrain all the network weights given the new datasets of the problem, or freeze the weights of most of the network layers and fine-tune only a set of selected last layers, including the final classification layer.

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then input to the CNN: a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is altered to output a binary bona-fide/attack classification using sigmoid activation, instead of the 1000 classes of ImageNet.

5.1 Preprocessing

5.1.1 Face videos: Each of the face benchmark datasets used in this paper provides face coordinates for each frame of each video. These coordinates are used to crop a square region to be used as input to the network. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to be twice the minimum of the width and height of the face ground-truth box. The cropped bounding box containing the face is then resized to each network's default input size before being fed to the first convolutional layer of the CNN.
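This cropping rule can be sketched as follows (a minimal NumPy illustration; the function name and the border clamping are our own, not part of any released code):

```python
import numpy as np

def crop_face_square(frame, x, y, w, h, out_scale=2):
    """Crop a square ROI around a ground-truth face box (x, y, w, h).

    The square's side is twice the smaller box dimension, as described
    in Section 5.1.1, so some background context is kept.
    """
    side = out_scale * min(w, h)
    cx, cy = x + w // 2, y + h // 2           # face-box centre
    H, W = frame.shape[:2]
    # Clamp the square to the frame borders.
    x0 = max(0, min(cx - side // 2, W - side))
    y0 = max(0, min(cy - side // 2, H - side))
    return frame[y0:y0 + side, x0:x0 + side]

# A 480x640 dummy frame with a 100x80 face box at (200, 150).
frame = np.zeros((480, 640, 3), dtype=np.uint8)
roi = crop_face_square(frame, 200, 150, 100, 80)
print(roi.shape)   # (160, 160, 3): side = 2 * min(100, 80)
```

The resulting square would then be resized to the network's default input size (Section 6.1).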

5.1.2 Iris images: When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. However, in this paper we directly use the full iris image captured by the NIR sensor or smartphone camera, in order to keep the approach as simple as possible. This lets the network learn which regions of the full periocular area best discriminate the classes. The presence of some contextual information, like paper borders, can also be useful for attack detection.

Table 4: Performance of selected deep CNN architectures on ImageNet, in terms of single-model error reported in each model's paper.

| Model                                 | top-1 % (single / multi-crop) | top-5 % (single / multi-crop) | Parameters | Layers |
| InceptionV3 [120]                     | 21.2 / 19.47 (12-crop)        | 5.6 / 4.48 (12-crop)          | 23.9M      | 159    |
| MobileNetV2 (alpha=1.4) [122]         | 25.3 / -                      | 7.58 / -                      | 6.2M       | 88     |
| MobileNetV2 (default alpha=1.0) [122] | 28.0 / -                      | 9.8 / -                       | 3.5M       | 88     |
| VGG16 [123]                           | -† / 24.4                     | 9.9† / 7.1                    | 138.4M     | 23     |

† Not available in the paper; reported by Keras (https://keras.io)

5.2 Generalization: Data augmentation

It is known that deep networks have many parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. During training, the images of each mini-batch pulled from the training set are randomly cropped, rotated slightly by a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially larger training dataset and a more generalized network that is robust to changes in pose and illumination.

5.3 Modern deep CNN architectures

Based on the official results published for recent CNN architectures on the ImageNet dataset [108], three of the networks that achieved less than 10% top-5 error are InceptionV3 [120], ResNet50 [121] and MobileNetV2 [122]. Table 4 states the single-model top-1 and top-5 error of each of these networks on the ImageNet validation set, together with its depth and number of parameters, ordered by ascending top-5 error. One thing that can be noticed about the higher-accuracy networks is the increased number of layers: these networks are much deeper than, for example, the well-known VGG [123].

For the purpose of this paper, we chose networks that have fewer than 30 million parameters to fine-tune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 [120] (~24M parameters with 5.6% top-5 error), and a more recent, less deep network with far fewer parameters, MobileNetV2 [122] (~4M parameters with 9.9% top-5 error), which makes it very suitable for mobile applications. Even though all three networks have significantly fewer parameters than VGG, they are much deeper and more accurate.

In each architecture, the final softmax prediction layer is removed, and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate of 0.5), then a sigmoid activation layer for the final binary prediction.

5.4 Fine-tuning approaches

As stated previously, the available datasets are small while the used networks have millions of parameters to learn. This will probably lead the model to overfit these small sets if the network weights are trained from scratch, i.e. starting from random weights; training would get stuck in a poor local minimum, leading to weak generalization. A better approach is to initialize the network with weights pre-trained on ImageNet. We then have three training options: (1) use transfer learning: feed the images to the network to extract features from the last layer


Table 5: Number of parameters in each model, and the number of parameters to learn when fine-tuning only the final block.

| Model       | Total parameters | Last block to train                               | First trainable layer | Parameters to learn |
| InceptionV3 | 21,804,833       | block after mixed9 (train mixed9_1 until mixed10) | conv2d_90             | 6,075,585           |
| MobileNetV2 | 2,259,265        | last _inverted_res_block                          | block_16_expand       | 887,361             |

before the prediction, then use these features instead of raw image pixels for classification with an SVM or NN; (2) fine-tune the weights of only the last several layers of the network while fixing the weights of earlier layers to their ImageNet-trained values; or (3) fine-tune all the network weights using the new input dataset.

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the latter approach, Table 5 states, for each architecture, the name of the last block's first layer and the number of parameters in this block to be fine-tuned.
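The freeze-until-a-named-layer logic is framework-independent; the sketch below uses a stub class with the same `name`/`trainable` attributes as Keras layers, so the same function would work on a real model's `model.layers` (layer names such as `conv2d_90` come from Table 5):

```python
def freeze_until(layers, first_trainable_name):
    """Set trainable=False for every layer preceding the named layer;
    the named layer and everything after it stay trainable.
    Mirrors fine-tuning approach (A) of Section 5.4."""
    trainable = False
    for layer in layers:
        if layer.name == first_trainable_name:
            trainable = True
        layer.trainable = trainable
    return layers

# Stand-in for Keras layers (a real model's `model.layers` has the same attributes).
class Layer:
    def __init__(self, name):
        self.name, self.trainable = name, True

layers = [Layer(n) for n in ["conv1", "mixed9", "conv2d_90", "mixed10", "dense"]]
freeze_until(layers, "conv2d_90")
print([l.trainable for l in layers])  # [False, False, True, True, True]
```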

5.5 Single model with multiple biometrics

In addition to training/testing with datasets that belong to the same biometric source, i.e. face only or iris only, we experiment with training a single generic network to detect a presentation attack whether the presented biometric is a face or an iris. We train with different combinations of face+iris sets, then cross-evaluate the trained models on the other face and iris datasets to see if this approach is comparable to training on face images or iris images alone.

6 Experiments and Results

In this section, we describe the image and video preprocessing methods used, and explain the training strategy and experimental setup.

6.1 Preprocessing

6.1.1 Preparing images: Among the iris datasets, only the MobBioFake database consists of color images captured with a smartphone. The other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the color images of ImageNet, their input layer only accepts images with three channels; for grayscale iris images, the single gray channel is therefore replicated twice to form the 3-channel network input. The whole iris image is used, without cropping or iris detection and segmentation.
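The channel replication, combined with the input scaling described later in this section, amounts to the following (a NumPy sketch; the helper name is ours):

```python
import numpy as np

def to_network_input(img):
    """Replicate a single-channel NIR iris image into 3 channels and
    scale pixel values from [0, 255] to [-1, 1], matching the input
    convention of the ImageNet-pretrained networks used here."""
    if img.ndim == 2:                       # grayscale -> stack 3 copies
        img = np.stack([img] * 3, axis=-1)
    return img.astype(np.float32) / 127.5 - 1.0

gray = np.full((480, 640), 128, dtype=np.uint8)
x = to_network_input(gray)
print(x.shape, float(x.min()), float(x.max()))
```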

On the other hand, all face PAD datasets consist of color videos. Video durations range from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects, so we use frame skipping to sample one frame every 20 frames. This frame dropping leads to sampling around 10 to 19 frames per video, depending on its length. For each sampled frame, instead of using the whole frame as the network input, it is first cropped to either include only the face region or a box that has approximately twice the face area, to include some background context. Annotated face bounding boxes have a side length ranging from 70px to 100px in Replay-Attack videos, and from 170px to 250px for MSU-MFSD.
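The frame-skipping rule is simply a stride over the frame index (illustrative sketch; the function name is ours):

```python
def sampled_frame_indices(n_frames, step=20):
    """Indices of the frames kept when sampling one frame every
    `step` frames, as done here to reduce redundancy between frames."""
    return list(range(0, n_frames, step))

# A 10-second Replay-Attack video at 25 fps has 250 frames:
idx = sampled_frame_indices(250)
print(len(idx))   # 13 frames sampled (indices 0, 20, ..., 240)
```

The count of 13 frames for a 10-second 25 fps video is consistent with the 10-19 frames per video stated above.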

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments (after frame dropping in videos). Emphasized in bold is the total number of images available for training the CNN for each database.

The images are then resized according to the default input size of each network, that is, 224×224 for MobilenetV2 and 299×299 for InceptionV3. The RGB values of images input to the InceptionV3 and MobilenetV2 networks are first scaled to lie between -1 and 1.

Table 6: Number of images used in training and testing. For face datasets, the number of videos is shown, followed by the number of frames* in parentheses; for iris datasets, only the number of images is given. Bold: total number of training images.

| Database      | Subset | Attack + bona-fide               |
| Replay-Attack | train  | 300 (3414) + 60 (1139) = **4553** |
|               | devel  | 300 (3411) + 60 (1140)           |
|               | test   | 400 (4516) + 80 (1507)           |
| MSU-MFSD      | train  | 90 (1257) + 30 (419) = **1676**  |
|               | test   | 120 (1655) + 84 (551)            |
| Replay-Mobile | train  | 192 (2851) + 120 (1762) = **4613** |
|               | devel  | 256 (3804) + 160 (2356)          |
|               | test   | 192 (2842) + 110 (1615)          |
| ATVS-FIr      | train  | 200 + 200 = **400**              |
|               | test   | 600 + 600                        |
| Warsaw 2013   | train  | 203 + 228 = **431**              |
|               | test   | 612 + 624                        |
| MobBioFake    | train  | 400 + 400 = **800**              |
|               | test   | 400 + 400                        |

* frames sampled every 20 frames

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize these small sets of samples instead of learning useful features, and so fail to generalize. In order to reduce overfitting, data augmentation is used during training: before being fed to the network, images are randomly transformed with one or more of the following transformations: horizontal flip, rotation between 0 and 10 degrees, horizontal translation by 0 to 20 percent of the image width, vertical translation by 0 to 20 percent of the image height, or zooming by 0 to 20 percent.
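Two of these transformations can be sketched in plain NumPy (a minimal illustration, not the actual training pipeline; rotation and zoom would be added analogously, e.g. via scipy.ndimage):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, rng):
    """Random horizontal flip and x/y translation of up to 20% of the
    image size, matching two of the transformations listed above."""
    h, w = img.shape[:2]
    if rng.random() < 0.5:                       # horizontal flip
        img = img[:, ::-1]
    dx = int(rng.uniform(-0.2, 0.2) * w)         # horizontal shift
    dy = int(rng.uniform(-0.2, 0.2) * h)         # vertical shift
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

img = np.arange(100, dtype=np.float32).reshape(10, 10)
out = augment(img, rng)
print(out.shape)   # (10, 10): augmentation preserves the image size
```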

6.2 Training strategy and Hyperparameters

For training the CNN networks we used the Adam optimizer [124] with beta1 = 0.9, beta2 = 0.999 and learning rate 1×10^-4. We trained each network for a maximum of 50 epochs, but stopped early if the training or validation accuracy reached 99.95%, or if the validation accuracy did not improve (with delta = 0.0005) for 40 epochs.
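The stopping criteria above can be captured by a small tracker (our own illustrative sketch, not the actual training code):

```python
class EarlyStop:
    """Stopping criteria of Section 6.2: stop when accuracy reaches
    99.95%, or when validation accuracy has not improved by at least
    `min_delta` for `patience` consecutive epochs."""
    def __init__(self, target=0.9995, min_delta=0.0005, patience=40):
        self.target, self.min_delta, self.patience = target, min_delta, patience
        self.best, self.wait = 0.0, 0

    def should_stop(self, train_acc, val_acc):
        if train_acc >= self.target or val_acc >= self.target:
            return True
        if val_acc > self.best + self.min_delta:     # improvement
            self.best, self.wait = val_acc, 0
        else:
            self.wait += 1
        return self.wait >= self.patience

# With patience=3, a stagnating validation accuracy stops training:
stopper = EarlyStop(patience=3)
accs = [(0.90, 0.80), (0.92, 0.80), (0.93, 0.80), (0.94, 0.80)]
decisions = [stopper.should_stop(t, v) for t, v in accs]
print(decisions)   # [False, False, False, True]
```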

Intermediate models with high validation accuracy were saved, and finally the model with the least validation error was chosen for inference. Because of the randomness introduced by the data augmentation and the dropout, separate training sessions may produce slightly different results, especially in cross-validation. For this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach), and the minimum error is reported.

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treat each frame of a face video as a separate image. When reporting the final score for each video, we perform score-level fusion by averaging the scores of the samples from the same video. Accuracy and errors are then calculated based on these video-level scores, not frame-level scores as in some results reported in the literature.
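This score-level fusion is a per-video average of frame scores (a minimal sketch with hypothetical video ids):

```python
import numpy as np

def video_scores(frame_scores, video_ids):
    """Score-level fusion: average the per-frame CNN scores belonging
    to the same video, producing one score per video."""
    videos = sorted(set(video_ids))
    return {v: float(np.mean([s for s, i in zip(frame_scores, video_ids) if i == v]))
            for v in videos}

scores = [0.9, 0.8, 0.7, 0.1, 0.2]
ids    = ["vid1", "vid1", "vid1", "vid2", "vid2"]
fused = video_scores(scores, ids)
print(fused)   # vid1 ~ 0.8 (bona-fide-like), vid2 ~ 0.15 (attack-like)
```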


Table 7: Intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%). Each cell pair is InceptionV3 / MobilenetV2.

| Train Database         | Test Database          | Finetune (A) | Finetune (B) |
| Replay-Attack          | Replay-Attack          | 6.9 / 1.6    | 0 / 0.63     |
|                        | MSU-MFSD               | 33 / 30      | 17.5 / 22.5  |
|                        | Replay-Mobile          | 35.8 / 43.4  | 64.7 / 56.7  |
| MSU-MFSD               | Replay-Attack          | 34.7 / 22.9  | 23.8 / 13.7  |
|                        | MSU-MFSD               | 7.5 / 20.5   | 0 / 2.08     |
|                        | Replay-Mobile          | 32.6 / 26.8  | 24.6 / 23.7  |
| Replay-Mobile          | Replay-Attack          | 33.2 / 45.1  | 31.1 / 45.5  |
|                        | MSU-MFSD               | 35.4 / 30    | 32.9 / 37.5  |
|                        | Replay-Mobile          | 3.6 / 3.2    | 0 / 0        |
| ATVS-FIr               | ATVS-FIr               | 0 / 0        | 0 / 0        |
|                        | Warsaw 2013            | 1.9 / 5.4    | 0.16 / 1.6   |
|                        | MobBioFake (Grayscale) | 44 / 39      | 36.5 / 39    |
| Warsaw 2013            | ATVS-FIr               | 0 / 1        | 0.17 / 0.5   |
|                        | Warsaw 2013            | 0 / 0.32     | 0 / 0        |
|                        | MobBioFake (Grayscale) | 43 / 39      | 32.8 / 47    |
| MobBioFake             | ATVS-FIr               | 10 / 30      | 18.7 / 20.2  |
|                        | Warsaw 2013            | 15.7 / 32    | 23.4 / 18.8  |
|                        | MobBioFake             | 8.75 / 7     | 0.5 / 0.75   |
| MobBioFake (Grayscale) | ATVS-FIr               | 2.5 / 23.7   | 12 / 16.2    |
|                        | Warsaw 2013            | 6.2 / 17.3   | 10.1 / 17.2  |
|                        | MobBioFake (Grayscale) | 1.05 / 7.3   | 0.25 / 1.25  |

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the error on the test set. EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attacks, FRR = FR / (total bona-fide attempts), while FAR is the probability of the algorithm accepting attack samples as bona-fide, FAR = FA / (total attack attempts).

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold achieving the EER on the devel set is first calculated. This threshold is then used to calculate the error on the test set, known as the half total error rate (HTER), which is the sum of the False Rejection Rate and the False Acceptance Rate at that threshold, divided by two: HTER = (FRR + FAR) / 2.
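The devel/test protocol can be sketched as follows (a NumPy illustration on tiny synthetic score sets, where higher scores mean more bona-fide-like; the helper names are ours):

```python
import numpy as np

def far_frr(scores_bf, scores_att, thr):
    """FAR: attacks accepted as bona-fide; FRR: bona-fide rejected."""
    far = np.mean(np.asarray(scores_att) >= thr)
    frr = np.mean(np.asarray(scores_bf) < thr)
    return far, frr

def eer_threshold(scores_bf, scores_att):
    """Threshold at which FAR and FRR are (approximately) equal."""
    thrs = np.unique(np.concatenate([scores_bf, scores_att]))
    diffs = [abs(np.subtract(*far_frr(scores_bf, scores_att, t))) for t in thrs]
    return thrs[int(np.argmin(diffs))]

# The devel set fixes the threshold; HTER is then measured on the test set.
dev_bf, dev_att = np.array([0.9, 0.8, 0.7]), np.array([0.1, 0.2, 0.3])
tst_bf, tst_att = np.array([0.85, 0.6, 0.4]), np.array([0.15, 0.3, 0.5])
thr = eer_threshold(dev_bf, dev_att)
far, frr = far_frr(tst_bf, tst_att, thr)
hter = (far + frr) / 2
print(round(float(hter), 3))   # 0.333
```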

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER, to be able to compare with state-of-the-art published results. The EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that has performed cross-dataset EER evaluation on iris PAD databases.

6.4 Experimental setup

For implementation, we used Keras† with the TensorFlow‡ backend for the deep CNN models, and the Bob package [125, 126] for dataset management and experiment pipelines. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

*https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
†https://github.com/keras-team/keras
‡https://www.tensorflow.org

6.5 Experiment 1: Results

In the first experiment, we compare the two fine-tuning approaches explained in Section 5.4. Training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state of the art. For each fine-tuning approach, we performed three runs and report the minimum error obtained. In Table 7, we refer to the first fine-tuning approach as (A), where only the weights of the last block are fine-tuned, while in approach (B) all the layers' weights are fine-tuned using the given dataset.

It can be noticed from Table 7 that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning just the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error on all datasets using InceptionV3, while MobilenetV2 achieved 0% on most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset was used for training instead of the original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2: Results

For the second experiment, we only use fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here we use both face and iris images as training input to a single model, classifying bona-fide versus attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate between bona-fide and presentation attack images achieves very good results on both face and iris test sets. For some cases in the face datasets, this combination boosted cross-dataset performance: for example, on the Replay-Attack test set, 11.9% HTER was achieved when training a MobilenetV2 using MSU-MFSD and the Warsaw iris dataset, versus 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8: Training with face+iris images; intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%). Columns give the iris training set; each cell pair is InceptionV3 / MobilenetV2.

Train face: Replay-Attack (training epochs: 2/6, 3/6, 4/9, 2/8)
| Test Database          | + ATVS-FIr  | + Warsaw 2013 | + MobBioFake (RGB) | + MobBioFake (Gray) |
| Replay-Attack          | 0 / 0.63    | 0 / 0         | 0.25 / 1.5         | 0 / 1.4             |
| MSU-MFSD               | 29.6 / 20.4 | 15 / 20       | 19.6 / 25          | 30 / 17.5           |
| Replay-Mobile          | 58.1 / 48.7 | 57.3 / 35.4   | 60.6 / 58.1        | 39 / 40             |
| ATVS-FIr               | 0 / 0       | 1.05 / 2.7    | 51.7 / 8.8         | 3.5 / 12.3          |
| Warsaw 2013            | 0.16 / 0.16 | 0 / 0         | 41.3 / 16.2        | 15.2 / 5.7          |
| MobBioFake (RGB)       | 58.5 / 71.3 | 56 / 45.5     | 0.75 / 0.25        | 17.5 / 2            |
| MobBioFake (Gray)      | 50 / 69     | 56.3 / 38.5   | 15 / 0.75          | 1.25 / 1            |

Train face: MSU-MFSD (training epochs: 4/6, 3/5, 10/9, 7/9)
| Test Database          | + ATVS-FIr  | + Warsaw 2013 | + MobBioFake (RGB) | + MobBioFake (Gray) |
| Replay-Attack          | 22.8 / 14.5 | 25.3 / 11.9   | 22.8 / 27.7        | 16.6 / 21.6         |
| MSU-MFSD               | 0 / 2.08    | 0.4 / 2.08    | 2.5 / 2.08         | 0.4 / 2.5           |
| Replay-Mobile          | 28.7 / 26.1 | 19.2 / 24.8   | 22.7 / 26.8        | 25.2 / 17           |
| ATVS-FIr               | 0 / 0       | 0.33 / 2.3    | 11.8 / 12.5        | 0.8 / 12.33         |
| Warsaw 2013            | 0.6 / 0.8   | 0 / 0         | 43.9 / 65.5        | 7.4 / 9.1           |
| MobBioFake (RGB)       | 52.5 / 39.2 | 45.5 / 42     | 2 / 1.25           | 2.5 / 2.5           |
| MobBioFake (Gray)      | 45.3 / 38.5 | 34.5 / 43     | 2.2 / 4.25         | 1.75 / 2            |

Train face: Replay-Mobile (training epochs: 2/3, 1/3, 4/7, 5/4)
| Test Database          | + ATVS-FIr  | + Warsaw 2013 | + MobBioFake (RGB) | + MobBioFake (Gray) |
| Replay-Attack          | 24.2 / 42.6 | 31 / 35.7     | 41.6 / 64.3        | 33.7 / 54.6         |
| MSU-MFSD               | 32.5 / 30.4 | 32.5 / 42.5   | 60 / 52.5          | 40 / 44.6           |
| Replay-Mobile          | 0 / 0       | 0 / 0         | 0 / 0.26           | 0 / 0.46            |
| ATVS-FIr               | 0 / 0       | 0.83 / 6      | 15.5 / 22.5        | 6.5 / 28.2          |
| Warsaw 2013            | 2.1 / 4     | 0 / 0         | 9.6 / 28.9         | 22.1 / 26.1         |
| MobBioFake (RGB)       | 46.5 / 53   | 49.3 / 54.3   | 1 / 0.75           | 8.8 / 32.5          |
| MobBioFake (Gray)      | 33.5 / 43.3 | 41.7 / 45.8   | 5.3 / 6.5          | 2 / 1.75            |

(bold-underlined in the original) best cross-dataset error for each test dataset; (bold) best cross-dataset error for each test dataset in each row (i.e. given a certain face train set)

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER (%) is reported when testing on Replay-Attack; otherwise test EER (%). Underlined in the original is the best cross-dataset error for each test dataset. Each result is followed by the corresponding intra-dataset error in parentheses and the method/model used.

| Train         | Test              | SOTA (intra), method                           | Ours, single biometric (intra), model | Ours, face+iris (intra), model (iris/face train set) |
| Replay-Attack | MSU-MFSD          | 28.5 (8.25), GuidedScale-LBP [127] (LGBP)      | 17.5 (0), InceptionV3 (B)   | 15 (0), InceptionV3 (Warsaw 2013)         |
| Replay-Attack | Replay-Mobile     | 49.2 (3.13), GuidedScale-LBP [127] (LBP+GS-LBP)| 35.8 (6.9), InceptionV3 (A) | 35.4 (0), MobilenetV2 (Warsaw 2013)       |
| MSU-MFSD      | Replay-Attack     | 21.4 (1.5), Color-texture [66]                 | 13.7 (2.08), MobilenetV2 (B)| 11.9 (2.08), MobilenetV2 (Warsaw 2013)    |
| MSU-MFSD      | Replay-Mobile     | 32.1 (8.5), GuidedScale-LBP [127] (LBP+GS-LBP) | 23.7 (2.08), MobilenetV2 (B)| 17 (2.5), MobilenetV2 (MobBioFake (Gray)) |
| Replay-Mobile | Replay-Attack     | 46.19 (0.52), GuidedScale-LBP [127] (LGBP)     | 31.1 (0), InceptionV3 (B)   | 24.2 (0), InceptionV3 (ATVS-FIr)          |
| Replay-Mobile | MSU-MFSD          | 33.56 (0.98), GuidedScale-LBP [127] (LBP+GS-LBP)| 30 (3.2), MobilenetV2 (A)  | 30.4 (0), MobilenetV2 (ATVS-FIr)          |
| ATVS-FIr      | Warsaw 2013       | None available                                 | 0.16 (0), InceptionV3 (B)   | 0.16 (0), any (Replay-Attack)             |
| ATVS-FIr      | MobBioFake (Gray) | None available                                 | 36.5 (0), InceptionV3 (B)   | 33.5 (0), InceptionV3 (Replay-Mobile)     |
| Warsaw 2013   | ATVS-FIr          | None available                                 | 0 (0), InceptionV3 (A)      | 0.33 (0), InceptionV3 (MSU-MFSD)          |
| Warsaw 2013   | MobBioFake (Gray) | None available                                 | 32.8 (0), InceptionV3 (B)   | 34.5 (0), InceptionV3 (MSU-MFSD)          |
| MobBioFake (Gray) | ATVS-FIr      | None available                                 | 2.5 (1.05), InceptionV3 (A) | 0.8 (1.75), InceptionV3 (MSU-MFSD)        |
| MobBioFake (Gray) | Warsaw 2013   | None available                                 | 6.2 (1.05), InceptionV3 (A) | 5.7 (1), MobilenetV2 (Replay-Attack)      |

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. Moreover, it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network to exploit color features when testing on a face image, since all training face images were colored while all iris training images were grayscale.


Fig. 6: High-frequency components; attack iris images from the Warsaw dataset. Left to right: input image; frequency components (Fourier); high-pass filter; inverse Fourier of the image; high-frequency components overlaid on the image

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High-frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3

6.7 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation errors with the state of the art (SOTA). The table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we believe it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filters, activation and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and texture present in real presentations in general, which are absent in attack samples, and vice-versa. One approach to verify this is to visualize the response of certain feature maps of the network given sample input images. For this task, we chose a MobilenetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected some filters of the final convolutional block before classification and visualized their responses.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High-frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; bottom: client103

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high-frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high-frequency components of an image; we then overlay these high-frequency regions


on the input image, for a better understanding of where high frequency is concentrated in bona-fide versus attack images.

The steps for obtaining the regions with high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to get the frequency components of the image; a high-pass filter is then used to remove most of the low frequencies. After that, the inverse Fourier transform is used to obtain an image in which only the high-frequency regions are visible, and finally these regions are overlaid on the original image to highlight them.
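The Figure 6 pipeline can be sketched in NumPy as follows (an illustrative reimplementation; the square mask and cutoff size are our own choices, not necessarily those used for the figures):

```python
import numpy as np

def high_freq_regions(img, cutoff=8):
    """FFT -> zero out a square of low frequencies around the centre
    (high-pass filter) -> inverse FFT. The returned magnitude image is
    bright wherever high-frequency content is present."""
    f = np.fft.fftshift(np.fft.fft2(img))        # centre the spectrum
    h, w = img.shape
    cy, cx = h // 2, w // 2
    f[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0  # remove low freqs
    return np.abs(np.fft.ifft2(np.fft.ifftshift(f)))

img = np.ones((64, 64))             # flat image: no high frequencies
out = high_freq_regions(img)
print(float(out.max()) < 1e-9)      # True: everything was low-frequency
```

In practice the returned magnitude map would be thresholded and overlaid on the input image, as in the rightmost column of Figure 6.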

Regarding the chosen filters to visualize: the final layer before classification consists of 1280 filters, and when an input image is of size 224×224, the feature maps resulting from these 1280 filters have size 7×7. From these 1280 filters we selected two for each biometric: one with the highest average activation response to real images, which we call realF, and another filter that was highly activated by attack images, called attackF. We then visualize the 7×7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.

Figure 7 shows samples from the Warsaw iris dataset with their high-frequency regions highlighted, in addition to the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples, high-frequency areas are concentrated at the bright regions around the inner eye corner and the lash line; for attack presentations, however, high-frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features, highly represented at the eye corner, through realF, giving a high response at those regions in the real image while failing to detect them in the attack samples. AttackF likewise manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images of the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is performed with a handheld mobile device showing a photo of the subject. We can notice the absence of high-frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

Visualizing the high-frequency regions in the images showed that high-frequency variations in real images concentrate on certain edges and depth changes of the presented 3D biometric, while in attack presentations the high-frequency components represent noise, either spread over the paper in the case of a paper attack medium, or noise patterns produced by the display of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network learned is gradient-weighted class activation mapping (Grad-CAM) [10], which highlights the areas that contributed most to the network's classification decision for an input image.
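The core Grad-CAM computation is a gradient-weighted sum of the last convolutional feature maps. The sketch below shows only this final step on random arrays standing in for the real feature maps and gradients; in the full method, the gradients of the class score are obtained by backpropagation through the trained network:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM core step: weight each feature map by the global
    average of the class-score gradient flowing into it, sum over
    maps, then apply ReLU. Inputs have shape (H, W, K)."""
    weights = gradients.mean(axis=(0, 1))              # one weight per map
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))
    return np.maximum(cam, 0)                          # ReLU

rng = np.random.default_rng(1)
fmap = rng.random((7, 7, 1280))      # e.g. MobilenetV2's 7x7x1280 last block
grad = rng.random((7, 7, 1280))
cam = grad_cam(fmap, grad)
print(cam.shape, bool((cam >= 0).all()))   # (7, 7) True
```

The resulting 7×7 map is then upsampled and overlaid on the input image to produce the heatmaps in Figures 9 and 10.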

We chose the MobilenetV2 network trained on the Replay-Attack face dataset and the MobBioFake gray iris dataset, and analyzed the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation attack images.

The visualizations in these figures show clearly that the network focuses on features of the face and iris centers to make its decision for a bona-fide presentation, and in cases where these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could correctly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights trained on the ImageNet dataset. Experimental results show that InceptionV3 achieved state-of-the-art intra-dataset error close to 0% on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, in addition to starting the network training with non-random weights already pretrained on ImageNet. This was to avoid the weights overfitting the small training datasets. For future work, we would like to check the effect of training these deep networks from scratch using the available PAD datasets.

The generalization ability of these networks has been tested by performing cross-dataset evaluation, and better-than-state-of-the-art results were reported for the face datasets. As far as we know, this is the first paper to include cross-dataset equal-error-rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This commonly trained network achieved lower cross-dataset error rates than networks trained with a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we provided an analysis of the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important part in causing the network to select a bona-fide classification decision. For future work, we would like to apply patch-based CNNs and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach by using a region-based CNN method.

© This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

8 References

1. Jourabloo, A., Liu, Y., Liu, X.: 'Face de-spoofing: Anti-spoofing via noise modeling'. CoRR, 2018

2. Nagpal, C., Dubey, S.R.: 'A performance evaluation of convolutional neural networks for face anti-spoofing'. CoRR, 2018

3. Nguyen, D.T., Baek, N.R., Pham, T.D., Park, K.R.: 'Presentation attack detection for iris recognition system using NIR camera sensor'. Sensors, 2018, 18, p. 1315

4. Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor'. Sensors, 2018, 18, (8)

5. Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In: 'Deep Learning in Biometrics' (CRC Press, 2018), p. 49

6. Czajka, A., Bowyer, K.W.: 'Presentation attack detection for iris recognition: An assessment of the state-of-the-art'. ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35

7. Ramachandra, R., Busch, C.: 'Presentation attack detection methods for face recognition systems: A comprehensive survey'. ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37

8. Abdul Ghaffar, I., Norzali Haji Mohd, M.N.: 'Presentation attack detection for face recognition on smartphones: A comprehensive review'. Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9

9. Li, L., Correia, P.L., Hadid, A.: 'Face recognition under spoofing attacks: countermeasures and research directions'. IET Biometrics, 2018, 7, pp. 3–14

10. Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: 'Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization'. CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391

11. Daugman, J.: 'Iris recognition and anti-spoofing countermeasures'. In: 7th International Biometrics Conference, 2004

12. Galbally, J., Ortiz-Lopez, J., Fierrez, J., Ortega-Garcia, J.: 'Iris liveness detection based on quality related features'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276

13. Pacut, A., Czajka, A.: 'Aliveness detection for iris biometrics'. In: 40th Annual IEEE International Carnahan Conferences Security Technology, 2006, pp. 122–129

14. Czajka, A.: 'Pupil dynamics for iris liveness detection'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735

15. Czajka, A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3

16. Bodade, R., Talbar, S.: 'Dynamic iris localisation: A novel approach suitable for fake iris detection'. In: 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5

17. Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R.: 'An experimental study of pupil constriction for liveness detection'. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258

18. Kanematsu, M., Takano, H., Nakamura, K.: 'Highly reliable liveness detection method for iris recognition'. In: SICE Annual Conference 2007, 2007, pp. 361–364

19. Lee, E.C., Park, K.R.: 'Fake iris detection based on 3D structure of iris pattern'. International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166

20. Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification'. ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371

21. Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement'. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015, pp. 684–697

22. Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context'. Telecommunication Systems, 2011, 47, (3), pp. 215–225

23. Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: Reflections on replay attacks'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745

24. Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6

25. Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11

26. Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis'. Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690

27. Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110

28. Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8

29. Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In: Biometric Recognition (Cham: Springer International Publishing, 2016), pp. 611–619

30. Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3D projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78

31. Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In: Recent Advances in Face Recognition (Rijeka: IntechOpen, 2008)

32. Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment'. IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558

33. Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: Attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

34. Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: Liveness detection with eye movements'. IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725

35. Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection'. In: IEEE International Joint Conference on Biometrics, 2014, pp. 1–8

36. Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks'. Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326

37. Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches'. Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14

38. Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Computer Vision – ECCV 2010 (Berlin, Heidelberg: Springer, 2010), pp. 504–517

39. Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images'. Image Vision Comput., 2009, 27, (3), pp. 233–244

40. Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. In: Image Analysis and Signal Processing, 2009, pp. 233–236

41. Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation'. In: 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040

42. Pinto, A.d.S., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228

43. Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8

44. Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing'. In: ICB, 2013, pp. 1–6

45. Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3D structure recovered from a single camera'. In: ICB, 2013, pp. 1–6

46. Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In: 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6

47. Erdogmus, N., Marcel, S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6

48. Erdogmus, N., Marcel, S.: 'Spoofing 2D face recognition systems with 3D masks'. In: 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8

49. Li, J., Wang, Y., Tan, T., Jain, A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303

50. Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: 'A face antispoofing database with diverse attacks'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31

51. Lee, T., Ju, G., Liu, H., Wu, Y.: 'Liveness detection using frequency entropy of image sequences'. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370

52. Czajka, A.: 'Database of iris printouts and its application: Development of liveness detection method for iris recognition'. In: 2013 18th International Conference on Methods and Models in Automation and Robotics (MMAR), 2013, pp. 28–33

53. Wei, Z., Qiu, X., Sun, Z., Tan, T.: 'Counterfeit iris detection based on texture analysis'. In: 2008 19th International Conference on Pattern Recognition (ICPR), 2008, pp. 1–4

54. Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in mobile applications'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33

55. Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008

56. Unnikrishnan, S., Eshack, A.: 'Face spoof detection using image distortion analysis and image quality assessment'. In: 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5

57. Wen, D., Han, H., Jain, A.K.: 'Face spoof detection with image distortion analysis'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761

58. Galbally, J., Marcel, S.: 'Face anti-spoofing based on general image quality assessment'. In: 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178

59. Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing'. IET Biometrics, 2018, 7, (4), pp. 314–324

60. Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment for biometric presentation attack detection'. In: 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6

61. Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face anti-spoofing based on color texture analysis'. In: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62. Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis'. IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830

63. Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding'. IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145

64. Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65. Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks (SIN '16), New York, NY, USA: ACM, 2016, pp. 9–15

66. Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing'. Image Vision Comput., 2018, 77, pp. 1–9

67. Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: A public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68. Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69. Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863

70. de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Computer Vision – ACCV 2012 Workshops (Berlin, Heidelberg: Springer, 2013), pp. 121–132

71. de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72. Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73. Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG – Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74. Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75. Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using texture and local shape analysis'. IET Biometrics, 2012, 1, (1), pp. 3–10

76. Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering and Information Technology (CEIT), 2015, pp. 1–5

77. Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: Spoof detection on smartphones'. IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283

78. Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79. Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80. Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81. Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715

82. Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF'. IEEE Access, 2015, 3, pp. 1672–1683

83. Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'MoBio_LivDet: Mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84. Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85. Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: A neural network approach'. Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86. Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87. Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors'. Sensors, 2018

88. Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89. Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: Binary or auxiliary supervision'. CoRR, 2018

90. Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing'. CoRR, 2018

91. Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing'. CoRR, 2014

92. Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93. Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34

94. Czajka, A., Bowyer, K.W., Krumdick, M., Vidal-Mata, R.G.: 'Recognition of image-orientation-based iris spoofing'. IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196

95. Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96. Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection'. CoRR, 2018

97. Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: Toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98. Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879

99. Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using Haralick features'. In: 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100. Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: Application to iris, fingerprint, and face recognition'. IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724

101. Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786

102. He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Advances in Biometrics (Berlin, Heidelberg: Springer, 2009), pp. 1080–1090

103. Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104. Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The Replay-Mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105. Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: A multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106. Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors'. Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87

107. Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: Face and iris recognition for mobile engagement'. Image Vision Comput., 2014, 32, pp. 1161–1172

108. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'ImageNet large scale visual recognition challenge'. International Journal of Computer Vision, 2015, 115, (3), pp. 211–252

109. Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: A mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), 2017, pp. 612–618

110. Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet Iris 2017 – iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB 2017), Denver, CO, USA, October 1–4, 2017, pp. 733–741

111. Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112. Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113. Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114. Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115. Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116. Yambay, D., Doyle, J.S. Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-Iris 2013 – iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics (IJCB 2014), Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8

117. Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-Iris 2015 – iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22–24, 2017, pp. 1–6

118. Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 – mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119. Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Biometrics and Identity Management (Berlin, Heidelberg: Springer, 2008), pp. 181–190

120. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision'. CoRR, 2015, abs/1512.00567

121. He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition'. CoRR, 2015, abs/1512.03385

122. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation'. CoRR, 2018, abs/1801.04381

123. Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition'. CoRR, 2014, abs/1409.1556

124. Kingma, D.P., Ba, J.: 'Adam: A method for stochastic optimization'. CoRR, 2014, abs/1412.6980

125. Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126. Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127. Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture'. Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



Fig. 5: Overview of the PAD algorithm

Spoofnet [95, 98], or used pre-trained CNNs to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation attack images. We choose to assess state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet [108]. This PAD approach is a passive, software, texture-based method that requires neither additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches, starting from network weights trained on the ImageNet classification dataset. A decision can be made either to retrain the whole network's weights given the new datasets of the problem, or to freeze the weights of most of the network layers and fine-tune only a set of selected last layers, including the final classification layer.
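The difference between the two options reduces to which layers receive gradient updates. The toy SGD step below illustrates this with per-layer trainable flags; it is a sketch only (in Keras, the equivalent is setting `layer.trainable = False` on the frozen layers before compiling):

```python
import numpy as np

def sgd_step(weights, grads, trainable, lr=0.01):
    """One SGD update that honours per-layer trainable flags.

    weights, grads: lists of per-layer parameter arrays and their gradients.
    trainable: list of bools; False freezes that layer, mimicking the
    "fine-tune only the last layers" option, while all-True mimics
    retraining the whole network.
    """
    return [w - lr * g if t else w
            for w, g, t in zip(weights, grads, trainable)]
```

With `trainable = [False, ..., False, True]` only the final (classification) layer moves, so the pretrained ImageNet features are preserved unchanged.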

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then input to the CNN through a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is altered to output a binary bona-fide/attack classification using sigmoid activation, instead of the 1000 classes of ImageNet.

5.1 Preprocessing

5.1.1 Face videos: Each of the face benchmark datasets used in this paper provides face coordinates for each frame of each video. These coordinates are used to crop a square to be used as input to the network. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to be equal to twice the minimum of the width and height of the face ground-truth box. The cropped bounding box containing the face is then resized to each network's default input size and used as input to the first convolutional layer of the CNN.
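The ROI computation above can be sketched as follows. Clamping the square to the frame boundaries is an implementation detail assumed here, not specified in the text:

```python
def face_crop_box(x, y, w, h, frame_w, frame_h):
    """Square ROI around a ground-truth face box (x, y, w, h).

    The side is twice the smaller of the face width/height, centered
    on the face so some background context is included, then clamped
    to the frame (an assumed detail).  Returns (left, top, side).
    """
    side = 2 * min(w, h)
    cx, cy = x + w / 2, y + h / 2
    left = int(max(0, min(cx - side / 2, frame_w - side)))
    top = int(max(0, min(cy - side / 2, frame_h - side)))
    side = int(min(side, frame_w - left, frame_h - top))
    return left, top, side
```

The returned square is then resized to the network's default input resolution (e.g. 299x299 for InceptionV3, 224x224 for MobileNetV2).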

5.1.2 Iris images: When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. However, in this paper we directly use the full iris image captured by the NIR sensor or smartphone camera, in order to make the approach as simple as possible. This lets the network learn, from the full periocular area, the regions that best discriminate the two classes. The presence of some contextual information, like paper borders, can also be useful for attack detection.

Table 4 Performance of selected deep CNN architectures on ImageNet in terms of single-model error reported in each model's paper

Model                                 | top-1 error % (single-crop / multi-crop) | top-5 error % (single-crop / multi-crop) | parameters | layers
InceptionV3 [120]                     | 21.2 / 19.47 (12-crop)                   | 5.6 / 4.48 (12-crop)                     | 23.9M      | 159
MobileNetV2 (alpha=1.4) [122]         | 25.3 / -                                 | 7.58* / -                                | 6.2M       | 88
MobileNetV2 (default alpha=1.0) [122] | 28 / -                                   | 9.8* / -                                 | 3.5M       | 88
VGG16 [123]                           | - / 24.4                                 | 9.95* / 7.1                              | 138.4M     | 23

* Not available in the model's paper; reported by Keras (https://keras.io)

5.2 Generalization: Data augmentation

It is known that deep networks have a lot of parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. During training, for each mini-batch pulled from the training set, images in the batch are randomly cropped, rotated slightly with a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially simulated larger training dataset and a more generalized network that is robust to changes in pose and illumination.
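A minimal per-image augmentation step can be sketched as below. This covers only random horizontal flipping and small translations; rotation, scaling, and cropping would normally be added through an image library with interpolation, and the `max_shift` value and wrap-around `np.roll` translation are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility in this sketch

def augment(image, max_shift=4):
    """Random horizontal flip plus a small random x/y translation.

    Applied independently to every image in each mini-batch, so the
    network never sees the exact same pixels twice during training.
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                      # horizontal flip
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, (dy, dx), axis=(0, 1))   # translate (wraps at border)
    return out
```

In practice a real-time generator (e.g. Keras' `ImageDataGenerator`) applies such transforms on the fly for each mini-batch.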

5.3 Modern deep CNN architectures

Based on the official results published for recent CNN networks on the ImageNet dataset [108], three of the networks that achieved less than 10% top-5 error are InceptionV3 [120], ResNet50 [121] and MobileNetV2 [122]. Table 4 states the single-model top-1 and top-5 error of each of these networks on the ImageNet validation dataset, along with its depth and number of parameters, ordered by ascending top-5 error. One thing that can be noticed about the higher-accuracy networks is the increased number of layers: these networks are much deeper than, for example, the well-known VGG [123].

For the purpose of this paper, we chose networks that have less than 30 million parameters to fine-tune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 [120] (~24M parameters with 5.6% top-5 error), and a more recent, less deep network with far fewer parameters, MobileNetV2 [122] (~4M parameters with 9.9% top-5 error), which makes it very suitable for mobile applications. Even though all three networks have significantly fewer parameters than VGG, they are much deeper and more accurate.

In each architecture, the final softmax prediction layer is removed, and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate of 0.5) and then a single-unit sigmoid activation layer for the final binary prediction.
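The computation of this replacement head can be sketched in numpy as follows (a minimal illustration, not the Keras code used in the paper; the weight names `w`, `b` and the inverted-dropout formulation are our own):

```python
import numpy as np

def pad_head(feature_maps, w, b, drop_rate=0.5, training=False, rng=None):
    """Forward pass of the replacement head: global average pooling
    over the spatial dims, dropout (train-time only), then a single
    sigmoid unit for the bona-fide/attack decision."""
    pooled = feature_maps.mean(axis=(1, 2))         # (batch, channels)
    if training:
        rng = rng or np.random.default_rng(0)
        mask = rng.random(pooled.shape) >= drop_rate
        pooled = pooled * mask / (1.0 - drop_rate)  # inverted dropout
    logit = pooled @ w + b                          # (batch,)
    return 1.0 / (1.0 + np.exp(-logit))             # sigmoid score

# toy example: batch of 2 feature maps of size 7x7 with 1280 channels
fm = np.ones((2, 7, 7, 1280))
w = np.zeros(1280); b = 0.0
print(pad_head(fm, w, b))   # zero weights give the neutral score 0.5
```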

5.4 Fine-tuning approaches

As stated previously, the available datasets are small, while the networks used have millions of parameters to learn. Training the network weights from scratch, i.e. starting from random weights, would therefore likely lead the model to overfit these small sets: training would get stuck in a local minimum, leading to poor generalization. A better approach is to initialize the weights of the network with weights pre-trained on ImageNet. We then have three training options: (1) use transfer learning: feed the images to the network to extract features from the last layer


Table 5: Number of parameters in each model, and the number of parameters to learn when fine-tuning only the final block.

Model         Total # parameters   Last block to train                              First trainable layer   # parameters to learn
InceptionV3   21,804,833           block after mixed9 (mixed9_1 until mixed10)      conv2d_90               6,075,585
MobileNetV2   2,259,265            last _inverted_res_block                         block_16_expand         887,361

before the prediction layer, then use these features instead of raw image pixels for classification with an SVM or a neural network; (2) fine-tune the weights of only the last several layers of the network, while fixing the weights of earlier layers to their ImageNet-trained values; or (3) fine-tune all the network weights using the new input dataset.

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the approach of fine-tuning only the last convolutional block, Table 5 states, for each architecture, the name of the last block's first layer and the number of parameters in this block to be fine-tuned.
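The two fine-tuning scopes can be sketched as a simple freezing rule over an ordered list of layer names (in Keras this corresponds to setting each layer's `trainable` flag; the toy layer list below is an assumption, while `conv2d_90` and `block_16_expand` come from Table 5):

```python
def set_finetune_scope(layers, first_trainable=None):
    """Freeze ImageNet-pretrained layers below `first_trainable`.

    layers: ordered layer names, input -> output.
    first_trainable=None trains everything (approach B); a layer
    name, e.g. 'conv2d_90' for InceptionV3 or 'block_16_expand'
    for MobileNetV2, trains only the last block (approach A).
    """
    trainable = first_trainable is None
    flags = {}
    for name in layers:
        if name == first_trainable:
            trainable = True
        flags[name] = trainable
    return flags

layers = ['conv1', 'mixed9', 'conv2d_90', 'mixed10', 'pool']
print(set_finetune_scope(layers, 'conv2d_90'))
# conv1/mixed9 stay frozen; conv2d_90 onward become trainable
```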

5.5 Single model with multiple biometrics

In addition to training/testing with datasets that belong to the same biometric source, i.e. face only or iris only, we experiment with training a single generic presentation attack detection network to detect a presentation attack whether the presented biometric is a face or an iris. We train with different combinations of face + iris sets, then cross-evaluate the trained models on the other face and iris datasets to see if this approach is comparable to training on only face images or only iris images.

6 Experiments and Results

In this section we describe the image and video preprocessing methods used, and explain the training strategy and experimental setup.

6.1 Preprocessing

6.1.1 Preparing images: For iris images, only the MobBioFake database consists of color images captured using a smartphone; the other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the color images of ImageNet, their input layer only accepts color images with three channels; for grayscale iris images, the single gray channel is therefore replicated two times to form a 3-channel image passed as the network input. The whole iris image is used, without cropping, iris detection or segmentation.
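The channel replication step amounts to the following (the helper name is ours):

```python
import numpy as np

def gray_to_rgb(gray):
    """Replicate a single-channel NIR iris image into 3 identical
    channels so it fits an ImageNet-pretrained input layer."""
    if gray.ndim == 2:
        gray = gray[..., None]
    return np.repeat(gray, 3, axis=-1)

iris = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
rgb = gray_to_rgb(iris)
print(rgb.shape)          # (480, 640, 3)
```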

On the other hand, all face PAD datasets consist of color videos. Video durations range from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects, so we use frame skipping to sample one frame every 20 frames. This frame dropping leads to sampling around 10 to 19 frames per video, depending on its length. For each sampled frame, instead of using the whole frame as the network input, the frame is first cropped to either include only the face region or a box that has approximately twice the face area, to include some background context. Annotated face bounding boxes have a side length ranging from 70px to 100px in Replay-Attack videos, and from 170px to 250px for MSU-MFSD.
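The frame sampling reduces, for example, a 9 s and a 15 s Replay-Attack video (25 fps) to 12 and 19 frames respectively, matching the 10-19 range above (a minimal sketch; the function name is ours):

```python
def sample_frames(n_frames, skip=20):
    """Indices of the frames kept when sampling one frame every
    `skip` frames of a video."""
    return list(range(0, n_frames, skip))

# 9 s and 15 s Replay-Attack videos at 25 fps
print(len(sample_frames(9 * 25)))    # 12
print(len(sample_frames(15 * 25)))   # 19
```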

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments (after frame dropping in videos). Emphasized in bold is the total number of images available for training the CNN for each database.

The images are then resized according to the default input size of each network, that is, 224×224 for MobileNetV2 and 299×299 for InceptionV3. The RGB values of input images to the InceptionV3 and MobileNetV2 networks are first scaled to values between -1 and 1.
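The value scaling is a simple affine map of the 8-bit range (a sketch; the helper name is ours):

```python
import numpy as np

def scale_input(img_uint8):
    """Map RGB values from [0, 255] to [-1, 1], the input range used
    here for the InceptionV3 / MobileNetV2 networks."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

out = scale_input(np.array([0, 255], dtype=np.uint8))
# maps 0 -> -1.0 and 255 -> 1.0
```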

Table 6: Number of images used in training and testing. For face datasets, the number of videos is shown followed by the number of frames in parentheses (frames sampled every 20 frames); the total available for training is given after the equals sign. For iris datasets, only the number of dataset images is mentioned.

Database        Subset   Attack + bona-fide, videos (frames)
Replay-Attack   train    300 (3414) + 60 (1139) = 4553
                devel    300 (3411) + 60 (1140)
                test     400 (4516) + 80 (1507)
MSU-MFSD        train    90 (1257) + 30 (419) = 1676
                test     120 (1655) + 84 (551)
Replay-Mobile   train    192 (2851) + 120 (1762) = 4613
                devel    256 (3804) + 160 (2356)
                test     192 (2842) + 110 (1615)
ATVS-FIr        train    200 + 200 = 400
                test     600 + 600
Warsaw 2013     train    203 + 228 = 431
                test     612 + 624
MobBioFake      train    400 + 400 = 800
                test     400 + 400

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize this small set of samples instead of learning useful features, and so fail to generalize. In order to reduce overfitting, data augmentation is used during training: images are randomly transformed before being fed to the network with one or more of the following transformations: horizontal flip; rotation between 0 and 10 degrees; horizontal translation by 0 to 20 percent of the image width; vertical translation by 0 to 20 percent of the image height; or zooming by 0 to 20 percent.
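As an illustrative numpy-only sketch of such random transforms (not the augmentation code used in the paper; here `np.roll` wraps at the edges, and the rotation and zoom transforms, which need an image library, are omitted):

```python
import numpy as np

def augment(img, rng):
    """One randomly transformed copy of `img`: horizontal flip and
    x/y translation by up to 20% of the image size."""
    out = img.copy()
    if rng.random() < 0.5:                      # horizontal flip
        out = out[:, ::-1]
    h, w = out.shape[:2]
    dx = rng.integers(-w // 5, w // 5 + 1)      # up to 20% of width
    dy = rng.integers(-h // 5, h // 5 + 1)      # up to 20% of height
    out = np.roll(out, (dy, dx), axis=(0, 1))   # translate (wraps at edges)
    return out

rng = np.random.default_rng(7)
batch = [augment(np.arange(100).reshape(10, 10), rng) for _ in range(4)]
print(batch[0].shape)   # (10, 10)
```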

6.2 Training strategy and hyperparameters

For training the CNN networks we used the Adam optimizer [124] with beta1 = 0.9, beta2 = 0.999 and a learning rate of 1 × 10^-4. We trained each network for a maximum of 50 epochs, but added a stopping criterion: training stops if the training or validation accuracy reaches 99.95%, or if the validation accuracy does not improve (with delta = 0.0005) for 40 epochs.
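The stopping rule can be sketched as follows (our own formulation of the criterion, with accuracies in percent; the exact bookkeeping used in the paper may differ):

```python
def should_stop(val_history, train_history, target=99.95,
                patience=40, min_delta=0.0005):
    """Stop when the train or validation accuracy hits `target` (%),
    or when the best validation accuracy has not improved by at
    least `min_delta` for `patience` consecutive epochs."""
    if train_history and train_history[-1] >= target:
        return True
    if val_history and val_history[-1] >= target:
        return True
    best, since_improve = -1.0, 0
    for v in val_history:
        if v > best + min_delta:
            best, since_improve = v, 0
        else:
            since_improve += 1
    return since_improve >= patience

print(should_stop([50.0] * 41, [90.0] * 41))   # True: 40 epochs without improvement
```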

Intermediate models with high validation accuracy were saved, and finally the model with the least validation error was chosen for inference. Because of the randomness introduced by data augmentation and dropout, several training sessions may produce slightly different results, especially in cross-validation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treat each frame of a face video as a separate image. When reporting the final score for a video, we perform score-level fusion by averaging the scores of the samples from the same video. Accuracy and error rates are then calculated based on these video-level scores, not frame-level scores as in some results reported in the literature.
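The fusion itself is a plain mean over the sampled frames' scores:

```python
import numpy as np

def video_score(frame_scores):
    """Score-level fusion: a video's PAD score is the mean of the
    CNN scores of its sampled frames."""
    return float(np.mean(frame_scores))

frames = [0.9, 0.8, 0.95, 0.7]     # per-frame bona-fide scores
print(video_score(frames))          # 0.8375
```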


Table 7: Intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%).

Train database           Test database            Approach (A)               Approach (B)
                                                  InceptionV3  MobileNetV2   InceptionV3  MobileNetV2
Replay-Attack            Replay-Attack            6.9          1.6           0            0.63
                         MSU-MFSD                 33           30            17.5         22.5
                         Replay-Mobile            35.8         43.4          64.7         56.7
MSU-MFSD                 Replay-Attack            34.7         22.9          23.8         13.7
                         MSU-MFSD                 7.5          20.5          0            2.08
                         Replay-Mobile            32.6         26.8          24.6         23.7
Replay-Mobile            Replay-Attack            33.2         45.1          31.1         45.5
                         MSU-MFSD                 35.4         30            32.9         37.5
                         Replay-Mobile            3.6          3.2           0            0
ATVS-FIr                 ATVS-FIr                 0            0             0            0
                         Warsaw 2013              1.9          5.4           0.16         1.6
                         MobBioFake (Grayscale)   44           39            36.5         39
Warsaw 2013              ATVS-FIr                 0            1             0.17         0.5
                         Warsaw 2013              0            0.32          0            0
                         MobBioFake (Grayscale)   43           39            32.8         47
MobBioFake               ATVS-FIr                 10           30            18.7         20.2
                         Warsaw 2013              15.7         32            23.4         18.8
                         MobBioFake               8.75         7             0.5          0.75
MobBioFake (Grayscale)   ATVS-FIr                 2.5          23.7          12           16.2
                         Warsaw 2013              6.2          17.3          10.1         17.2
                         MobBioFake (Grayscale)   10.5         7.3           0.25         1.25

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the error on the test set. The EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. The FRR is the probability of the algorithm rejecting bona-fide samples as attacks, equal to FR / (total bona-fide attempts), while the FAR is the probability of the algorithm accepting attack samples as bona-fide, equal to FA / (total attack attempts).

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold is first calculated that achieves the EER on the devel set. This threshold is then used to calculate the error on the test set, known as the half total error rate (HTER), which equals the sum of the FRR and FAR at that threshold, divided by two: HTER = (FRR + FAR) / 2.
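The devel/test protocol above can be sketched as follows (a minimal illustration with our own helper names; the EER threshold is approximated by the score at which FAR and FRR are closest, and higher scores are taken to mean more bona-fide-like):

```python
import numpy as np

def far_frr(bona, attack, thr):
    """FAR: attacks accepted as bona-fide; FRR: bona-fide rejected."""
    far = float(np.mean(np.asarray(attack) >= thr))
    frr = float(np.mean(np.asarray(bona) < thr))
    return far, frr

def eer_threshold(bona, attack):
    """Threshold on the devel set where FAR and FRR are closest."""
    best_thr, best_gap = None, float('inf')
    for thr in np.unique(np.concatenate([bona, attack])):
        far, frr = far_frr(bona, attack, thr)
        if abs(far - frr) < best_gap:
            best_thr, best_gap = thr, abs(far - frr)
    return best_thr

def hter(bona, attack, thr):
    """Half total error rate at a fixed (devel-set) threshold."""
    far, frr = far_frr(bona, attack, thr)
    return (far + frr) / 2.0

# threshold fixed on toy devel scores, HTER reported on toy test scores
devel_bona, devel_attack = np.array([0.8, 0.9, 0.7]), np.array([0.1, 0.2, 0.75])
thr = eer_threshold(devel_bona, devel_attack)
print(thr, hter(np.array([0.85, 0.9]), np.array([0.1, 0.2]), thr))
```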

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER, to be able to compare with state-of-the-art published results. The EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that has performed cross-dataset EER evaluation on iris PAD databases.

6.4 Experimental setup

For the implementation we used Keras† with the TensorFlow‡ backend for the deep CNN models, and for dataset management and experiment pipelines we used the Bob package [125, 126]. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

*https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
†https://github.com/keras-team/keras
‡https://www.tensorflow.org

6.5 Experiment 1 results

In the first experiment we compare the two fine-tuning approaches explained in Section 5.4. For this experiment, training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state of the art. For each fine-tuning approach we performed three runs and report the minimum error obtained. In Table 7 we refer to the first fine-tuning approach as (A), where the weights of only the last block are fine-tuned, while approach (B) is when all the layers' weights are fine-tuned using the given dataset.

It can be noticed from Table 7 that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning just the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error for all datasets using InceptionV3, while MobileNetV2 achieved 0% for most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when a grayscale version of the MobBioFake iris dataset is used for training instead of the original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 results

For the second experiment we only use fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here, we use both face and iris images as training input to a single model, to classify bona-fide versus attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate bona-fide from presentation attack images achieved very good results on both face and iris test sets. In some cases on the face datasets, this combination boosted cross-dataset performance: for example, on the Replay-Attack test set, 11.9% HTER was achieved when training a MobileNetV2 using MSU-MFSD and the Warsaw iris dataset, versus 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8: Training with face + iris images: intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER (%). Columns give the iris training set, each with InceptionV3 (I) and MobileNetV2 (M).

Train face database: Replay-Attack
                            ATVS-FIr       Warsaw 2013    MobBioFake(RGB)  MobBioFake(Gray)
Test database               I      M       I      M       I      M         I      M
Number of training epochs   2      6       3      6       4      9         2      8
Replay-Attack               0      0.63    0      0       0.25   1.5       0      1.4
MSU-MFSD                    29.6   20.4    15     20      19.6   25        30     17.5
Replay-Mobile               58.1   48.7    57.3   35.4    60.6   58.1      39     40
ATVS-FIr                    0      0       1.05   2.7     51.7   8.8       3.5    12.3
Warsaw 2013                 0.16   0.16    0      0       41.3   16.2      15.2   5.7
MobBioFake (RGB)            58.5   71.3    56     45.5    0.75   0.25      17.5   2
MobBioFake (Grayscale)      50     69      56.3   38.5    1.5    0.75      1.25   1

Train face database: MSU-MFSD
Number of training epochs   4      6       3      5       10     9         7      9
Replay-Attack               22.8   14.5    25.3   11.9    22.8   27.7      16.6   21.6
MSU-MFSD                    0      2.08    0.4    2.08    2.5    2.08      0.4    2.5
Replay-Mobile               28.7   26.1    19.2   24.8    22.7   26.8      25.2   17
ATVS-FIr                    0      0       0.33   2.3     11.8   12.5      0.8    12.33
Warsaw 2013                 0.6    0.8     0      0       43.9   65.5      7.4    9.1
MobBioFake (RGB)            52.5   39.2    45.5   42      2      1.25      2.5    2.5
MobBioFake (Grayscale)      45.3   38.5    34.5   43      2.2    4.25      1.75   2

Train face database: Replay-Mobile
Number of training epochs   2      3       1      3       4      7         5      4
Replay-Attack               24.2   42.6    31     35.7    41.6   64.3      33.7   54.6
MSU-MFSD                    32.5   30.4    32.5   42.5    60     52.5      40     44.6
Replay-Mobile               0      0       0      0       0      0.26      0      0.46
ATVS-FIr                    0      0       0.83   6       15.5   22.5      6.5    28.2
Warsaw 2013                 2.1    4       0      0       9.6    28.9      22.1   26.1
MobBioFake (RGB)            46.5   53      49.3   54.3    1      0.75      8.8    3.25
MobBioFake (Grayscale)      33.5   43.3    41.7   45.8    5.3    6.5       2      1.75

(bold-underlined): best cross-dataset error for each test dataset; (bold): best cross-dataset error for each test dataset in each row (i.e. given a certain face training set).

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER (%) is reported when testing on Replay-Attack; otherwise test EER (%). Underlined is the best cross-dataset error for each test dataset. Each entry gives the cross-dataset error with its method/model, followed by the corresponding intra-dataset error.

Replay-Attack → MSU-MFSD: SOTA 28.5 (GuidedScale-LBP [127]), intra 8.25 (LGBP); ours, single biometric: 17.5 (InceptionV3 (B)), intra 0; ours, face + iris: 15 (InceptionV3, Warsaw 2013), intra 0.
Replay-Attack → Replay-Mobile: SOTA 49.2 (GuidedScale-LBP [127]), intra 3.13 (LBP + GS-LBP); ours, single biometric: 35.8 (InceptionV3 (A)), intra 6.9; ours, face + iris: 35.4 (MobileNetV2, Warsaw 2013), intra 0.
MSU-MFSD → Replay-Attack: SOTA 21.40 (Color-texture [66]), intra 1.5; ours, single biometric: 13.7 (MobileNetV2 (B)), intra 2.08; ours, face + iris: 11.9 (MobileNetV2, Warsaw 2013), intra 2.08.
MSU-MFSD → Replay-Mobile: SOTA 32.1 (GuidedScale-LBP [127]), intra 8.5 (LBP + GS-LBP); ours, single biometric: 23.7 (MobileNetV2 (B)), intra 2.08; ours, face + iris: 17 (MobileNetV2, MobBioFake (Gray)), intra 2.5.
Replay-Mobile → Replay-Attack: SOTA 46.19 (GuidedScale-LBP [127]), intra 0.52 (LGBP); ours, single biometric: 31.1 (InceptionV3 (B)), intra 0; ours, face + iris: 24.2 (InceptionV3, ATVS-FIr), intra 0.
Replay-Mobile → MSU-MFSD: SOTA 33.56 (GuidedScale-LBP [127]), intra 0.98 (LBP + GS-LBP); ours, single biometric: 30 (MobileNetV2 (A)), intra 3.2; ours, face + iris: 30.4 (MobileNetV2, ATVS-FIr), intra 0.
ATVS-FIr → Warsaw 2013: SOTA none available; ours, single biometric: 0.16 (InceptionV3 (B)), intra 0; ours, face + iris: 0.16 (any model, Replay-Attack), intra 0.
ATVS-FIr → MobBioFake (Gray): SOTA none available; ours, single biometric: 36.5 (InceptionV3 (B)), intra 0; ours, face + iris: 33.5 (InceptionV3, Replay-Mobile), intra 0.
Warsaw 2013 → ATVS-FIr: SOTA none available; ours, single biometric: 0 (InceptionV3 (A)), intra 0; ours, face + iris: 0.33 (InceptionV3, MSU-MFSD), intra 0.
Warsaw 2013 → MobBioFake (Gray): SOTA none available; ours, single biometric: 32.8 (InceptionV3 (B)), intra 0; ours, face + iris: 34.5 (InceptionV3, MSU-MFSD), intra 0.
MobBioFake (Gray) → ATVS-FIr: SOTA none available; ours, single biometric: 2.5 (InceptionV3 (A)), intra 10.5; ours, face + iris: 0.8 (InceptionV3, MSU-MFSD), intra 1.75.
MobBioFake (Gray) → Warsaw 2013: SOTA none available; ours, single biometric: 6.2 (InceptionV3 (A)), intra 10.5; ours, face + iris: 5.7 (MobileNetV2, Replay-Attack), intra 1.

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. It also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network focus on color features when testing on a face image, since all training face images were colored while all iris training images were grayscale.


Fig. 6: High frequency components of attack iris images from the Warsaw dataset. Left to right: input image; frequency components (Fourier); high-pass filter; inverse Fourier of the filtered image; high frequency components overlaid on the image.

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left to right: input image; high frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation error with the state of the art (SOTA). The table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobileNetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activations and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and textures that are present in real presentations in general but not in attack samples, and vice versa. One way to verify this is to visualize the response of certain feature maps of the network for sample input images. For this task we chose a MobileNetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected some filters of the final convolutional block before classification to visualize.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left to right: input image; high frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high frequency components of an image, then overlaid these high frequency regions


on the input image, for a better understanding of where high frequency content is concentrated in bona-fide versus attack images.

The steps for obtaining the regions of high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to obtain the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, an inverse Fourier transform is used to obtain an image with only the high frequency regions visible, and finally these regions are overlaid on the original image to highlight them.
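The steps above can be sketched in a few lines of numpy (an illustration with an assumed square cutoff; the paper does not specify the filter shape or cutoff used):

```python
import numpy as np

def highpass(img, cutoff=8):
    """Keep only high spatial frequencies: FFT, zero out a
    (2*cutoff)x(2*cutoff) low-frequency block at the center of the
    shifted spectrum, then inverse FFT."""
    f = np.fft.fftshift(np.fft.fft2(img))
    cy, cx = np.array(img.shape) // 2
    f[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0
    return np.abs(np.fft.ifft2(np.fft.ifftshift(f)))

img = np.random.default_rng(0).random((64, 64))
hf = highpass(img)
overlay_mask = hf > np.percentile(hf, 90)   # strongest 10% regions to overlay
print(hf.shape, overlay_mask.mean())
```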

Regarding the chosen filters to visualize: the final layer before classification consists of 1280 filters; when the input image is of size 224×224, the feature maps resulting from these final 1280 filters have size 7×7. From these 1280 filters we selected two filters for each biometric: one that had the highest average activation response to real images, which we call realF, and another filter that was highly activated by attack images, called attackF. We then visualize the 7×7 activation response of these filters for some bona-fide and attack samples belonging to the same subject.

Figure 7 shows samples from the Warsaw iris dataset with their high frequency regions highlighted, in addition to the activation responses of the realF and attackF filters. The visualization clearly shows that for real iris samples, high frequency areas are concentrated at the bright regions around the inner eye corner and the lash line; for attack presentations, however, high frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features highly represented at the eye corner through realF, giving a high response at those regions in the real image while failing to detect them in attack samples. AttackF also manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images of the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is performed using a handheld mobile device showing a photo of the subject. We can notice the absence of high frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF filter. On the other hand, these attack images contain certain noise patterns in the clear background area, which are detected successfully by the network's attackF filter.

By visualizing the high frequency regions in the images, we showed that high frequency variations in real images concentrate on certain edges and depth changes of the real 3D biometric presented, while for attack presentations the high frequency components represent noise, either spread over the paper for a paper attack medium, or noise patterns produced by the display of a mobile attack device. Our trained CNN managed to localize such features correctly, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyze what the network learned is gradient-weighted class activation mapping (Grad-CAM) [10], which highlights the areas that contributed most to the network's classification decision for an input image.
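Given the last conv block's activations and the gradients of the class score with respect to them, the Grad-CAM map is the ReLU of a gradient-weighted sum of the feature maps; a numpy sketch (the toy arrays stand in for values a framework would compute by backpropagation):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap: channel weights are the spatially averaged
    gradients of the class score; the map is the ReLU of the
    weighted sum of feature maps, normalized to [0, 1]."""
    weights = gradients.mean(axis=(0, 1))            # (channels,)
    cam = (activations * weights).sum(axis=-1)       # (h, w)
    cam = np.maximum(cam, 0)                         # ReLU
    return cam / cam.max() if cam.max() > 0 else cam

acts = np.random.default_rng(1).random((7, 7, 1280))   # MobileNetV2 last block
grads = np.ones((7, 7, 1280))                          # toy gradients
heat = grad_cam(acts, grads)
print(heat.shape)    # (7, 7), upsampled to the input size for display
```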

We chose the MobileNetV2 network trained on the Replay-Attack face dataset and the grayscale MobBioFake iris dataset, and analyzed the activation maps of some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation attack images.

The visualizations in these figures clearly show that the network focuses on features at the center of the face or iris to decide on a bona-fide presentation; when these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could correctly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights trained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% SOTA intra-dataset error on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, in addition to starting the network training from weights already pretrained on the ImageNet dataset, to avoid overfitting the small training sets. For future work, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-SOTA results were reported for the face datasets. As far as we know, this is the first paper to include cross-dataset equal error rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.

We not only performed cross-dataset evaluation, but also trained a single network using multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we provided an analysis of the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important region leading the network to a bona-fide classification decision. For future work, we would like to apply patch-based CNNs and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach by using a region-based CNN method.


8 References

1 Jourabloo A., Liu Y., Liu X.: 'Face de-spoofing: Anti-spoofing via noise modeling', CoRR, 2018
2 Nagpal C., Dubey S.R.: 'A performance evaluation of convolutional neural networks for face anti spoofing', CoRR, 2018
3 Nguyen D., Rae Baek N., Pham T., Ryoung Park K.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, pp. 1315
4 Nguyen D.T., Pham T.D., Lee Y.W., Park K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)
5 Pinto A., Pedrini H., Krumdick M., Becker B., Czajka A., Bowyer K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In: 'Deep learning in biometrics' (CRC Press, 2018), p. 49
6 Czajka A., Bowyer K.W.: 'Presentation attack detection for iris recognition: An assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1-86:35
7 Ramachandra R., Busch C.: 'Presentation attack detection methods for face recognition systems: A comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1-8:37
8 Abdul Ghaffar I., Norzali M., Haji Mohd M.N.: 'Presentation attack detection for face recognition on smartphones: A comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9
9 Li L., Correia P.L., Hadid A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, pp. 3-14
10 Selvaraju R.R., Das A., Vedantam R., Cogswell M., Parikh D., Batra D.: 'Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391
11 Daugman J.: 'Iris recognition and anti-spoofing countermeasures'. In: 7th International Biometrics Conference, 2004
12 Galbally J., Ortiz-Lopez J., Fierrez J., Ortega-Garcia J.: 'Iris liveness detection based on quality related features'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271-276
13 Pacut A., Czajka A.: 'Aliveness detection for iris biometrics', 40th Annual IEEE International Carnahan Conferences Security Technology, 2006, pp. 122-129
14 Czajka A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726-735
15 Czajka A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1-3
16 Bodade R., Talbar S.: 'Dynamic iris localisation: A novel approach suitable for fake iris detection'. In: 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1-5
17 Huang X., Ti C., Hou Q., Tokuta A., Yang R.: 'An experimental study of pupil constriction for liveness detection'. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252-258
18 Kanematsu M., Takano H., Nakamura K.: 'Highly reliable liveness detection method for iris recognition'. In: SICE Annual Conference 2007, 2007, pp. 361-364
19 Lee E.C., Ryoung Park K.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162-166
20 Mohd M.N.H., Kashima M., Sato K., Watanabe M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363-8371
21 Mohd M.N.H., Kashima M., Sato K., Watanabe M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science, 2015, pp. 684-697
22 Pan G., Sun L., Wu Z., Wang Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215-225
23 Smith D.F., Wiliem A., Lovell B.C.: 'Face recognition on consumer devices: Reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736-745
24 Kollreider K., Fronthaler H., Bigun J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1-6
25 Ali A., Deravi F., Hoque S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8-11
26 Wang L., Ding X., Fang C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685-690
27 Bharadwaj S., Dhamecha T.I., Vatsa M., Singh R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105-110
28 Sun L., Wu Z., Lao S., Pan G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1-8
29 Patel K., Han H., Jain A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You Z., Zhou J., Wang Y., Sun Z., Shan S., Zheng W., et al. (eds.): Biometric Recognition (Cham, Springer International Publishing, 2016), pp. 611-619
30 Marsico M.D., Nappi M., Riccio D., Dugelay J.: 'Moving face spoofing detection via 3D projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73-78
31 Pan G., Wu Z., Sun L.: 'Liveness detection for face recognition'. In: Delac K., Grgic M., Bartlett M.S. (eds.): Recent Advances in Face Recognition (Rijeka, IntechOpen, 2008)
32 Kollreider K., Fronthaler H., Faraj M.I., Bigun J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548-558
33 Komogortsev O.V., Karpov A.: 'Liveness detection via oculomotor plant characteristics: Attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1-8
34 Komogortsev O.V., Karpov A., Holland C.D.: 'Attack of mechanical replicas: Liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716-725
35 Rigas I., Komogortsev O.V.: 'Gaze estimation as a framework for iris liveness detection', IEEE International Joint Conference on Biometrics, 2014, pp. 1-8
36 Rigas I., Komogortsev O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316-326

37 Akhtar Z Foresti GL lsquoFace spoof attack recognition using discriminativeimage patchesrsquo Journal of Electrical and Computer Engineering 2016 2016pp 1ndash14

38 Tan X Li Y Liu J Jiang L lsquoFace liveness detection from a single imagewith sparse low rank bilinear discriminative modelrsquo In Daniilidis K MaragosP Paragios N editors Computer Vision ndash ECCV 2010 (Berlin HeidelbergSpringer Berlin Heidelberg 2010 pp 504ndash517

39 Kollreider K Fronthaler H Bigun J lsquoNon-intrusive liveness detection by faceimagesrsquo Image Vision Comput 2009 27 (3) pp 233ndash244

40 Bao W Li H Li N Jiang W lsquoA liveness detection method for face recognitionbased on optical flow fieldrsquo In Image Analysis and Signal Processing ( 2009pp 233ndash236

41 Siddiqui TA Bharadwaj S Dhamecha TI Agarwal A Vatsa M SinghR et al lsquoFace anti-spoofing with multifeature videolet aggregationrsquo 2016 23rdInternational Conference on Pattern Recognition (ICPR) 2016 pp 1035ndash1040

42 d SPinto A Pedrini H Schwartz W Rocha A lsquoVideo-based face spoofingdetection through visual rhythm analysisrsquo In 2012 25th SIBGRAPI Conferenceon Graphics Patterns and Images ( 2012 pp 221ndash228

43 Komulainen J Hadid A Pietikaumlinen M lsquoContext based face anti-spoofingrsquo2013 IEEE Sixth International Conference on Biometrics Theory Applicationsand Systems (BTAS) 2013 pp 1ndash8

44 Kim S Yu S Kim K Ban Y Lee S lsquoFace liveness detection using variablefocusingrsquo ICB 2013 pp 1ndash6

45 Wang T Yang J Lei Z Liao S Li SZ lsquoFace liveness detection using 3dstructure recovered from a single camerarsquo ICB 2013 pp 1ndash6

46 Kose N Dugelay J lsquoReflectance analysis based countermeasure technique todetect face mask attacksrsquo In 2013 18th International Conference on DigitalSignal Processing (DSP) ( 2013 pp 1ndash6

47 Erdogmus N Marcel S lsquoSpoofing in 2d face recognition with 3d masks andanti-spoofing with kinectrsquo In 2013 IEEE Sixth International Conference onBiometrics Theory Applications and Systems (BTAS) ( 2013 pp 1ndash6

48 Erdogmus N Marcel S lsquoSpoofing 2d face recognition systems with 3dmasksrsquo In 2013 International Conference of the BIOSIG Special Interest Group(BIOSIG) ( 2013 pp 1ndash8

49 Li J Wang Y Tan T Jain AK lsquoLive face detection based on the analysis ofFourier spectrarsquo In Jain AK Ratha NK editors Biometric Technology forHuman Identification vol 5404 ( 2004 pp 296ndash303

50 Zhang Z Yan J Liu S Lei Z Yi D Li SZ lsquoA face antispoofing databasewith diverse attacksrsquo In 2012 5th IAPR International Conference on Biometrics(ICB) ( 2012 pp 26ndash31

51 Lee T Ju G Liu H Wu Y lsquoLiveness detection using frequency entropy ofimage sequencesrsquo In 2013 IEEE International Conference on Acoustics Speechand Signal Processing ( 2013 pp 2367ndash2370

52 Czajka A lsquoDatabase of iris printouts and its application Development of livenessdetection method for iris recognitionrsquo In 2013 18th International Conference onMethods Models in Automation Robotics (MMAR) ( 2013 pp 28ndash33

53 Wei Z Qiu X Sun Z Tan T lsquoCounterfeit iris detection based on textureanalysisrsquo In ICPR 2008 19th International Conference on Pattern Recogni-tion(ICPR) ( 2009 pp 1ndash4

54 Sequeira AF Murari J Cardoso JS lsquoIris liveness detection methods in mobileapplicationsrsquo In 2014 International Conference on Computer Vision Theory andApplications (VISAPP) vol 3 ( 2014 pp 22ndash33

55 Sequeira AF Murari J Cardoso JS lsquoIris liveness detection methods in themobile biometrics scenariorsquo In Proceedings of International Joint Conferenceon Neural Networks (IJCNN) ( 2014 pp 3002ndash3008

56 Unnikrishnan S Eshack A lsquoFace spoof detection using image distortionanalysis and image quality assessmentrsquo In 2016 International Conference onEmerging Technological Trends (ICETT) ( 2016 pp 1ndash5

57 Wen D Han H Jain AK lsquoFace spoof detection with image distortion anal-ysisrsquo IEEE Transactions on Information Forensics and Security 2015 10 (4)pp 746ndash761

58 Galbally J Marcel S lsquoFace anti-spoofing based on general image quality assess-mentrsquo In 2014 22nd International Conference on Pattern Recognition ( 2014pp 1173ndash1178

59 SAtildeullinger D Trung P Uhl A lsquoNon-reference image quality assessment andnatural scene statistics to counter biometric sensor spoofingrsquo IET Biometrics2018 7 (4) pp 314ndash324

60 Bhogal APS Soumlllinger D Trung P Uhl A lsquoNon-reference image qualityassessment for biometric presentation attack detectionrsquo 2017 5th InternationalWorkshop on Biometrics and Forensics (IWBF) 2017 pp 1ndash6

61 Boulkenafet Z Komulainen J Hadid A lsquoFace anti-spoofing based on colortexture analysisrsquo In 2015 IEEE International Conference on Image Processing(ICIP) ( 2015 pp 2636ndash2640

© This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis'. IEEE Transactions on Information Forensics and Security, 2016, 11 (8), pp. 1818–1830

63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding'. IEEE Signal Processing Letters, 2017, 24 (2), pp. 141–145

64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA: ACM, 2016), pp. 9–15

66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing'. Image Vision Comput., 2018, 77, pp. 1–9

67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: A public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection'. IEEE Transactions on Information Forensics and Security, 2015, 10 (4), pp. 849–863

70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP–TOP based countermeasure against face spoofing attacks'. In: Park, J.I., Kim, J., editors. Computer Vision – ACCV 2012 Workshops (Berlin, Heidelberg: Springer, 2013), pp. 121–132

71 de Freitas Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG – Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using texture and local shape analysis'. IET Biometrics, 2012, 1 (1), pp. 3–10

76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5

77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: Spoof detection on smartphones'. IEEE Transactions on Information Forensics and Security, 2016, 11 (10), pp. 2268–2283

78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features'. IEEE Transactions on Information Forensics and Security, 2015, 10 (4), pp. 703–715

82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF'. IEEE Access, 2015, 3, pp. 1672–1683

83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'Mobio_livdet: Mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: A neural network approach'. Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors'. Sensors, 2018

88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: Binary or auxiliary supervision'. CoRR, 2018

90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing'. CoRR, 2018

91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing'. CoRR, 2014

92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34

94 Czajka, A., Bowyer, K.W., Krumdick, M., VidalMata, R.G.: 'Recognition of image-orientation-based iris spoofing'. IEEE Transactions on Information Forensics and Security, 2017, 12 (9), pp. 2184–2196

95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection'. CoRR, 2018

97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: Toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face and fingerprint spoofing detection'. IEEE Transactions on Information Forensics and Security, 2015, 10 (4), pp. 864–879

99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using Haralick features'. 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: Application to iris, fingerprint and face recognition'. IEEE Transactions on Image Processing, 2014, 23 (2), pp. 710–724

101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis'. IEEE Transactions on Information Forensics and Security, 2015, 10 (4), pp. 778–786

102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli, M., Nixon, M.S., editors. Advances in Biometrics (Berlin, Heidelberg: Springer, 2009), pp. 1080–1090

103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The Replay-Mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: A multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors'. Pattern Recogn. Lett., 2015, 57 (C), pp. 81–87

107 Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: Face and iris recognition for mobile engagement'. Image Vision Comput., 2014, 32, pp. 1161–1172

108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'ImageNet large scale visual recognition challenge'. International Journal of Computer Vision, 2015, 115 (3), pp. 211–252

109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: A mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 612–618

110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet Iris 2017 – Iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics, IJCB 2017, Denver, CO, USA, October 1–4, 2017, pp. 733–741

111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116 Yambay, D., Doyle, J.S., Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-Iris 2013 – Iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics, IJCB 2014, Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8

117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-Iris 2015 – Iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis, ISBA 2017, New Delhi, India, February 22–24, 2017, pp. 1–6

118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 – Mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M., editors. Biometrics and Identity Management (Berlin, Heidelberg: Springer, 2008), pp. 181–190

120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision'. CoRR, 2015, abs/1512.00567

121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition'. CoRR, 2015, abs/1512.03385

122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation'. CoRR, 2018, abs/1801.04381

123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition'. CoRR, 2014, abs/1409.1556

124 Kingma, D.P., Ba, J.: 'Adam: A method for stochastic optimization'. CoRR, 2014, abs/1412.6980

125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture'. Multimedia Tools and Applications, 2018, 77 (7), pp. 8883–8909



Table 5 Number of parameters in each model and the number of parameters to learn when fine-tuning only the final block

Model        Total parameters   Last block (first trainable layer)                        Parameters to learn
InceptionV3  21,804,833         block after mixed9 (conv2d_90, trained through mixed10)   6,075,585
MobileNetV2  2,259,265          last inverted residual block (block_16_expand)            887,361

before prediction, then use these features instead of raw image pixels for classification with an SVM or a neural network; (2) fine-tune the weights of only the last several layers of the network while fixing the weights of the earlier layers to their ImageNet-trained values; or (3) fine-tune all of the network weights on the new input dataset.

We experiment with fine-tuning either the whole network or only a few top layers, which we choose to be the last convolutional block. For the latter approach, Table 5 states, for each architecture, the name of the last block's first trainable layer and the number of parameters in this block to be fine-tuned.
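The last-block strategy reduces to a simple freezing rule over an ordered list of layer names: everything before the block's first trainable layer (conv2d_90 for InceptionV3, block_16_expand for MobileNetV2, per Table 5) stays frozen. The helper below is an illustrative sketch, standing in for looping over `model.layers` in Keras and setting each `layer.trainable` flag; the toy layer names only echo the InceptionV3 ordering.

```python
def freeze_until(layer_names, first_trainable):
    """Map layer name -> trainable flag: every layer before
    `first_trainable` is frozen; it and all later layers train."""
    flags, unfrozen = {}, False
    for name in layer_names:
        if name == first_trainable:
            unfrozen = True
        flags[name] = unfrozen
    return flags

# Toy stand-in for InceptionV3's layer ordering (names illustrative).
layers = ["conv2d_1", "mixed9", "conv2d_90", "mixed9_1", "mixed10", "predictions"]
flags = freeze_until(layers, "conv2d_90")
print(flags)  # conv2d_1 and mixed9 frozen; conv2d_90 onwards trainable
```

With Keras this corresponds to iterating over `model.layers` and assigning `layer.trainable` from such a rule before compiling; fine-tuning the whole network simply leaves every flag True.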

5.5 Single model with multiple biometrics

In addition to training/testing with datasets that belong to the same biometric source, i.e. face only or iris only, we experiment with training a single generic presentation attack detection network that detects a presentation attack whether the presented biometric is a face or an iris. We train with different combinations of face+iris sets, then cross-evaluate the trained models on the other face and iris datasets to see if this approach is comparable to training on only face images or only iris images.

6 Experiments and Results

In this section we describe the image and video preprocessing methods used, explain the training strategy followed, and present the experimental setup and results.

6.1 Preprocessing

6.1.1 Preparing images: Among the iris datasets, only the MobBioFake database consists of color images captured using a smartphone. The other datasets, ATVS-FIr and Warsaw, were obtained with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the color images of ImageNet, their input layer accepts only three-channel color images; for grayscale iris images, the single gray channel is therefore replicated twice to form a 3-channel image that is passed as the network input. The whole iris image is used, without cropping and without iris detection or segmentation.
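The channel-replication step described above can be sketched with NumPy; `gray_to_rgb` is an illustrative helper name, not from the paper.

```python
import numpy as np

def gray_to_rgb(gray):
    """Replicate a single-channel iris image into 3 identical channels,
    matching the 3-channel input of ImageNet-pretrained CNNs."""
    if gray.ndim == 2:                  # (H, W) -> (H, W, 1)
        gray = gray[..., np.newaxis]
    return np.repeat(gray, 3, axis=-1)  # (H, W, 3)

img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # mock NIR capture
rgb = gray_to_rgb(img)
assert rgb.shape == (480, 640, 3)
assert (rgb[..., 0] == rgb[..., 2]).all()  # all three channels identical
```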

On the other hand, all face PAD datasets consist of color videos. Video durations range from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. Not all video frames are used, to avoid redundancy and over-fitting to the subjects; instead, we use frame skipping to sample one frame out of every 20. This frame dropping yields around 10 to 19 sampled frames per video, depending on its length. For each sampled frame, instead of using the whole frame as the network input, it is first cropped either to include only the face region or to a box with approximately twice the face area, so as to include some background context. The annotated face bounding boxes have a side length ranging from 70 px to 100 px in Replay-Attack videos and from 170 px to 250 px in MSU-MFSD.
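The sampling and cropping steps above can be sketched as follows. Both helpers are hypothetical illustrations: scaling a box to twice its area means scaling each side by √2, then clipping to the frame.

```python
import numpy as np

FRAME_SKIP = 20  # keep one frame out of every 20, as described above

def sample_frames(n_frames, skip=FRAME_SKIP):
    """Indices of the frames kept after skipping, to reduce redundancy."""
    return list(range(0, n_frames, skip))

def expand_bbox(x, y, w, h, frame_w, frame_h, area_scale=2.0):
    """Grow a face box to ~`area_scale` times its area (each side scaled
    by sqrt(area_scale)), clipped to the frame boundaries."""
    s = area_scale ** 0.5
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * s, h * s
    x0 = max(0, int(cx - nw / 2)); y0 = max(0, int(cy - nh / 2))
    x1 = min(frame_w, int(cx + nw / 2)); y1 = min(frame_h, int(cy + nh / 2))
    return x0, y0, x1, y1

print(len(sample_frames(375)))  # a 25 fps, 15 s video yields 19 frames
```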

Table 6 shows the training, validation and testing sizes (in number of images) for each of the datasets used in the experiments, after frame dropping in videos. The total number of images available for training the CNN on each database is given after the equals sign.

The images are then resized to the default input size of each network: 224×224 for MobileNetV2 and 299×299 for InceptionV3. The RGB values of the images input to both InceptionV3 and MobileNetV2 are first scaled to lie between -1 and 1.
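The resize-and-scale step amounts to mapping [0, 255] to [-1, 1] via x/127.5 - 1. The sketch below uses a crude nearest-neighbor resize as a stand-in for whatever library resize is used in practice; `preprocess` is an illustrative name.

```python
import numpy as np

def preprocess(img_uint8, size):
    """Nearest-neighbor resize to (size, size), then map RGB values from
    [0, 255] to [-1, 1], the input range used for InceptionV3/MobileNetV2."""
    h, w = img_uint8.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = img_uint8[ys][:, xs]
    return resized.astype(np.float32) / 127.5 - 1.0

x = preprocess(np.full((480, 640, 3), 255, np.uint8), 224)  # MobileNetV2 input
assert x.shape == (224, 224, 3) and x.max() == 1.0
```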

Table 6 Number of images used in training and testing. For face datasets, the number of videos is shown followed by the number of frames* in parentheses; for iris datasets, only the number of images is given. The total after the equals sign is the number of images available for training the CNN on each database.

Database        Subset   Attack + bona-fide
Replay-Attack   train    300 (3414) + 60 (1139) = 4553
                devel    300 (3411) + 60 (1140)
                test     400 (4516) + 80 (1507)
MSU-MFSD        train    90 (1257) + 30 (419) = 1676
                test     120 (1655) + 84 (551)
Replay-Mobile   train    192 (2851) + 120 (1762) = 4613
                devel    256 (3804) + 160 (2356)
                test     192 (2842) + 110 (1615)
ATVS-FIr        train    200 + 200 = 400
                test     600 + 600
Warsaw 2013     train    203 + 228 = 431
                test     612 + 624
MobBioFake      train    400 + 400 = 800
                test     400 + 400

* frames sampled by skipping every 20 frames

6.1.2 Data augmentation: As shown in Table 6, the total number of images available from each dataset to train the CNN is much smaller than the dataset sizes usually used for deep network training. This could cause the network to memorize this small set of samples instead of learning useful features, and so fail to generalize. To reduce overfitting, data augmentation is therefore applied during training: before being fed to the network, images are randomly transformed with one or more of the following transformations: horizontal flip, rotation between 0 and 10 degrees, horizontal translation by 0 to 20 percent of the image width, vertical translation by 0 to 20 percent of the image height, or zooming by 0 to 20 percent.
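In Keras, the augmentations listed above map naturally onto an `ImageDataGenerator` configuration. This is an illustrative config fragment; the authors' exact generator settings are not given in the text.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Configuration sketch mirroring the augmentation list above
# (values illustrative, matching the stated ranges).
datagen = ImageDataGenerator(
    horizontal_flip=True,    # random horizontal flip
    rotation_range=10,       # rotation between 0 and 10 degrees
    width_shift_range=0.2,   # horizontal translation up to 20% of width
    height_shift_range=0.2,  # vertical translation up to 20% of height
    zoom_range=0.2,          # zoom up to 20%
)
# model.fit(datagen.flow(x_train, y_train, batch_size=32), ...)
```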

6.2 Training strategy and hyperparameters

For training the CNNs we used the Adam optimizer [124] with beta1 = 0.9, beta2 = 0.999 and learning rate 1×10^-4. We trained each network for a maximum of 50 epochs, but added stopping criteria: training halts if the training or validation accuracy reaches 99.95%, or if the validation accuracy does not improve (with delta = 0.0005) for 40 epochs.
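These hyperparameters correspond to the following Keras configuration fragment; this is a sketch, and the "stop at 99.95% accuracy" rule would need a small custom callback not shown here.

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

# Optimizer and early-stopping sketch matching the hyperparameters above.
optimizer = Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999)
early_stop = EarlyStopping(monitor="val_accuracy", min_delta=5e-4,
                           patience=40, restore_best_weights=True)

# model.compile(optimizer=optimizer, loss="binary_crossentropy",
#               metrics=["accuracy"])
# model.fit(..., epochs=50, callbacks=[early_stop])
```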

Intermediate models with high validation accuracy were saved, and finally the model with the lowest validation error was chosen for inference. Because of the randomness introduced by the data augmentation and the dropout, separate training sessions may produce slightly different results, especially in cross-validation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.

6.3 Evaluation

6.3.1 Video scoring: The CNN accepts images as input, so we treat each frame of a face video as a separate image. When reporting the final score for each video, we perform score-level fusion by averaging the scores of all sampled frames from the same video. Accuracy and errors are then calculated on these video-level scores, not on frame-level scores as in some results reported in the literature.
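The score-level fusion step can be sketched as a per-video average of frame scores; `video_scores` is an illustrative helper name.

```python
import numpy as np

def video_scores(frame_scores, video_ids):
    """Fuse frame-level CNN scores into one score per video by averaging
    the scores of frames that belong to the same video."""
    fused = {}
    for s, vid in zip(frame_scores, video_ids):
        fused.setdefault(vid, []).append(s)
    return {vid: float(np.mean(scores)) for vid, scores in fused.items()}

scores = video_scores([0.9, 0.8, 0.1, 0.3], ["v1", "v1", "v2", "v2"])
print(scores)  # per-video averages for v1 and v2
```

Accuracy, EER, and HTER are then computed on these fused values rather than on individual frames.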


Table 7 Intra- and cross-dataset results. Test HTER (%) is reported when testing on Replay-Attack and Replay-Mobile; otherwise Test EER (%).

Train Database           Test Database            Finetune Approach (A)       Finetune Approach (B)
                                                  InceptionV3  MobilenetV2    InceptionV3  MobilenetV2
Replay-Attack            Replay-Attack            6.9          1.6            0            0.63
                         MSU-MFSD                 33           30             17.5         22.5
                         Replay-Mobile            35.8         43.4           64.7         56.7
MSU-MFSD                 Replay-Attack            34.7         22.9           23.8         13.7
                         MSU-MFSD                 7.5          20.5           0            2.08
                         Replay-Mobile            32.6         26.8           24.6         23.7
Replay-Mobile            Replay-Attack            33.2         45.1           31.1         45.5
                         MSU-MFSD                 35.4         30             32.9         37.5
                         Replay-Mobile            3.6          3.2            0            0
ATVS-FIr                 ATVS-FIr                 0            0              0            0
                         Warsaw 2013              1.9          5.4            0.16         1.6
                         MobBioFake (Grayscale)   44           39             36.5         39
Warsaw 2013              ATVS-FIr                 0            1              0.17         0.5
                         Warsaw 2013              0            0.32           0            0
                         MobBioFake (Grayscale)   43           39             32.8         47
MobBioFake               ATVS-FIr                 10           30             18.7         20.2
                         Warsaw 2013              15.7         32             23.4         18.8
                         MobBioFake               8.75         7              0.5          0.75
MobBioFake (Grayscale)   ATVS-FIr                 25           23.7           12           16.2
                         Warsaw 2013              6.2          17.3           10.1         17.2
                         MobBioFake (Grayscale)   10.5         7.3            0.25         1.25

6.3.2 Evaluation metrics: The equal error rate (EER) is used to report the error on the test set. The EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attacks, equal to FR / total bona-fide attempts, while FAR is the probability of the algorithm accepting attack samples as bona-fide, equal to FA / total attack attempts.

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold is first calculated that achieves the EER on the devel set. This threshold is then used to compute the error on the test set, known as the half total error rate (HTER), which is the sum of the FRR and the FAR at that threshold divided by two: HTER = (FRR + FAR) / 2.
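The EER/HTER protocol above can be sketched as follows, assuming higher scores mean "more likely bona-fide" (a convention, not stated in the paper); the threshold sweep is a simple illustrative implementation, not the exact tooling used.

```python
import numpy as np

def far_frr(bona_fide, attack, thr):
    """FRR: bona-fide rejected (score < thr); FAR: attacks accepted (>= thr)."""
    frr = np.mean(np.asarray(bona_fide) < thr)
    far = np.mean(np.asarray(attack) >= thr)
    return far, frr

def eer_threshold(bona_fide, attack):
    """Pick the candidate threshold where |FAR - FRR| is smallest,
    i.e. the EER operating point on the devel set."""
    cands = np.unique(np.concatenate([bona_fide, attack]))
    gaps = [abs(np.subtract(*far_frr(bona_fide, attack, t))) for t in cands]
    return cands[int(np.argmin(gaps))]

def hter(bona_fide, attack, thr):
    far, frr = far_frr(bona_fide, attack, thr)
    return (far + frr) / 2

# Toy scores: tune the threshold on a devel split, apply it to the test split.
dev_bf, dev_atk = np.array([0.8, 0.9, 0.7]), np.array([0.1, 0.2, 0.6])
thr = eer_threshold(dev_bf, dev_atk)
print(hter(np.array([0.85, 0.4]), np.array([0.1, 0.3]), thr))  # 0.25
```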

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER to be able to compare with state-of-the-art published results. The EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: To test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that has reported EER-based cross-dataset evaluation on iris PAD databases.

6.4 Experimental setup

For implementation we used Keras† with the TensorFlow‡ backend for the deep CNN models, and for dataset management and experiment pipelines we used the Bob package [125, 126]. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

*https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
†https://github.com/keras-team/keras
‡https://www.tensorflow.org

6.5 Experiment 1 results

In the first experiment we compare the two fine-tuning approaches explained in Section 5.4. Training and testing here use face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, using benchmark datasets to allow comparison against the state of the art. For each fine-tuning approach we performed three runs and report the minimum error obtained. In Table 7 we refer to the first fine-tuning approach, in which only the weights of the last block are fine-tuned, as (A); approach (B) is when all the layers' weights are fine-tuned on the given dataset.

It can be noticed from Table 7 that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning only the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error for all datasets using InceptionV3, while MobilenetV2 achieved 0% for most datasets and 1–2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset is used for training instead of its original color images, without much effect on the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 results

For the second experiment we use only fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here, both face and iris images are used to train a single model that classifies bona-fide versus attack presentations in general, regardless of the biometric type.

Results in Table 8 show that training a single model on both face and iris images to differentiate bona-fide from presentation attack images achieved very good results on both face and iris test sets. In some cases on the face datasets, this combination even boosted cross-dataset performance: for example, on the Replay-Attack test set, 11.9% HTER was achieved when training a MobilenetV2 on MSU-MFSD plus the Warsaw iris dataset, versus 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8: Training with Face+Iris images. Intra- and cross-dataset results. Test HTER is reported for Replay-Attack and Replay-Mobile testing; otherwise Test EER. Columns, per iris train database (ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)), give InceptionV3/MobilenetV2.

Train face database: Replay-Attack
  Number of training epochs   2/6 | 3/6 | 4/9 | 2/8
  Test Replay-Attack          0/063 | 0/0 | 025/15 | 0/14
  Test MSU-MFSD               296/204 | 15/20 | 196/25 | 30/175
  Test Replay-Mobile          581/487 | 573/354 | 606/581 | 39/40
  Test ATVS-FIr               0/0 | 105/27 | 517/88 | 35/123
  Test Warsaw 2013            016/016 | 0/0 | 413/162 | 152/57
  Test MobBioFake (RGB)       585/713 | 56/455 | 075/025 | 175/2
  Test MobBioFake (Grayscale) 50/69 | 563/385 | 15/075 | 125/1

Train face database: MSU-MFSD
  Number of training epochs   4/6 | 3/5 | 10/9 | 7/9
  Test Replay-Attack          228/145 | 253/119 | 228/277 | 166/216
  Test MSU-MFSD               0/208 | 04/208 | 25/208 | 04/25
  Test Replay-Mobile          287/261 | 192/248 | 227/268 | 252/17
  Test ATVS-FIr               0/0 | 033/23 | 118/125 | 08/1233
  Test Warsaw 2013            06/08 | 0/0 | 439/655 | 74/91
  Test MobBioFake (RGB)       525/392 | 455/42 | 2/125 | 25/25
  Test MobBioFake (Grayscale) 453/385 | 345/43 | 22/425 | 175/2

Train face database: Replay-Mobile
  Number of training epochs   2/3 | 1/3 | 4/7 | 5/4
  Test Replay-Attack          242/426 | 31/357 | 416/643 | 337/546
  Test MSU-MFSD               325/304 | 325/425 | 60/525 | 40/446
  Test Replay-Mobile          0/0 | 0/0 | 0/026 | 0/046
  Test ATVS-FIr               0/0 | 083/6 | 155/225 | 65/282
  Test Warsaw 2013            21/4 | 0/0 | 96/289 | 221/261
  Test MobBioFake (RGB)       465/53 | 493/543 | 1/075 | 88/325
  Test MobBioFake (Grayscale) 335/433 | 417/458 | 53/65 | 2/175

(Bold-underlined: best cross-dataset error for each test dataset; bold: best cross-dataset error for each test dataset in each row, i.e., given a certain face train set.)

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER is reported for Replay-Attack testing; otherwise Test EER. Underlined is the best cross-dataset error for the test dataset. Each entry gives the cross-dataset error, the corresponding intra-dataset error in parentheses, and the method or model.

Train Replay-Attack:
  Test MSU-MFSD:      SOTA 285 (825), GuidedScale-LBP [127] (LGBP) | Ours, single biometric: 175 (0), InceptionV3 (B) | Ours, face+iris: 15 (0), InceptionV3 (Warsaw 2013)
  Test Replay-Mobile: SOTA 492 (313), GuidedScale-LBP [127] (LBP+GS-LBP) | Ours, single biometric: 358 (69), InceptionV3 (A) | Ours, face+iris: 354 (0), MobilenetV2 (Warsaw 2013)

Train MSU-MFSD:
  Test Replay-Attack: SOTA 2140 (15), Color-texture [66] | Ours, single biometric: 137 (208), MobilenetV2 (B) | Ours, face+iris: 119 (208), MobilenetV2 (Warsaw 2013)
  Test Replay-Mobile: SOTA 321 (85), GuidedScale-LBP [127] (LBP+GS-LBP) | Ours, single biometric: 237 (208), MobilenetV2 (B) | Ours, face+iris: 17 (25), MobilenetV2 (MobBioFake (Gray))

Train Replay-Mobile:
  Test Replay-Attack: SOTA 4619 (052), GuidedScale-LBP [127] (LGBP) | Ours, single biometric: 311 (0), InceptionV3 (B) | Ours, face+iris: 242 (0), InceptionV3 (ATVS-FIr)
  Test MSU-MFSD:      SOTA 3356 (098), GuidedScale-LBP [127] (LBP+GS-LBP) | Ours, single biometric: 30 (32), MobilenetV2 (A) | Ours, face+iris: 304 (0), MobilenetV2 (ATVS-FIr)

Train ATVS-FIr:
  Test Warsaw 2013:       SOTA: none available | Ours, single biometric: 016 (0), InceptionV3 (B) | Ours, face+iris: 016 (0), any model (Replay-Attack)
  Test MobBioFake (Gray): SOTA: none available | Ours, single biometric: 365 (0), InceptionV3 (B) | Ours, face+iris: 335 (0), InceptionV3 (Replay-Mobile)

Train Warsaw 2013:
  Test ATVS-FIr:          SOTA: none available | Ours, single biometric: 0 (0), InceptionV3 (A) | Ours, face+iris: 033 (0), InceptionV3 (MSU-MFSD)
  Test MobBioFake (Gray): SOTA: none available | Ours, single biometric: 328 (0), InceptionV3 (B) | Ours, face+iris: 345 (0), InceptionV3 (MSU-MFSD)

Train MobBioFake (Gray):
  Test ATVS-FIr:    SOTA: none available | Ours, single biometric: 25 (105), InceptionV3 (A) | Ours, face+iris: 08 (175), InceptionV3 (MSU-MFSD)
  Test Warsaw 2013: SOTA: none available | Ours, single biometric: 62 (105), InceptionV3 (A) | Ours, face+iris: 57 (1), MobilenetV2 (Replay-Attack)

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. Not only this, but it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network focus on color features when testing on a face image, since all training face images were colored while all iris training images were grayscale.


Fig. 6: High-frequency components. Attack iris images from the Warsaw dataset. Left to right: input image; frequency components (Fourier); high-pass filter; inverse Fourier of the image; high-frequency components overlaid on the image.

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High-frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation error with the state of the art (SOTA). The table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel that it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activations and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and textures that appear in real presentations in general but are absent from attack samples, and vice-versa. One approach to demonstrate this is to visualize the response of certain feature maps of the network given sample input images. For this task we chose a MobilenetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected to visualize the response of some filters from the final convolutional block before classification.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High-frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high-frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high-frequency components of an image; then we overlay these high-frequency regions


on the input image for a better understanding of where high frequency is concentrated in bona-fide vs. attack images.

The steps for obtaining the regions with high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to obtain the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, an inverse Fourier transform is used to obtain an image with only the high-frequency regions visible, and finally these regions are overlaid on the original image to highlight them.
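The steps above can be sketched with NumPy. This is an illustration of the described pipeline, not the authors' exact code; the cutoff radius and the highlighting rule are hypothetical parameters.

```python
import numpy as np

def high_frequency_regions(img, cutoff=8):
    """Return the high-frequency content of a 2-D grayscale image.

    1. Fourier transform, with the DC component shifted to the centre.
    2. High-pass filter: zero out a disc of radius `cutoff` around DC.
    3. Inverse Fourier transform back to the spatial domain.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    keep = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > cutoff ** 2
    return np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum * keep)))

def overlay(img, high, k=2.0):
    """Step 4: highlight pixels whose high-frequency energy is unusually
    large (more than k standard deviations above the mean)."""
    mask = high > high.mean() + k * high.std()
    out = img.astype(float).copy()
    out[mask] = img.max()  # paint the high-frequency regions
    return out

flat = np.ones((32, 32))  # a flat image has no high-frequency content
print(np.allclose(high_frequency_regions(flat), 0))  # True
```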

Regarding the chosen filters to visualize: the final layer before classification consists of 1280 filters; for an input image of size 224×224, the feature maps resulting from these final 1280 filters have size 7×7. From these 1280 filters we selected two filters for each biometric: one that had the highest average activation response to real images, which we call realF, and another that was highly activated by attack images, called attackF. We then visualize the 7×7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.
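The selection of realF and attackF can be sketched as follows (our own illustration with synthetic activations; the array shapes follow the 7×7 feature maps described above):

```python
import numpy as np

def select_filters(real_acts, attack_acts):
    """real_acts, attack_acts: activations of shape (n_images, 7, 7, n_filters).

    Returns (realF, attackF): the index of the filter with the highest
    average response over bona-fide images, and likewise over attacks."""
    realF = int(np.argmax(real_acts.mean(axis=(0, 1, 2))))
    attackF = int(np.argmax(attack_acts.mean(axis=(0, 1, 2))))
    return realF, attackF

rng = np.random.default_rng(0)
real_acts = rng.random((4, 7, 7, 16))
attack_acts = rng.random((4, 7, 7, 16))
real_acts[..., 2] += 5.0    # synthetic: filter 2 fires strongly on real images
attack_acts[..., 9] += 5.0  # filter 9 fires strongly on attacks
print(select_filters(real_acts, attack_acts))  # (2, 9)
```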

Figure 7 shows samples from the Warsaw iris dataset with their high-frequency regions highlighted, in addition to the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples, high-frequency areas are concentrated at the bright regions around the inner eye corner and the lash line; for attack presentations, however, high-frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features, highly represented at the eye corner, through realF, giving a high response at those regions in the real image while failing to detect them in attack samples. AttackF likewise manages to detect paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images from the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is performed using a handheld mobile device showing a photo of the subject. We can notice the absence of high-frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

By visualizing the high-frequency regions in the images, it was shown that high-frequency variations in real images concentrate on certain edges and changes of the real 3D biometric presented, while in attack presentations the high-frequency components represent noise: either spread over the paper in the case of a paper attack medium, or noise patterns produced by the display of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network learned is gradient-weighted class activation mapping (Grad-CAM) [10], which highlights the areas that contributed most to the network's classification decision for an input image.
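The Grad-CAM computation itself is compact. Below is a minimal NumPy sketch of the method from [10], with synthetic activations and gradients standing in for a real backward pass through the network:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """feature_maps: (H, W, K) activations of the last conv block.
    gradients: (H, W, K) gradient of the class score w.r.t. those maps.

    Each map is weighted by the global-average-pooled gradient of its
    channel, the weighted maps are summed, passed through ReLU, and the
    result is normalised to [0, 1]."""
    weights = gradients.mean(axis=(0, 1))  # alpha_k in [10]
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)
    return cam / cam.max() if cam.max() > 0 else cam

maps = np.zeros((7, 7, 2))
maps[0, 0, 0] = 1.0   # channel 0 fires at the top-left corner
grads = np.zeros((7, 7, 2))
grads[..., 0] = 1.0   # the class score depends only on channel 0
heat = grad_cam(maps, grads)
print(heat[0, 0], heat[3, 3])  # 1.0 0.0
```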

We chose the MobilenetV2 network trained on the Replay-Attack face dataset and the grayscale MobBioFake iris dataset to analyze the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation-attack images.

The visualizations in these figures clearly show that the network focuses on features of the face and iris centers to make its decision for a bona-fide presentation; in cases where these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could perfectly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights that were trained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% SOTA intra-dataset error on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping and slight rotation and translation, in addition to starting the network training from non-random weights already pretrained on the ImageNet dataset. This was done to avoid the weights overfitting the small training datasets; for future work, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.
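The augmentation described above can be sketched as follows. This is an illustrative NumPy version, not the paper's pipeline: translation is approximated by a wrap-around shift via `np.roll` to keep the sketch dependency-free, and the slight rotation would need e.g. `scipy.ndimage.rotate`, so it is omitted here.

```python
import numpy as np

def augment(img, rng, max_shift=5):
    """Random horizontal flip plus a small translation of an (H, W, C) image.

    np.roll wraps pixels around the border; it stands in for a proper
    padded translation in this sketch."""
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(img, (int(dy), int(dx)), axis=(0, 1))

rng = np.random.default_rng(42)
img = np.arange(48).reshape(4, 4, 3)
out = augment(img, rng)
print(out.shape == img.shape)  # True
```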

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-SOTA results were reported on the face datasets. As far as we know, this is the first paper to include cross-dataset equal-error-rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network using data from multiple biometrics: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training a PAD algorithm.

Finally, we analyzed the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the centers of the face or iris are the most important parts driving a bona-fide classification decision. For future work, we would like to apply a patch-based CNN and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach by using a region-based CNN method.


8 References

1 Jourabloo A., Liu Y., Liu X.: 'Face de-spoofing: Anti-spoofing via noise modeling', CoRR, 2018
2 Nagpal C., Dubey S.R.: 'A performance evaluation of convolutional neural networks for face anti-spoofing', CoRR, 2018
3 Nguyen D., Baek N.R., Pham T., Park K.R.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, pp. 1315
4 Nguyen D.T., Pham T.D., Lee Y.W., Park K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)
5 Pinto A., Pedrini H., Krumdick M., Becker B., Czajka A., Bowyer K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In: 'Deep Learning in Biometrics' (CRC Press, 2018), p. 49
6 Czajka A., Bowyer K.W.: 'Presentation attack detection for iris recognition: An assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35
7 Ramachandra R., Busch C.: 'Presentation attack detection methods for face recognition systems: A comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37
8 Abdul Ghaffar I., Norzali M., Haji Mohd M.N.: 'Presentation attack detection for face recognition on smartphones: A comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9
9 Li L., Correia P.L., Hadid A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, pp. 3–14
10 Selvaraju R.R., Das A., Vedantam R., Cogswell M., Parikh D., Batra D.: 'Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391

11 Daugman J.: 'Iris recognition and anti-spoofing countermeasures'. In: 7th International Biometrics Conference, 2004
12 Galbally J., Ortiz-Lopez J., Fierrez J., Ortega-Garcia J.: 'Iris liveness detection based on quality related features'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276
13 Pacut A., Czajka A.: 'Aliveness detection for iris biometrics'. In: 40th Annual IEEE International Carnahan Conferences Security Technology, 2006, pp. 122–129
14 Czajka A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735
15 Czajka A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3
16 Bodade R., Talbar S.: 'Dynamic iris localisation: A novel approach suitable for fake iris detection'. In: 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5
17 Huang X., Ti C., Hou Q., Tokuta A., Yang R.: 'An experimental study of pupil constriction for liveness detection'. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258
18 Kanematsu M., Takano H., Nakamura K.: 'Highly reliable liveness detection method for iris recognition'. In: SICE Annual Conference 2007, 2007, pp. 361–364
19 Lee E.C., Park K.R.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166
20 Mohd M.N.H., Kashima M., Sato K., Watanabe M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371
21 Mohd M.N.H., Kashima M., Sato K., Watanabe M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science, 2015, pp. 684–697
22 Pan G., Sun L., Wu Z., Wang Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225
23 Smith D.F., Wiliem A., Lovell B.C.: 'Face recognition on consumer devices: Reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745
24 Kollreider K., Fronthaler H., Bigun J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6
25 Ali A., Deravi F., Hoque S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11
26 Wang L., Ding X., Fang C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690
27 Bharadwaj S., Dhamecha T.I., Vatsa M., Singh R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110
28 Sun L., Wu Z., Lao S., Pan G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8
29 Patel K., Han H., Jain A.K.: 'Cross-database face antispoofing with robust feature representation'. In: Biometric Recognition (Cham: Springer International Publishing, 2016), pp. 611–619
30 Marsico M.D., Nappi M., Riccio D., Dugelay J.: 'Moving face spoofing detection via 3D projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78
31 Pan G., Wu Z., Sun L.: 'Liveness detection for face recognition'. In: Recent Advances in Face Recognition (Rijeka: IntechOpen, 2008)
32 Kollreider K., Fronthaler H., Faraj M.I., Bigun J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558
33 Komogortsev O.V., Karpov A.: 'Liveness detection via oculomotor plant characteristics: Attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
34 Komogortsev O.V., Karpov A., Holland C.D.: 'Attack of mechanical replicas: Liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725
35 Rigas I., Komogortsev O.V.: 'Gaze estimation as a framework for iris liveness detection', IEEE International Joint Conference on Biometrics, 2014, pp. 1–8
36 Rigas I., Komogortsev O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326
37 Akhtar Z., Foresti G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14

38 Tan X., Li Y., Liu J., Jiang L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Computer Vision – ECCV 2010 (Berlin, Heidelberg: Springer, 2010), pp. 504–517
39 Kollreider K., Fronthaler H., Bigun J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244
40 Bao W., Li H., Li N., Jiang W.: 'A liveness detection method for face recognition based on optical flow field'. In: Image Analysis and Signal Processing, 2009, pp. 233–236
41 Siddiqui T.A., Bharadwaj S., Dhamecha T.I., Agarwal A., Vatsa M., Singh R., et al.: 'Face anti-spoofing with multifeature videolet aggregation', 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040
42 da Silva Pinto A., Pedrini H., Schwartz W., Rocha A.: 'Video-based face spoofing detection through visual rhythm analysis'. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228
43 Komulainen J., Hadid A., Pietikäinen M.: 'Context based face anti-spoofing', 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8
44 Kim S., Yu S., Kim K., Ban Y., Lee S.: 'Face liveness detection using variable focusing', ICB, 2013, pp. 1–6
45 Wang T., Yang J., Lei Z., Liao S., Li S.Z.: 'Face liveness detection using 3D structure recovered from a single camera', ICB, 2013, pp. 1–6
46 Kose N., Dugelay J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In: 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6
47 Erdogmus N., Marcel S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6
48 Erdogmus N., Marcel S.: 'Spoofing 2D face recognition systems with 3D masks'. In: 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8
49 Li J., Wang Y., Tan T., Jain A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303
50 Zhang Z., Yan J., Liu S., Lei Z., Yi D., Li S.Z.: 'A face antispoofing database with diverse attacks'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31
51 Lee T., Ju G., Liu H., Wu Y.: 'Liveness detection using frequency entropy of image sequences'. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370
52 Czajka A.: 'Database of iris printouts and its application: Development of liveness detection method for iris recognition'. In: 2013 18th International Conference on Methods, Models in Automation, Robotics (MMAR), 2013, pp. 28–33
53 Wei Z., Qiu X., Sun Z., Tan T.: 'Counterfeit iris detection based on texture analysis'. In: 2008 19th International Conference on Pattern Recognition (ICPR), 2009, pp. 1–4
54 Sequeira A.F., Murari J., Cardoso J.S.: 'Iris liveness detection methods in mobile applications'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33
55 Sequeira A.F., Murari J., Cardoso J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. In: Proceedings of International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008
56 Unnikrishnan S., Eshack A.: 'Face spoof detection using image distortion analysis and image quality assessment'. In: 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5
57 Wen D., Han H., Jain A.K.: 'Face spoof detection with image distortion analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761
58 Galbally J., Marcel S.: 'Face anti-spoofing based on general image quality assessment'. In: 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178
59 Söllinger D., Trung P., Uhl A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing', IET Biometrics, 2018, 7, (4), pp. 314–324
60 Bhogal A.P.S., Söllinger D., Trung P., Uhl A.: 'Non-reference image quality assessment for biometric presentation attack detection', 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6
61 Boulkenafet Z., Komulainen J., Hadid A.: 'Face anti-spoofing based on color texture analysis'. In: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet Z., Komulainen J., Hadid A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830
63 Boulkenafet Z., Komulainen J., Hadid A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145
64 Boulkenafet Z., Komulainen J., Akhtar Z., Benlamoudi A., Samai D., Bekhouche S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696
65 Raja K.B., Raghavendra R., Busch C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA: ACM, 2016), pp. 9–15
66 Boulkenafet Z., Komulainen J., Hadid A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9
67 Anjos A., Marcel S.: 'Counter-measures to photo attacks in face recognition: A public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
68 Komulainen J., Hadid A., Pietikäinen M., Anjos A., Marcel S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7
69 Gragnaniello D., Poggi G., Sansone C., Verdoliva L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863
70 de Freitas Pereira T., Anjos A., De Martino J.M., Marcel S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Computer Vision – ACCV 2012 Workshops (Berlin, Heidelberg: Springer, 2013), pp. 121–132
71 de Freitas Pereira T., Anjos A., Martino J.M.D., Marcel S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
72 Määttä J., Hadid A., Pietikäinen M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
73 Chingovska I., Anjos A., Marcel S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG – Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7
74 Yang J., Lei Z., Liao S., Li S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
75 Määttä J., Hadid A., Pietikäinen M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10
76 Benlamoudi A., Samai D., Ouafi A., Bekhouche S.E., Taleb-Ahmed A., Hadid A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5
77 Patel K., Han H., Jain A.K.: 'Secure face unlock: Spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283
78 Schwartz W.R., Rocha A., Pedrini H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8
79 Xu Z., Li S., Deng W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145
80 Peixoto B., Michelassi C., Rocha A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560
81 Raghavendra R., Busch C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715
82 Doyle J.S., Bowyer K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683
83 Akhtar Z., Micheloni C., Piciarelli C., Foresti G.L.: 'MoBio_LivDet: Mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192
84 Tronci R., Muntoni D., Fadda G., Pili M., Sirena N., Murgia G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
85 Feng L., Po L.M., Li Y., Xu X., Yuan F., Cheung T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: A neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460
86 Li L., Feng X., Boulkenafet Z., Xia Z., Li M., Hadid A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6
87 Nguyen T.D., Pham T.D., Baek N.R., Park K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors', Sensors, 2018
88 Atoum Y., Liu Y., Jourabloo A., Liu X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328
89 Liu Y., Jourabloo A., Liu X.: 'Learning deep models for face anti-spoofing: Binary or auxiliary supervision', CoRR, 2018
90 Wang Z., Zhao C., Qin Y., Zhou Q., Lei Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018
91 Yang J., Lei Z., Li S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014
92 Gan J., Li S., Zhai Y., Liu C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5
93 Lucena O., Junior A., Moia V., Souza R., Valle E., de Alencar Lotufo R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34
94 Czajka A., Bowyer K.W., Krumdick M., Vidal Mata R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196
95 Silva P., Luz E., Baeta R., Pedrini H., Falcão A., Menotti D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164
96 Kuehlkamp A., da Silva Pinto A., Rocha A., Bowyer K.W., Czajka A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018
97 Hoffman S., Sharma R., Ross A.: 'Convolutional neural networks for iris presentation attack detection: Toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018
98 Menotti D., Chiachia G., Pinto A., Schwartz W.R., Pedrini H., Falcão A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879
99 Agarwal A., Singh R., Vatsa M.: 'Face anti-spoofing using Haralick features', 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6
100 Galbally J., Marcel S., Fierrez J.: 'Image quality assessment for fake biometric detection: Application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724
101 Garcia D.C., de Queiroz R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786
102 He Z., Sun Z., Tan T., Wei Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Advances in Biometrics (Berlin, Heidelberg: Springer, 2009), pp. 1080–1090
103 Zhang H., Sun Z., Tan T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282
104 Costa-Pazo A., Bhattacharjee S., Vazquez-Fernandez E., Marcel S.: 'The Replay-Mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7
105 Sequeira A.F., Monteiro J.C., Rebelo A., Oliveira H.P.: 'MobBIO: A multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello D Sansone C Verdoliva L lsquoIris liveness detection for mobiledevices based on local descriptorsrsquo Pattern Recogn Lett 2015 57 (C) pp 81ndash87

107 Marsico MD Galdi C Nappi M Riccio D lsquoFirme Face and iris recognitionfor mobile engagementrsquo Image Vision Comput 2014 32 pp 1161ndash1172

108 Russakovsky O Deng J Su H Krause J Satheesh S Ma S et allsquoImagenet large scale visual recognition challengersquo International Journal ofComputer Vision 2015 115 (3) pp 211ndash252

109 Boulkenafet Z Komulainen J Li L Feng X Hadid A lsquoOulu-npu A mobileface presentation attack database with real-world variationsrsquo In 2017 12th IEEEInternational Conference on Automatic Face Gesture Recognition (FG 2017) (2017 pp 612ndash618

110 Yambay D Becker B Kohli N Yadav D Czajka A Bowyer KW et allsquoLivdet iris 2017 - iris liveness detection competition 2017rsquo In 2017 IEEE Inter-national Joint Conference on Biometrics IJCB 2017 Denver CO USA October1-4 2017 ( 2017 pp 733ndash741

111 Chakka MM Anjos A Marcel S Tronci R Muntoni D Fadda G et allsquoCompetition on counter measures to 2-d facial spoofing attacksrsquo In 2011International Joint Conference on Biometrics (IJCB) ( 2011 pp 1ndash6

112 Chingovska I Yang J Lei Z Yi D Li SZ Kahm O et al lsquoThe 2nd com-petition on counter measures to 2d face spoofing attacksrsquo In 2013 InternationalConference on Biometrics (ICB) ( 2013 pp 1ndash6

113 Zhang S Wang X Liu A Zhao C Wan J Escalera S et al lsquoA dataset andbenchmark for large-scale multi-modal face anti-spoofingrsquo In 2019 IEEECVFConference on Computer Vision and Pattern Recognition (CVPR) ( 2019pp 919ndash928

114 Liu A Wan J Escalera S JairEscalante H Tan Z Yuan Q et al lsquoMulti-modal face anti-spoofing attack detection challenge at cvpr2019rsquo In The IEEEConference on Computer Vision and Pattern Recognition (CVPR) Workshops (2019

115 Liu A Tan Z Li X Wan J Escalera S Guo G et al lsquoStatic and dynamicfusion for multi-modal cross-ethnicity face anti-spoofingrsquo ( 2019

116 Yambay D Jr JSD Bowyer KW Czajka A Schuckers S lsquoLivdet-iris2013 - iris liveness detection competition 2013rsquo In IEEE International JointConference on Biometrics Clearwater IJCB 2014 FL USA September 29 -October 2 2014 ( 2014 pp 1ndash8

117 Yambay D Walczak B Schuckers S Czajka A lsquoLivdet-iris 2015 - iris live-ness detection competition 2015rsquo In IEEE International Conference on IdentitySecurity and Behavior Analysis ISBA 2017 New Delhi India February 22-242017 ( 2017 pp 1ndash6

118 Sequeira AF Oliveira HP Monteiro JC Monteiro JP Cardoso JSlsquoMobilive 2014 - mobile iris liveness detection competitionrsquo In Proceedingsof the International Joint Conference on Biometrics (IJCB) ( 2014 pp 1ndash6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M., editors, Biometrics and Identity Management, Berlin, Heidelberg: Springer, 2008, pp. 181–190
120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision', CoRR, 2015, abs/1512.00567
121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385
122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381
123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556
124 Kingma, D.P., Ba, J.: 'Adam: A method for stochastic optimization', CoRR, 2014, abs/1412.6980
125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012
126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017
127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



Table 7: Intra- and cross-dataset results. Test HTER is reported when testing on Replay-Attack and Replay-Mobile; test EER otherwise. Approach (A) fine-tunes only the last block; approach (B) fine-tunes all layers.

| Train Database | Test Database | InceptionV3 (A) | MobilenetV2 (A) | InceptionV3 (B) | MobilenetV2 (B) |
| Replay-Attack | Replay-Attack | 69 | 16 | 0 | 063 |
| Replay-Attack | MSU-MFSD | 33 | 30 | 175 | 225 |
| Replay-Attack | Replay-Mobile | 358 | 434 | 647 | 567 |
| MSU-MFSD | Replay-Attack | 347 | 229 | 238 | 137 |
| MSU-MFSD | MSU-MFSD | 75 | 205 | 0 | 208 |
| MSU-MFSD | Replay-Mobile | 326 | 268 | 246 | 237 |
| Replay-Mobile | Replay-Attack | 332 | 451 | 311 | 455 |
| Replay-Mobile | MSU-MFSD | 354 | 30 | 329 | 375 |
| Replay-Mobile | Replay-Mobile | 36 | 32 | 0 | 0 |
| ATVS-FIr | ATVS-FIr | 0 | 0 | 0 | 0 |
| ATVS-FIr | Warsaw 2013 | 19 | 54 | 016 | 16 |
| ATVS-FIr | MobBioFake (Grayscale) | 44 | 39 | 365 | 39 |
| Warsaw 2013 | ATVS-FIr | 0 | 1 | 017 | 05 |
| Warsaw 2013 | Warsaw 2013 | 0 | 032 | 0 | 0 |
| Warsaw 2013 | MobBioFake (Grayscale) | 43 | 39 | 328 | 47 |
| MobBioFake | ATVS-FIr | 10 | 30 | 187 | 202 |
| MobBioFake | Warsaw 2013 | 157 | 32 | 234 | 188 |
| MobBioFake | MobBioFake | 875 | 7 | 05 | 075 |
| MobBioFake (Grayscale) | ATVS-FIr | 25 | 237 | 12 | 162 |
| MobBioFake (Grayscale) | Warsaw 2013 | 62 | 173 | 101 | 172 |
| MobBioFake (Grayscale) | MobBioFake (Grayscale) | 105 | 73 | 025 | 125 |

6.3.2 Evaluation metrics: The equal-error-rate (EER) is used to report the error on the test set. EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attack ones, computed as FR/(total bona-fide attempts), while FAR is the probability of the algorithm accepting attack samples as bona-fide, computed as FA/(total attack attempts).
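The FAR/FRR/EER computation described above can be sketched in a few lines of NumPy (an illustrative implementation, not the paper's exact code; scores are assumed to be higher for bona-fide samples):

```python
import numpy as np

def far_frr(bona_fide_scores, attack_scores, threshold):
    """FRR: fraction of bona-fide samples scored below the threshold (rejected);
    FAR: fraction of attack samples scored at/above the threshold (accepted)."""
    frr = np.mean(np.asarray(bona_fide_scores) < threshold)
    far = np.mean(np.asarray(attack_scores) >= threshold)
    return far, frr

def eer(bona_fide_scores, attack_scores):
    """Sweep candidate thresholds and return the error rate (and threshold)
    where FAR and FRR are closest to equal."""
    thresholds = np.sort(np.concatenate([bona_fide_scores, attack_scores]))
    best = min(thresholds,
               key=lambda t: abs(np.subtract(*far_frr(bona_fide_scores, attack_scores, t))))
    far, frr = far_frr(bona_fide_scores, attack_scores, best)
    return (far + frr) / 2, best
```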

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold is first calculated that achieves EER on the development set. This threshold is then used to calculate the error on the test set, known as the half-total error rate (HTER), which equals the sum of the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) at that threshold, divided by two: HTER = (FRR + FAR)/2.
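A minimal sketch of this two-step procedure (threshold fixed at the EER point on the development set, HTER then measured on the test set; again assuming higher scores mean bona-fide):

```python
import numpy as np

def eer_threshold(dev_bona, dev_attack):
    """Choose the score threshold where FAR and FRR are closest on the dev set."""
    candidates = np.sort(np.concatenate([dev_bona, dev_attack]))
    def gap(t):
        frr = np.mean(dev_bona < t)      # bona-fide rejected
        far = np.mean(dev_attack >= t)   # attacks accepted
        return abs(far - frr)
    return min(candidates, key=gap)

def hter(test_bona, test_attack, threshold):
    """HTER = (FRR + FAR) / 2 at a fixed, dev-set-derived threshold."""
    frr = np.mean(test_bona < threshold)
    far = np.mean(test_attack >= threshold)
    return (frr + far) / 2.0
```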

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017*, where PAD subsystems are evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER in order to compare with state-of-the-art published results. EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation: In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, where we train on one dataset and test on another. For the iris datasets, we know of no previous work that has performed EER cross-dataset evaluation on iris PAD databases.

6.4 Experimental setup

For implementation, we used Keras† with a TensorFlow‡ backend for the deep CNN models, and for dataset management and experiment pipelines we used the Bob package [125, 126]. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 9.0.

*https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en
†https://github.com/keras-team/keras
‡https://www.tensorflow.org

6.5 Experiment 1 results

In the first experiment, we compare the two different fine-tuning approaches explained in Section 5.4. For this experiment, training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state of the art. For each fine-tuning approach we performed three runs and reported the minimum error obtained. In Table 7 we refer to the first fine-tuning approach as (A), where the weights of only the last block are fine-tuned, while in approach (B) all the layers' weights are fine-tuned using the given dataset.
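The two fine-tuning regimes amount to choosing which layers remain trainable. A toy sketch of that choice (the `Layer` class here is a stand-in for a Keras layer, and `last_block=10` is an arbitrary illustrative size, not the papers' actual block boundary):

```python
class Layer:
    """Stand-in for a Keras layer; only the `trainable` flag matters here."""
    def __init__(self):
        self.trainable = True

def set_finetune_mode(layers, approach, last_block=10):
    """Approach (A): freeze everything except the last `last_block` layers;
    approach (B): leave every layer trainable."""
    for i, layer in enumerate(layers):
        layer.trainable = (approach == "B") or (i >= len(layers) - last_block)

layers = [Layer() for _ in range(20)]
set_finetune_mode(layers, "A", last_block=5)   # first 15 layers frozen
```

With a real Keras model the same loop would run over `model.layers` before compiling, which is the standard way to switch between partial and full fine-tuning.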

It can be noticed from Table 7 that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning only the final few layers. For intra-dataset error, we achieved state-of-the-art (SOTA) 0% test error for all datasets using InceptionV3, while MobilenetV2 achieved 0% for most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset was used for training instead of the dataset's original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 results

For the second experiment, we only use fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here we use both face and iris images as training input to a single model for classifying bona-fide from attack presentations in general, regardless of the biometric type.

Results in Table 8 show that using both face and iris images to train a single model to differentiate between bona-fide and presentation attack images achieved very good results on both face and iris test sets. For some cases in the face datasets, this combination boosted cross-dataset performance: for example, on the Replay-Attack test dataset, 11.9% HTER was achieved when training a MobilenetV2 using MSU-MFSD and the Warsaw iris dataset, vs 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 7.


Table 8: Training with face + iris images. Intra- and cross-dataset results. Test HTER is reported when testing on Replay-Attack and Replay-Mobile; test EER otherwise. Each sub-table corresponds to one face training dataset; columns give the iris training dataset and model.

Face train dataset: Replay-Attack
| Test Database | ATVS-FIr Incep. | ATVS-FIr Mobile. | Warsaw Incep. | Warsaw Mobile. | MobBio (RGB) Incep. | MobBio (RGB) Mobile. | MobBio (Gray) Incep. | MobBio (Gray) Mobile. |
| Training epochs | 2 | 6 | 3 | 6 | 4 | 9 | 2 | 8 |
| Replay-Attack | 0 | 063 | 0 | 0 | 025 | 15 | 0 | 14 |
| MSU-MFSD | 296 | 204 | 15 | 20 | 196 | 25 | 30 | 175 |
| Replay-Mobile | 581 | 487 | 573 | 354 | 606 | 581 | 39 | 40 |
| ATVS-FIr | 0 | 0 | 105 | 27 | 517 | 88 | 35 | 123 |
| Warsaw 2013 | 016 | 016 | 0 | 0 | 413 | 162 | 152 | 57 |
| MobBioFake (RGB) | 585 | 713 | 56 | 455 | 075 | 025 | 175 | 2 |
| MobBioFake (Grayscale) | 50 | 69 | 563 | 385 | 15 | 075 | 125 | 1 |

Face train dataset: MSU-MFSD
| Training epochs | 4 | 6 | 3 | 5 | 10 | 9 | 7 | 9 |
| Replay-Attack | 228 | 145 | 253 | 119 | 228 | 277 | 166 | 216 |
| MSU-MFSD | 0 | 208 | 04 | 208 | 25 | 208 | 04 | 25 |
| Replay-Mobile | 287 | 261 | 192 | 248 | 227 | 268 | 252 | 17 |
| ATVS-FIr | 0 | 0 | 033 | 23 | 118 | 125 | 08 | 1233 |
| Warsaw 2013 | 06 | 08 | 0 | 0 | 439 | 655 | 74 | 91 |
| MobBioFake (RGB) | 525 | 392 | 455 | 42 | 2 | 125 | 25 | 25 |
| MobBioFake (Grayscale) | 453 | 385 | 345 | 43 | 22 | 425 | 175 | 2 |

Face train dataset: Replay-Mobile
| Training epochs | 2 | 3 | 1 | 3 | 4 | 7 | 5 | 4 |
| Replay-Attack | 242 | 426 | 31 | 357 | 416 | 643 | 337 | 546 |
| MSU-MFSD | 325 | 304 | 325 | 425 | 60 | 525 | 40 | 446 |
| Replay-Mobile | 0 | 0 | 0 | 0 | 0 | 026 | 0 | 046 |
| ATVS-FIr | 0 | 0 | 083 | 6 | 155 | 225 | 65 | 282 |
| Warsaw 2013 | 21 | 4 | 0 | 0 | 96 | 289 | 221 | 261 |
| MobBioFake (RGB) | 465 | 53 | 493 | 543 | 1 | 075 | 88 | 325 |
| MobBioFake (Grayscale) | 335 | 433 | 417 | 458 | 53 | 65 | 2 | 175 |

(bold-underlined): best cross-dataset error for each test dataset; (bold): best cross-dataset error for each test dataset in each row (i.e. given a certain face train set).

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER is reported when testing on Replay-Attack; test EER otherwise. Each cell gives the cross-dataset error, the corresponding intra-dataset error in parentheses, and the method or model used. Underlined is the best cross-dataset error for the test dataset.

| Train Database | Test Database | SOTA | Ours, single biometric | Ours, face + iris |
| Replay-Attack | MSU-MFSD | 285 (825), GuidedScale-LBP [127] LGBP | 175 (0), InceptionV3 (B) | 15 (0), InceptionV3 (Warsaw 2013) |
| Replay-Attack | Replay-Mobile | 492 (313), GuidedScale-LBP [127] LBP+GS-LBP | 358 (69), InceptionV3 (A) | 354 (0), MobilenetV2 (Warsaw 2013) |
| MSU-MFSD | Replay-Attack | 2140 (15), Color-texture [66] | 137 (208), MobilenetV2 (B) | 119 (208), MobilenetV2 (Warsaw 2013) |
| MSU-MFSD | Replay-Mobile | 321 (85), GuidedScale-LBP [127] LBP+GS-LBP | 237 (208), MobilenetV2 (B) | 17 (25), MobilenetV2 (MobBioFake (Gray)) |
| Replay-Mobile | Replay-Attack | 4619 (052), GuidedScale-LBP [127] LGBP | 311 (0), InceptionV3 (B) | 242 (0), InceptionV3 (ATVS-FIr) |
| Replay-Mobile | MSU-MFSD | 3356 (098), GuidedScale-LBP [127] LBP+GS-LBP | 30 (32), MobilenetV2 (A) | 304 (0), MobilenetV2 (ATVS-FIr) |
| ATVS-FIr | Warsaw 2013 | None available | 016 (0), InceptionV3 (B) | 016 (0), Any (Replay-Attack) |
| ATVS-FIr | MobBioFake (Gray) | None available | 365 (0), InceptionV3 (B) | 335 (0), InceptionV3 (Replay-Mobile) |
| Warsaw 2013 | ATVS-FIr | None available | 0 (0), InceptionV3 (A) | 033 (0), InceptionV3 (MSU-MFSD) |
| Warsaw 2013 | MobBioFake (Gray) | None available | 328 (0), InceptionV3 (B) | 345 (0), InceptionV3 (MSU-MFSD) |
| MobBioFake (Gray) | ATVS-FIr | None available | 25 (105), InceptionV3 (A) | 08 (175), InceptionV3 (MSU-MFSD) |
| MobBioFake (Gray) | Warsaw 2013 | None available | 62 (105), InceptionV3 (A) | 57 (1), MobilenetV2 (Replay-Attack) |

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on MobBioFake itself. Moreover, it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network focus on color features when testing on a face image, since all training face images were colored while all iris training images were grayscale.


Fig. 6: High-frequency components of attack iris images from the Warsaw dataset. Left to right: input image; frequency components (Fourier); high-pass filter; inverse Fourier of the filtered image; high-frequency components overlaid on the image.

(a) Bona-fide Iris samples

(b) Attack Iris samples

Fig. 7: High-frequency components in bona-fide vs attack iris images from the Warsaw dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation error with the state of the art (SOTA). This table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filter activations and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and textures that are present in real presentations in general but absent from attack samples, and vice-versa. One approach to demonstrate this is to visualize the response of certain feature maps of the network given sample input images. For this task, we chose a MobilenetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected to visualize the response of some filters from the final convolutional block before classification.

(a) Bona-fide Face samples

(b) Attack Face samples

Fig. 8: High-frequency components in bona-fide vs attack face images from the Replay-Attack dataset. Left to right: input image; high-frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high-frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high-frequency components of an image; then we overlay these high-frequency regions


on the input image for a better understanding of where high frequency is concentrated in bona-fide images vs attack images.

The steps for obtaining the regions with high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to obtain the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, an inverse Fourier transform is used to obtain an image with only the high-frequency regions visible, and finally these regions are overlaid on the original image to highlight them on the input.
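These steps can be sketched with NumPy's FFT routines (an illustrative version; the cutoff radius is an assumed parameter, not the value used in the paper):

```python
import numpy as np

def high_freq_overlay(img, cutoff=8):
    """High-pass filter a grayscale image in the frequency domain, then
    overlay the surviving high-frequency magnitude on the input image."""
    # 1. Fourier transform, DC component shifted to the spectrum center
    f = np.fft.fftshift(np.fft.fft2(img))
    # 2. High-pass mask: keep only frequencies farther than `cutoff` bins
    #    from the center
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > cutoff ** 2
    # 3. Inverse Fourier transform of the filtered spectrum
    high = np.abs(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    # 4. Overlay: add the normalized high-frequency magnitude onto the input
    return img + high / (high.max() + 1e-8) * img.max()
```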

Regarding the choice of filters to visualize: the final layer before classification consists of 1280 filters; for an input image of size 224 × 224, the feature maps resulting from these 1280 filters have size 7 × 7. From these 1280 filters we selected two for each biometric: one with the highest average activation response on real images, which we call realF, and another that was highly activated by attack images, called attackF. We then visualize the 7 × 7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.
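The filter-selection step can be expressed as follows (a simplified sketch; the paper's exact selection procedure may differ in detail):

```python
import numpy as np

def select_filters(feature_maps, labels):
    """feature_maps: (n_samples, 7, 7, n_filters) activations of the final conv
    block; labels: 1 = bona-fide, 0 = attack. Returns (realF, attackF): the
    index of the filter with the highest mean activation over real images, and
    over attack images, respectively."""
    per_filter = feature_maps.mean(axis=(1, 2))        # (n_samples, n_filters)
    real_f = int(np.argmax(per_filter[labels == 1].mean(axis=0)))
    attack_f = int(np.argmax(per_filter[labels == 0].mean(axis=0)))
    return real_f, attack_f
```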

Figure 7 shows samples from the Warsaw iris dataset with their high-frequency regions highlighted, in addition to the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples, high-frequency areas are concentrated at the bright regions around the inner eye corner and the lash line; for attack presentations, however, high-frequency regions mostly lie on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features highly represented at the eye corner via realF, giving a high response at those regions in the real image while failing to detect them in attack samples. AttackF likewise manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images from the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is performed using a handheld mobile device showing a photo of the subject. We can notice the absence of high-frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

By visualizing the high-frequency regions in the images, we showed that high-frequency variations in real images concentrate on certain edges and changes of the 3D biometric being presented, while in attack presentations the high-frequency components represent noise, either spread over the paper in the case of a paper attack medium or noise patterns produced by the display of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network learned is gradient-weighted class activation mapping (Grad-CAM) [10], which highlights the areas that contributed most to the network's classification decision for an input image.
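Given the activations of the last convolutional layer and the gradients of the class score with respect to them, the Grad-CAM heatmap reduces to a gradient-weighted sum. A minimal NumPy sketch (the framework-specific gradient computation is omitted):

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations: (7, 7, n_filters) feature maps of the last conv layer;
    gradients: same shape, d(class score)/d(activations). Each map is weighted
    by its average gradient, summed, and passed through ReLU."""
    weights = gradients.mean(axis=(0, 1))             # one weight per filter
    cam = np.tensordot(activations, weights, axes=([2], [0]))
    cam = np.maximum(cam, 0)                          # keep positive evidence only
    return cam / (cam.max() + 1e-8)                   # normalize to [0, 1]
```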

We chose the MobilenetV2 network trained on the Replay-Attack face dataset and the grayscale MobBioFake iris dataset to analyze the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation attack images.

The visualizations in these figures clearly show that the network focuses on features at the centers of the face and iris to make its decision for a bona-fide presentation, and in cases where these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights pretrained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% SOTA intra-dataset error on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping and slight rotation and translation, in addition to starting the network training with non-random weights already pretrained on ImageNet. This was to avoid the weights overfitting the small training datasets; for future work, we would like to check the effect of training these deep networks from scratch using the available PAD datasets.
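The flip/translation part of this augmentation can be sketched in plain NumPy (illustrative only; the shift range is an assumed parameter, and the slight rotations would additionally need e.g. scipy.ndimage.rotate):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, max_shift=4):
    """Randomly flip a grayscale image horizontally and translate it by a few
    pixels, zero-padding the exposed border."""
    out = img[:, ::-1] if rng.random() < 0.5 else img
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.zeros_like(out)
    # source window: the part of the image that stays in frame after the shift
    src = out[max(0, -dy):out.shape[0] - max(0, dy),
              max(0, -dx):out.shape[1] - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted
```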

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-SOTA results were reported for the face datasets. As far as we know, this is the first paper to include cross-dataset equal-error-rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. The approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we analyzed the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the centers of the face or iris are the most important parts leading the network to a bona-fide classification decision. For future work, we would like to apply patch-based CNNs and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach with a region-based CNN method.


8 References

1 Jourabloo, A., Liu, Y., Liu, X.: 'Face de-spoofing: Anti-spoofing via noise modeling', CoRR, 2018
2 Nagpal, C., Dubey, S.R.: 'A performance evaluation of convolutional neural networks for face anti spoofing', CoRR, 2018
3 Nguyen, D., Rae Baek, N., Pham, T., Ryoung Park, K.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, pp. 1315
4 Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)
5 Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In: 'Deep Learning in Biometrics' (CRC Press, 2018), p. 49
6 Czajka, A., Bowyer, K.W.: 'Presentation attack detection for iris recognition: An assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35
7 Ramachandra, R., Busch, C.: 'Presentation attack detection methods for face recognition systems: A comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37
8 Abdul Ghaffar, I., Norzali, M., Haji Mohd, M.N.: 'Presentation attack detection for face recognition on smartphones: A comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9
9 Li, L., Correia, P.L., Hadid, A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, pp. 3–14
10 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: 'Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from http://arxiv.org/abs/1610.02391
11 Daugman, J.: 'Iris recognition and anti-spoofing countermeasures'. In: 7th International Biometrics Conference, 2004
12 Galbally, J., Ortiz-Lopez, J., Fierrez, J., Ortega-Garcia, J.: 'Iris liveness detection based on quality related features'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276
13 Pacut, A., Czajka, A.: 'Aliveness detection for iris biometrics'. 40th Annual IEEE International Carnahan Conferences Security Technology, 2006, pp. 122–129
14 Czajka, A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735
15 Czajka, A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3
16 Bodade, R., Talbar, S.: 'Dynamic iris localisation: A novel approach suitable for fake iris detection'. In: 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5
17 Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R.: 'An experimental study of pupil constriction for liveness detection'. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258
18 Kanematsu, M., Takano, H., Nakamura, K.: 'Highly reliable liveness detection method for iris recognition'. In: SICE Annual Conference 2007, 2007, pp. 361–364
19 Lee, E.C., Ryoung Park, K.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166
20 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371
21 Mohd, M.N.H., et al.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science, 2015, 10, (18), pp. 684–697
22 Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225
23 Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: Reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745
24 Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6
25 Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11
26 Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690
27 Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110
28 Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8
29 Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You, Z., Zhou, J., Wang, Y., Sun, Z., Shan, S., Zheng, W., et al., editors, Biometric Recognition, Cham: Springer International Publishing, 2016, pp. 611–619
30 Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3D projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78
31 Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In: Delac, K., Grgic, M., Bartlett, M.S., editors, Recent Advances in Face Recognition, Rijeka: IntechOpen, 2008
32 Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558
33 Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: Attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
34 Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: Liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725
35 Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection'. IEEE International Joint Conference on Biometrics, 2014, pp. 1–8
36 Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326
37 Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14
38 Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Daniilidis, K., Maragos, P., Paragios, N., editors, Computer Vision – ECCV 2010, Berlin, Heidelberg: Springer, 2010, pp. 504–517
39 Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244
40 Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. In: Image Analysis and Signal Processing, 2009, pp. 233–236
41 Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation'. 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040
42 da Silva Pinto, A., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228
43 Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing'. 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8
44 Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing'. ICB, 2013, pp. 1–6
45 Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3D structure recovered from a single camera'. ICB, 2013, pp. 1–6
46 Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In: 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6

47 Erdogmus N Marcel S lsquoSpoofing in 2d face recognition with 3d masks andanti-spoofing with kinectrsquo In 2013 IEEE Sixth International Conference onBiometrics Theory Applications and Systems (BTAS) ( 2013 pp 1ndash6

48 Erdogmus N Marcel S lsquoSpoofing 2d face recognition systems with 3dmasksrsquo In 2013 International Conference of the BIOSIG Special Interest Group(BIOSIG) ( 2013 pp 1ndash8

49 Li J Wang Y Tan T Jain AK lsquoLive face detection based on the analysis ofFourier spectrarsquo In Jain AK Ratha NK editors Biometric Technology forHuman Identification vol 5404 ( 2004 pp 296ndash303

50 Zhang Z Yan J Liu S Lei Z Yi D Li SZ lsquoA face antispoofing databasewith diverse attacksrsquo In 2012 5th IAPR International Conference on Biometrics(ICB) ( 2012 pp 26ndash31

51 Lee T Ju G Liu H Wu Y lsquoLiveness detection using frequency entropy ofimage sequencesrsquo In 2013 IEEE International Conference on Acoustics Speechand Signal Processing ( 2013 pp 2367ndash2370

52 Czajka A lsquoDatabase of iris printouts and its application Development of livenessdetection method for iris recognitionrsquo In 2013 18th International Conference onMethods Models in Automation Robotics (MMAR) ( 2013 pp 28ndash33

53 Wei Z Qiu X Sun Z Tan T lsquoCounterfeit iris detection based on textureanalysisrsquo In ICPR 2008 19th International Conference on Pattern Recogni-tion(ICPR) ( 2009 pp 1ndash4

54 Sequeira AF Murari J Cardoso JS lsquoIris liveness detection methods in mobileapplicationsrsquo In 2014 International Conference on Computer Vision Theory andApplications (VISAPP) vol 3 ( 2014 pp 22ndash33

55 Sequeira AF Murari J Cardoso JS lsquoIris liveness detection methods in themobile biometrics scenariorsquo In Proceedings of International Joint Conferenceon Neural Networks (IJCNN) ( 2014 pp 3002ndash3008

56 Unnikrishnan S Eshack A lsquoFace spoof detection using image distortionanalysis and image quality assessmentrsquo In 2016 International Conference onEmerging Technological Trends (ICETT) ( 2016 pp 1ndash5

57 Wen D Han H Jain AK lsquoFace spoof detection with image distortion anal-ysisrsquo IEEE Transactions on Information Forensics and Security 2015 10 (4)pp 746ndash761

58 Galbally J Marcel S lsquoFace anti-spoofing based on general image quality assess-mentrsquo In 2014 22nd International Conference on Pattern Recognition ( 2014pp 1173ndash1178

59 SAtildeullinger D Trung P Uhl A lsquoNon-reference image quality assessment andnatural scene statistics to counter biometric sensor spoofingrsquo IET Biometrics2018 7 (4) pp 314ndash324

60 Bhogal APS Soumlllinger D Trung P Uhl A lsquoNon-reference image qualityassessment for biometric presentation attack detectionrsquo 2017 5th InternationalWorkshop on Biometrics and Forensics (IWBF) 2017 pp 1ndash6

61 Boulkenafet Z Komulainen J Hadid A lsquoFace anti-spoofing based on colortexture analysisrsquo In 2015 IEEE International Conference on Image Processing(ICIP) ( 2015 pp 2636ndash2640


62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830

63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145

64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16, New York, NY, USA: ACM, 2016, pp. 9–15

66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9

67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: A public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863

70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park, J.I., Kim, J. (Eds.): Computer Vision – ACCV 2012 Workshops, Berlin, Heidelberg: Springer, 2013, pp. 121–132

71 de Freitas Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG – Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10

76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5

77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: Spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283

78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715

82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683

83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'MoBio_LivDet: Mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: A neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors'. In: Sensors, 2018

88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: Binary or auxiliary supervision', CoRR, 2018

90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018

91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014

92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing'. In: 2017, pp. 27–34

94 Czajka, A., Bowyer, K.W., Krumdick, M., VidalMata, R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196

95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018

97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: Toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879

99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using Haralick features'. In: 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: Application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724

101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786

102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli, M., Nixon, M.S. (Eds.): Advances in Biometrics, Berlin, Heidelberg: Springer, 2009, pp. 1080–1090

103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The Replay-Mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: A multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87

107 Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: Face and iris recognition for mobile engagement', Image Vision Comput., 2014, 32, pp. 1161–1172

108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'ImageNet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252

109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: A mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 612–618

110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet Iris 2017 – iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), Denver, CO, USA, October 1-4, 2017, pp. 733–741

111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR 2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116 Yambay, D., Doyle, J.S., Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-Iris 2013 – iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics (IJCB), Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8

117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-Iris 2015 – iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), New Delhi, India, February 22-24, 2017, pp. 1–6

118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 – mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (Eds.): Biometrics and Identity Management, Berlin, Heidelberg: Springer, 2008, pp. 181–190

120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision', CoRR, 2015, abs/1512.00567

121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385

122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381

123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556

124 Kingma, D.P., Ba, J.: 'Adam: A method for stochastic optimization', CoRR, 2014, abs/1412.6980

125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



Table 8: Training with face+iris images — intra- and cross-dataset results. Test HTER is reported when testing on Replay-Attack and Replay-Mobile; otherwise test EER. Columns: iris training database (ATVS-FIr | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (Grayscale)), each with an InceptionV3 and a MobilenetV2 model.

Train face database: Replay-Attack
Number of training epochs:  2 6 | 3 6 | 4 9 | 2 8
Test Replay-Attack:         0 063 | 0 0 | 025 15 | 0 14
Test MSU-MFSD:              296 204 | 15 20 | 196 25 | 30 175
Test Replay-Mobile:         581 487 | 573 354 | 606 581 | 39 40
Test ATVS-FIr:              0 0 | 105 27 | 517 88 | 35 123
Test Warsaw 2013:           016 016 | 0 0 | 413 162 | 152 57
Test MobBioFake (RGB):      585 713 | 56 455 | 075 025 | 175 2
Test MobBioFake (Grayscale): 50 69 | 563 385 | 15 075 | 125 1

Train face database: MSU-MFSD
Number of training epochs:  4 6 | 3 5 | 10 9 | 7 9
Test Replay-Attack:         228 145 | 253 119 | 228 277 | 166 216
Test MSU-MFSD:              0 208 | 04 208 | 25 208 | 04 25
Test Replay-Mobile:         287 261 | 192 248 | 227 268 | 252 17
Test ATVS-FIr:              0 0 | 033 23 | 118 125 | 08 1233
Test Warsaw 2013:           06 08 | 0 0 | 439 655 | 74 91
Test MobBioFake (RGB):      525 392 | 455 42 | 2 125 | 25 25
Test MobBioFake (Grayscale): 453 385 | 345 43 | 22 425 | 175 2

Train face database: Replay-Mobile
Number of training epochs:  2 3 | 1 3 | 4 7 | 5 4
Test Replay-Attack:         242 426 | 31 357 | 416 643 | 337 546
Test MSU-MFSD:              325 304 | 325 425 | 60 525 | 40 446
Test Replay-Mobile:         0 0 | 0 0 | 0 026 | 0 046
Test ATVS-FIr:              0 0 | 083 6 | 155 225 | 65 282
Test Warsaw 2013:           21 4 | 0 0 | 96 289 | 221 261
Test MobBioFake (RGB):      465 53 | 493 543 | 1 075 | 88 325
Test MobBioFake (Grayscale): 335 433 | 417 458 | 53 65 | 2 175

Bold-underlined in the original: best cross-dataset error for each test dataset. Bold: best cross-dataset error for each test dataset in each row, i.e. given a certain face train set.

Table 9: Best cross-dataset results and comparison with SOTA. Test HTER is reported when testing on Replay-Attack; otherwise test EER. Underlined in the original marks the best cross-dataset error for each test dataset. Each result is listed as cross-dataset error (corresponding intra-dataset error), followed by the method or model; models trained with face+iris also list their iris train set.

Train Replay-Attack, test MSU-MFSD: SOTA 285 (825), GuidedScale-LBP [127] LGBP | single biometric: 175 (0), InceptionV3 (B) | face+iris: 15 (0), InceptionV3 (Warsaw 2013)
Train Replay-Attack, test Replay-Mobile: SOTA 492 (313), GuidedScale-LBP [127] LBP+GS-LBP | single biometric: 358 (69), InceptionV3 (A) | face+iris: 354 (0), MobilenetV2 (Warsaw 2013)
Train MSU-MFSD, test Replay-Attack: SOTA 2140 (15), Color-texture [66] | single biometric: 137 (208), MobilenetV2 (B) | face+iris: 119 (208), MobilenetV2 (Warsaw 2013)
Train MSU-MFSD, test Replay-Mobile: SOTA 321 (85), GuidedScale-LBP [127] LBP+GS-LBP | single biometric: 237 (208), MobilenetV2 (B) | face+iris: 17 (25), MobilenetV2 (MobBioFake (Gray))
Train Replay-Mobile, test Replay-Attack: SOTA 4619 (052), GuidedScale-LBP [127] LGBP | single biometric: 311 (0), InceptionV3 (B) | face+iris: 242 (0), InceptionV3 (ATVS-FIr)
Train Replay-Mobile, test MSU-MFSD: SOTA 3356 (098), GuidedScale-LBP [127] LBP+GS-LBP | single biometric: 30 (32), MobilenetV2 (A) | face+iris: 304 (0), MobilenetV2 (ATVS-FIr)
Train ATVS-FIr, test Warsaw 2013: SOTA none available | single biometric: 016 (0), InceptionV3 (B) | face+iris: 016 (0), any (Replay-Attack)
Train ATVS-FIr, test MobBioFake (Gray): SOTA none available | single biometric: 365 (0), InceptionV3 (B) | face+iris: 335 (0), InceptionV3 (Replay-Mobile)
Train Warsaw 2013, test ATVS-FIr: SOTA none available | single biometric: 0 (0), InceptionV3 (A) | face+iris: 033 (0), InceptionV3 (MSU-MFSD)
Train Warsaw 2013, test MobBioFake (Gray): SOTA none available | single biometric: 328 (0), InceptionV3 (B) | face+iris: 345 (0), InceptionV3 (MSU-MFSD)
Train MobBioFake (Gray), test ATVS-FIr: SOTA none available | single biometric: 25 (105), InceptionV3 (A) | face+iris: 08 (175), InceptionV3 (MSU-MFSD)
Train MobBioFake (Gray), test Warsaw 2013: SOTA none available | single biometric: 62 (105), InceptionV3 (A) | face+iris: 57 (1), MobilenetV2 (Replay-Attack)

Table 8 also confirms the finding from Table 7 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other gray iris datasets, without much affecting the intra-dataset error on the MobBioFake dataset itself. Moreover, it also achieves lower cross-dataset error rates when evaluating on face datasets, by helping the network focus on color features whenever testing is performed on a face image, since all training face images were colored while all iris training images were grayscale.


Fig. 6: High frequency components of attack iris images from the Warsaw dataset. Left to right: input image; frequency components (Fourier); high-pass filter; inverse Fourier of the filtered image; high frequency components overlaid on the image.

(a) Bona-fide iris samples

(b) Attack iris samples

Fig. 7: High frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left to right: input image; high frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: 0050_R_2; bottom: 0088_R_3.

6.7 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation error with the state of the art (SOTA). The table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobilenetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel it is important to analyze the features learned by the networks that were trained on two biometrics.

6.8.1 Filters activation and frequency components: By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and textures that characterize real presentations in general, which are not present in attack samples, and vice versa. One approach to demonstrate this is to visualize the response of certain feature maps of the network given sample input images. For this task we chose a MobilenetV2 network trained with the Warsaw iris and Replay-Attack face datasets, from which we selected some filters of the final convolutional block before classification to visualize.

(a) Bona-fide face samples

(b) Attack face samples

Fig. 8: High frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left to right: input image; high frequency components overlaid on the image; network activation response of realF; network activation response of attackF. Top: client002; bottom: client103.

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

(a) Correctly classified attack images

(b) Incorrectly classified attack images

Fig. 10: Sample classified attack images. Left to right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high frequency components of an image, then overlaid these high frequency regions on the input image to better understand where high frequency content is concentrated in bona-fide versus attack images.

The steps for obtaining the regions of high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to obtain the frequency components of the image, and a high-pass filter is then used to remove most of the low frequencies. After that, an inverse Fourier transform is applied to obtain an image in which only the high frequency regions are visible, and finally these regions are overlaid on the original image to highlight them.
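These steps can be sketched with NumPy's FFT routines; the circular cutoff radius and the thresholding rule used to binarize the high-frequency magnitude are illustrative assumptions, not parameters reported here:

```python
import numpy as np

def high_freq_overlay(img, cutoff=0.1):
    """Isolate high-frequency regions of a grayscale image via FFT high-pass filtering."""
    f = np.fft.fftshift(np.fft.fft2(img))              # centered 2-D spectrum
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    # High-pass: zero out a circular low-frequency region around the spectrum center
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    f[dist < cutoff * min(h, w)] = 0
    high = np.abs(np.fft.ifft2(np.fft.ifftshift(f)))   # inverse FFT -> high-freq magnitude
    mask = high > high.mean() + 2 * high.std()         # keep only strong high-freq responses
    overlay = img.astype(float).copy()
    overlay[mask] = overlay.max()                      # highlight high-freq regions on the image
    return overlay, mask
```

Applied to a bona-fide iris image, such a mask tends to highlight corner and lash-line detail, while on a print attack it tends to highlight the paper's noise pattern, as discussed for Figure 7.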

Regarding the choice of filters to visualize: the final layer before classification consists of 1280 filters, and for an input image of size 224 x 224 the feature maps produced by these filters have size 7 x 7. From these 1280 filters we selected two for each biometric: one with the highest average activation response to real images, which we call realF, and one that was most highly activated by attack images, called attackF. We then visualize the 7 x 7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.
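This selection can be sketched as follows, assuming the last-block activations have already been extracted into arrays of shape (N, 7, 7, 1280); ranking by mean activation is the stated criterion, while the array layout is an assumption of this sketch:

```python
import numpy as np

def select_real_attack_filters(feats_real, feats_attack):
    """
    feats_real, feats_attack: last conv block activations for bona-fide and
    attack images, shape (num_images, 7, 7, num_filters).
    Returns the index of the filter most activated on average by real images
    (realF) and by attack images (attackF).
    """
    mean_real = feats_real.mean(axis=(0, 1, 2))      # per-filter mean over images and 7x7 grid
    mean_attack = feats_attack.mean(axis=(0, 1, 2))
    real_f = int(np.argmax(mean_real))               # candidate realF
    attack_f = int(np.argmax(mean_attack))           # candidate attackF
    return real_f, attack_f
```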

Figure 7 shows samples from the Warsaw iris dataset with their high frequency regions highlighted, together with the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples the high frequency areas concentrate at the bright regions around the inner eye corner and the lash line, whereas for attack presentations the high frequency regions mostly capture noise patterns scattered over the attack medium (paper). The trained network successfully recognizes the real features, strongly represented at the eye corner, through realF, giving a high response at those regions in real images while failing to detect them in attack samples. AttackF likewise manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images from the Replay-Attack dataset is shown in Figure 8. The attack presentations in these samples use a handheld mobile device showing a photo of the subject. We can notice the absence of high frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain noise patterns over the clear background area, which are successfully detected by the network's attackF.

Visualizing the high frequency regions in the images showed that, in real images, high frequency variations concentrate on certain edges and depth changes of the 3D biometric being presented, whereas in attack presentations the high frequency components represent noise: either noise spread over the paper in the case of a print attack medium, or noise patterns produced by the display of a mobile attack device. Our trained CNN managed to localize such features correctly, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyzing what the network has learned is gradient-weighted class activation mapping (Grad-CAM) [10], which highlights the areas that contributed most to the network's classification decision for an input image.
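Grad-CAM weights each feature map of the last convolutional layer by the global-average-pooled gradient of the class score, sums them, and keeps only positive evidence. A framework-agnostic sketch, assuming the activations and gradients have already been obtained from any autodiff backend:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """
    feature_maps: (H, W, K) last conv layer activations for one image.
    gradients:    (H, W, K) d(class score)/d(feature_maps) from backprop.
    Returns an (H, W) heatmap of class-discriminative regions in [0, 1].
    """
    weights = gradients.mean(axis=(0, 1))                       # alpha_k: pooled gradients
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))  # weighted sum over K maps
    cam = np.maximum(cam, 0)                                    # ReLU: positive influence only
    if cam.max() > 0:
        cam = cam / cam.max()                                   # normalize for display
    return cam
```

The resulting 7x7 heatmap is then upsampled to the input resolution and overlaid on the image, as in Figures 9 and 10.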

We chose the MobilenetV2 network trained on the Replay-Attack face dataset and the MobBioFake gray iris dataset to analyze the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation attack images.

The visualizations in these figures clearly show that the network focuses on features at the center of the face or iris to make its decision for a bona-fide presentation, and that when these center areas have lower activations the network decides the image is an attack. Although the network was trained on two different biometrics, it focused on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper we surveyed the face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches starting from network weights pretrained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% SOTA intra-dataset error on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was mitigated by augmenting the original data with random flipping and slight rotation and translation, in addition to starting the training from non-random weights already pretrained on ImageNet; this helps avoid overfitting the small training sets. For future work we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.
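The augmentation scheme described above (random flipping plus slight rotation and translation) can be sketched as follows; the magnitude ranges and the nearest-neighbour rotation are illustrative assumptions, not the exact training parameters:

```python
import numpy as np

def rotate_nn(img, deg):
    """Rotate a 2-D image by a small angle using nearest-neighbour sampling."""
    h, w = img.shape[:2]
    th = np.deg2rad(deg)
    yy, xx = np.indices((h, w))
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys = cy + (yy - cy) * np.cos(th) - (xx - cx) * np.sin(th)
    xs = cx + (yy - cy) * np.sin(th) + (xx - cx) * np.cos(th)
    ys = np.clip(np.round(ys).astype(int), 0, h - 1)
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)
    return img[ys, xs]

def augment(img, rng):
    """Random horizontal flip, slight rotation and translation of one training image."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                         # horizontal flip
    img = rotate_nn(img, rng.uniform(-10, 10))     # slight rotation (illustrative range)
    shift = rng.integers(-5, 6, size=2)            # slight translation in pixels
    return np.roll(img, tuple(int(s) for s in shift), axis=(0, 1))
```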

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-SOTA results were reported on the face datasets. As far as we know, this is the first paper to report cross-dataset Equal Error Rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.
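For reference, the two error metrics reported throughout, EER and HTER, can be computed from classifier scores as follows; the convention that higher scores mean bona-fide and the threshold sweep are assumptions of this sketch, not the exact evaluation protocol:

```python
import numpy as np

def far_frr(bona_scores, attack_scores, thr):
    """FAR: fraction of attacks accepted as bona-fide; FRR: fraction of bona-fide rejected."""
    far = np.mean(np.asarray(attack_scores) >= thr)
    frr = np.mean(np.asarray(bona_scores) < thr)
    return far, frr

def eer(bona_scores, attack_scores):
    """Equal Error Rate: sweep thresholds and report where FAR and FRR are closest."""
    thrs = np.unique(np.concatenate([bona_scores, attack_scores]))
    rates = [far_frr(bona_scores, attack_scores, t) for t in thrs]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2

def hter(bona_scores, attack_scores, thr):
    """Half Total Error Rate at a fixed threshold (typically tuned on a development set)."""
    far, frr = far_frr(bona_scores, attack_scores, thr)
    return (far + frr) / 2
```

HTER is the natural metric when a development set of the same dataset fixes the threshold, while EER is used when no such threshold transfer is possible, matching the reporting convention of Tables 8 and 9.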

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. The approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training a PAD algorithm.

Finally, we analyzed the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important region in driving the network to a bona-fide classification decision. For future work we would like to apply patch-based CNNs and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach through a region-based CNN method.


8 References
1 Jourabloo, A., Liu, Y., Liu, X.: 'Face de-spoofing: anti-spoofing via noise modeling', CoRR, 2018
2 Nagpal, C., Dubey, S.R.: 'A performance evaluation of convolutional neural networks for face anti-spoofing', CoRR, 2018
3 Nguyen, D., Baek, N.R., Pham, T., Park, K.R.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, pp. 1315
4 Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)
5 Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al.: 'Counteracting presentation attacks in face, fingerprint and iris recognition'. In: 'Deep learning in biometrics' (CRC Press, 2018), p. 49
6 Czajka, A., Bowyer, K.W.: 'Presentation attack detection for iris recognition: an assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35
7 Ramachandra, R., Busch, C.: 'Presentation attack detection methods for face recognition systems: a comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37
8 Abdul Ghaffar, I., Norzali, M., Haji Mohd, M.N.: 'Presentation attack detection for face recognition on smartphones: a comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9
9 Li, L., Correia, P.L., Hadid, A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, (1), pp. 3–14
10 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: 'Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391
11 Daugman, J.: 'Iris recognition and anti-spoofing countermeasures'. In: 7th International Biometrics Conference, 2004
12 Galbally, J., Ortiz-Lopez, J., Fierrez, J., Ortega-Garcia, J.: 'Iris liveness detection based on quality related features'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276
13 Pacut, A., Czajka, A.: 'Aliveness detection for iris biometrics'. In: 40th Annual IEEE International Carnahan Conference on Security Technology, 2006, pp. 122–129
14 Czajka, A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735
15 Czajka, A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3
16 Bodade, R., Talbar, S.: 'Dynamic iris localisation: a novel approach suitable for fake iris detection'. In: 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5
17 Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R.: 'An experimental study of pupil constriction for liveness detection'. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258
18 Kanematsu, M., Takano, H., Nakamura, K.: 'Highly reliable liveness detection method for iris recognition'. In: SICE Annual Conference 2007, 2007, pp. 361–364
19 Lee, E.C., Park, K.R.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166
20 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371
21 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science, 2015, pp. 684–697
22 Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225
23 Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745
24 Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6
25 Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11
26 Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690
27 Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110
28 Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8
29 Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You, Z., Zhou, J., Wang, Y., et al. (Eds.): 'Biometric Recognition' (Cham: Springer International Publishing, 2016), pp. 611–619
30 Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3D projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78
31 Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In: Delac, K., Grgic, M., Bartlett, M.S. (Eds.): 'Recent Advances in Face Recognition' (Rijeka: IntechOpen, 2008)
32 Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558
33 Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
34 Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725
35 Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection'. In: IEEE International Joint Conference on Biometrics, 2014, pp. 1–8
36 Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326
37 Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14
38 Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Daniilidis, K., Maragos, P., Paragios, N. (Eds.): 'Computer Vision – ECCV 2010' (Berlin, Heidelberg: Springer, 2010), pp. 504–517
39 Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244
40 Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. In: Image Analysis and Signal Processing, 2009, pp. 233–236
41 Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation'. In: 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040
42 Pinto, A.d.S., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228
43 Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8
44 Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing'. In: ICB, 2013, pp. 1–6
45 Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3D structure recovered from a single camera'. In: ICB, 2013, pp. 1–6
46 Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In: 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6
47 Erdogmus, N., Marcel, S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6
48 Erdogmus, N., Marcel, S.: 'Spoofing 2D face recognition systems with 3D masks'. In: 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8
49 Li, J., Wang, Y., Tan, T., Jain, A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Jain, A.K., Ratha, N.K. (Eds.): Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303
50 Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: 'A face antispoofing database with diverse attacks'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31
51 Lee, T., Ju, G., Liu, H., Wu, Y.: 'Liveness detection using frequency entropy of image sequences'. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370
52 Czajka, A.: 'Database of iris printouts and its application: development of liveness detection method for iris recognition'. In: 2013 18th International Conference on Methods and Models in Automation and Robotics (MMAR), 2013, pp. 28–33
53 Wei, Z., Qiu, X., Sun, Z., Tan, T.: 'Counterfeit iris detection based on texture analysis'. In: 2008 19th International Conference on Pattern Recognition (ICPR), 2008, pp. 1–4
54 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in mobile applications'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33
55 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008
56 Unnikrishnan, S., Eshack, A.: 'Face spoof detection using image distortion analysis and image quality assessment'. In: 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5
57 Wen, D., Han, H., Jain, A.K.: 'Face spoof detection with image distortion analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761
58 Galbally, J., Marcel, S.: 'Face anti-spoofing based on general image quality assessment'. In: 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178
59 Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing', IET Biometrics, 2018, 7, (4), pp. 314–324
60 Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment for biometric presentation attack detection'. In: 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6
61 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face anti-spoofing based on color texture analysis'. In: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830
63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145
64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696
65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA: ACM, 2016), pp. 9–15
66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9
67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7
69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863
70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park, J.I., Kim, J. (Eds.): 'Computer Vision – ACCV 2012 Workshops' (Berlin, Heidelberg: Springer, 2013), pp. 121–132
71 de Freitas Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG – Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7
74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
75 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10
76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering and Information Technology (CEIT), 2015, pp. 1–5
77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283
78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8
79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145
80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560
81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715
82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683
83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'MoBio_LivDet: mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192
84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460
86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6
87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors', Sensors, 2018
88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328
89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision', CoRR, 2018
90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018
91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014
92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5
93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34
94 Czajka, A., Bowyer, K.W., Krumdick, M., Vidal-Mata, R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196
95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164
96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018
97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–1709
98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879
99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using Haralick features'. In: 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6
100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724
101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786
102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli, M., Nixon, M.S. (Eds.): 'Advances in Biometrics' (Berlin, Heidelberg: Springer, 2009), pp. 1080–1090
103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282
104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The Replay-Mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7
105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139
106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87
107 Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: face and iris recognition for mobile engagement', Image Vision Comput., 2014, 32, pp. 1161–1172
108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'ImageNet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252
109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), 2017, pp. 612–618
110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet Iris 2017 – iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), Denver, CO, USA, October 1–4, 2017, pp. 733–741
111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928
114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019
115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019
116 Yambay, D., Doyle, J.S., Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-Iris 2013 – iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics (IJCB), Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8
117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-Iris 2015 – iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22–24, 2017, pp. 1–6
118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 – mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (Eds.): 'Biometrics and Identity Management' (Berlin, Heidelberg: Springer, 2008), pp. 181–190
120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision', CoRR, 2015, abs/1512.00567
121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385
122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381
123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556
124 Kingma, D.P., Ba, J.: 'Adam: a method for stochastic optimization', CoRR, 2014, abs/1412.6980
125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012
126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017
127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909



Fig. 6: High frequency components of attack iris images from the Warsaw dataset. Left-to-right: input image, frequency components (Fourier), high-pass filter, inverse Fourier of the filtered image, high-frequency components overlaid on the image

(a) Bona-fide iris samples

(b) Attack iris samples

Fig. 7: High frequency components in bona-fide vs. attack iris images from the Warsaw dataset. Left-to-right: input image, high-frequency components overlaid on the image, network activation response of realF, network activation response of attackF. Top: 0050_R_2, Bottom: 0088_R_3

6.7 Comparison with state of the art

In Table 9 we compare our best achieved cross-dataset evaluation error with the state of the art (SOTA). This table shows that our cross-dataset testing results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobileNetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. Both models interchangeably achieved the better error rate when the network was trained using both face and iris images.

6.8 Analysis and discussion

Given the results in Table 8, we feel that it is important to analyze the features learnt by the networks that were trained on two biometrics.

6.8.1 Filters activation and frequency components: By training the network on several biometrics, we believe that the network is forced to learn to recognize specific patterns and texture present in real presentations in general, which are not present in attack samples, and vice-versa. An approach to verify this is to visualize the response of certain feature maps from the network given sample input images. For this task we chose a MobileNetV2 network trained with the Iris-Warsaw and Face Replay-Attack datasets, from which we selected to visualize the response of some filters from the final convolutional block before classification.

(a) Bona-fide face samples

(b) Attack face samples

Fig. 8: High frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. Left-to-right: input image, high-frequency components overlaid on the image, network activation response of realF, network activation response of attackF. Top: client002, Bottom: client103

(a) Correctly classified bona-fide images

(b) Incorrectly classified bona-fide images

Fig. 9: Sample classified bona-fide images. Left-to-right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIR, Warsaw, MobbioFake

(a) Correctly classified Attack images

(b) Incorrectly classified Attack images

Fig. 10: Sample classified attack images. Left-to-right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIR, Warsaw, MobbioFake

In addition to visualizing the network activation response, we were also interested in viewing the distribution of high frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high frequency components of an image; we then overlay these high-frequency regions


on the input image, to better understand where high frequency is concentrated in bona-fide vs. attack images.

The steps for obtaining the regions with high frequency in an image are depicted in Figure 6. A Fourier transform is first applied to get the frequency components of the image. A high-pass filter is then used to remove most of the low frequencies. After that, the inverse Fourier transform is used to obtain an image with only the high frequency regions visible, and finally these regions are overlaid on the original image to highlight them on the input image.
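The steps above can be sketched in a few lines of numpy; the circular mask radius (`cutoff`) is an illustrative choice, not a value from the paper:

```python
import numpy as np

def high_freq_overlay(img, cutoff=0.1):
    """FFT -> zero out low frequencies -> inverse FFT ->
    overlay the high-frequency magnitude on the input image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)       # distance from DC
    f[r < cutoff * min(h, w)] = 0              # high-pass filter
    high = np.abs(np.fft.ifft2(np.fft.ifftshift(f)))
    high = high / (high.max() + 1e-8)          # normalise to [0, 1]
    return 0.5 * img + 0.5 * high              # simple overlay

overlay = high_freq_overlay(np.ones((16, 16)))
```

On a constant image every non-DC coefficient is already zero, so the overlay is just the dimmed input; on a textured attack image the bright overlay pixels mark the noise patterns discussed below.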

Regarding the chosen filters to visualize: the final layer before classification consists of 1280 filters, and when an input image is of size 224×224 the feature maps resulting from these final 1280 filters have size 7×7. From these 1280 filters we selected two filters for each biometric: one that had the highest average activation response to real images, which we call realF, and another filter that was highly activated by attack images, called attackF. We then visualize the 7×7 activation response of these filters for some bona-fide and attack samples belonging to the same subject.
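One way to pick such a pair of filters is sketched below; the selection criterion (largest mean-activation gap between the two classes) is our assumption, since the exact rule is not spelled out:

```python
import numpy as np

def pick_filters(real_maps, attack_maps):
    """real_maps / attack_maps: (N, 7, 7, 1280) final-block
    activations for bona-fide and attack images. Return indices
    of the filter responding most strongly to real images (realF)
    and to attack images (attackF), by mean-activation gap."""
    real_mean = real_maps.mean(axis=(0, 1, 2))      # (1280,)
    attack_mean = attack_maps.mean(axis=(0, 1, 2))  # (1280,)
    real_f = int(np.argmax(real_mean - attack_mean))
    attack_f = int(np.argmax(attack_mean - real_mean))
    return real_f, attack_f
```

The returned indices select the two 7×7 maps whose responses are visualized in Figures 7 and 8.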

Figure 7 shows samples from the iris Warsaw dataset with their high frequency regions highlighted, in addition to the activation responses of the realF and attackF features. The visualization clearly shows that for real iris samples the high frequency areas are concentrated at the bright regions around the inner eye corner and the lash line; for attack presentations, however, the high frequency regions mostly fall on noise patterns scattered over the attack medium (paper). The trained network could successfully recognize the real features, highly represented at the eye corner, through realF, giving a high response at those regions in the real image while failing to detect them in attack samples. AttackF also manages to detect the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images of the Replay-Attack dataset is shown in Figure 8. The attack presentation in these samples is performed using a handheld mobile device showing a photo of the subject. We can notice the absence of high frequency response around the face edges, mouth and eyes, which are present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

Visualizing the high frequency regions in the images showed that high frequency variations in real images concentrate on certain edges and depth changes of the actual 3D biometric presented, while in attack presentations the high frequency components represent noise: either noise spread over the paper in case of a paper attack medium, or noise patterns resulting from the display of a mobile attack device. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps: Another approach to analyze what the network learned is to use gradient-weighted class activation maps (Grad-CAM) [10], which highlight the areas that contributed most to the network's classification decision for an input image.

We chose the MobileNetV2 network trained on the Replay-Attack face dataset and the MobbioFake gray iris dataset to analyze the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given some real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same but for presentation attack images.

The visualizations in these figures clearly show that the network focuses on features at the center of the face or iris to reach its decision for a bona-fide presentation, and that when these center areas have lower activations the network decides the image is an attack. Although the network was trained on two different biometrics, it could correctly focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the different face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights pretrained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% state-of-the-art intra-dataset error on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, in addition to initializing training with non-random weights already pretrained on ImageNet. This was done to avoid overfitting the small training datasets; for future work, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.
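The light augmentation used to enlarge the small PAD training sets can be sketched as follows. This is an illustrative numpy version (the function name and shift range are ours); slight rotation, not shown, would normally come from an image-library routine:

```python
import numpy as np

def augment(img, rng, max_shift=10):
    """Randomly flip and translate one image of shape (H, W, C).

    rng: a numpy Generator, e.g. np.random.default_rng(0).
    max_shift: maximum translation in pixels along each axis.
    """
    if rng.random() < 0.5:           # random horizontal flip
        img = img[:, ::-1, :]
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Small translation; np.roll wraps around, which keeps the sketch
    # simple (a real pipeline would pad or crop instead).
    return np.roll(img, (dy, dx), axis=(0, 1))
```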

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-state-of-the-art results were reported for the face datasets. To the best of our knowledge, this is the first paper to report cross-dataset Equal-Error-Rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.
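For reference, the Equal Error Rate used in these evaluations can be computed from classifier scores with a simple threshold sweep; this numpy sketch assumes higher scores indicate bona-fide presentations:

```python
import numpy as np

def eer(genuine, attack):
    """Equal Error Rate: the operating point where the false-rejection
    rate (bona-fide rejected) equals the false-acceptance rate (attacks
    accepted). genuine/attack: 1-D arrays of scores, higher = more real.
    """
    thresholds = np.sort(np.concatenate([genuine, attack]))
    far = np.array([np.mean(attack >= t) for t in thresholds])   # attacks accepted
    frr = np.array([np.mean(genuine < t) for t in thresholds])   # real rejected
    i = np.argmin(np.abs(far - frr))    # threshold where the two rates cross
    return (far[i] + frr[i]) / 2.0
```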

Beyond cross-dataset evaluation, we also trained a single network on multiple biometric data, face and iris images together. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we analyzed the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important region driving a bona-fide classification decision. For future work, we would like to apply a patch-based CNN and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach through a region-based CNN method.

13 © This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

8 References

1 Jourabloo, A., Liu, Y., Liu, X.: 'Face de-spoofing: anti-spoofing via noise modeling', CoRR, 2018

2 Nagpal, C., Dubey, S.R.: 'A performance evaluation of convolutional neural networks for face anti spoofing', CoRR, 2018

3 Nguyen, D., Baek, N.R., Pham, T., Park, K.R.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, 1315

4 Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)

5 Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In: 'Deep learning in biometrics' (CRC Press, 2018), p. 49

6 Czajka, A., Bowyer, K.W.: 'Presentation attack detection for iris recognition: an assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35

7 Ramachandra, R., Busch, C.: 'Presentation attack detection methods for face recognition systems: a comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37

8 Abdul Ghaffar, I., Norzali, M., Haji Mohd, M.N.: 'Presentation attack detection for face recognition on smartphones: a comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9

9 Li, L., Correia, P.L., Hadid, A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, pp. 3–14

10 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: 'Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391

11 Daugman, J.: 'Iris recognition and anti-spoofing countermeasures'. In: 7th International Biometrics Conference, 2004

12 Galbally, J., Ortiz-Lopez, J., Fierrez, J., Ortega-Garcia, J.: 'Iris liveness detection based on quality related features'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276

13 Pacut, A., Czajka, A.: 'Aliveness detection for iris biometrics'. In: 40th Annual IEEE International Carnahan Conferences on Security Technology, 2006, pp. 122–129

14 Czajka, A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735

15 Czajka, A.: 'Pupil dynamics for presentation attack detection in iris recognition'. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3

16 Bodade, R., Talbar, S.: 'Dynamic iris localisation: a novel approach suitable for fake iris detection'. In: 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5

17 Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R.: 'An experimental study of pupil constriction for liveness detection'. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258

18 Kanematsu, M., Takano, H., Nakamura, K.: 'Highly reliable liveness detection method for iris recognition'. In: SICE Annual Conference 2007, 2007, pp. 361–364

19 Lee, E.C., Park, K.R.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166

20 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'Internal state measurement from facial stereo thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371

21 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science, 2015, pp. 684–697

22 Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225

23 Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745

24 Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6

25 Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11

26 Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690

27 Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110

28 Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8

29 Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You, Z., Zhou, J., Wang, Y., Sun, Z., Shan, S., Zheng, W., et al. (Eds.): 'Biometric Recognition' (Cham, Springer International Publishing, 2016), pp. 611–619

30 Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3D projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78

31 Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In: Delac, K., Grgic, M., Bartlett, M.S. (Eds.): 'Recent Advances in Face Recognition' (Rijeka, IntechOpen, 2008)

32 Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558

33 Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

34 Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725

35 Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection'. In: IEEE International Joint Conference on Biometrics, 2014, pp. 1–8

36 Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326

37 Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14

38 Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Daniilidis, K., Maragos, P., Paragios, N. (Eds.): 'Computer Vision – ECCV 2010' (Berlin, Heidelberg, Springer, 2010), pp. 504–517

39 Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244

40 Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. In: Image Analysis and Signal Processing, 2009, pp. 233–236

41 Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation'. In: 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040

42 da Silva Pinto, A., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228

43 Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8

44 Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing'. In: ICB, 2013, pp. 1–6

45 Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3D structure recovered from a single camera'. In: ICB, 2013, pp. 1–6

46 Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In: 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6

47 Erdogmus, N., Marcel, S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6

48 Erdogmus, N., Marcel, S.: 'Spoofing 2D face recognition systems with 3D masks'. In: 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8

49 Li, J., Wang, Y., Tan, T., Jain, A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Jain, A.K., Ratha, N.K. (Eds.): Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303

50 Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: 'A face antispoofing database with diverse attacks'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31

51 Lee, T., Ju, G., Liu, H., Wu, Y.: 'Liveness detection using frequency entropy of image sequences'. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370

52 Czajka, A.: 'Database of iris printouts and its application: development of liveness detection method for iris recognition'. In: 2013 18th International Conference on Methods, Models in Automation, Robotics (MMAR), 2013, pp. 28–33

53 Wei, Z., Qiu, X., Sun, Z., Tan, T.: 'Counterfeit iris detection based on texture analysis'. In: ICPR 2008, 19th International Conference on Pattern Recognition (ICPR), 2009, pp. 1–4

54 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in mobile applications'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33

55 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. In: Proceedings of International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008

56 Unnikrishnan, S., Eshack, A.: 'Face spoof detection using image distortion analysis and image quality assessment'. In: 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5

57 Wen, D., Han, H., Jain, A.K.: 'Face spoof detection with image distortion analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761

58 Galbally, J., Marcel, S.: 'Face anti-spoofing based on general image quality assessment'. In: 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178

59 Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing', IET Biometrics, 2018, 7, (4), pp. 314–324

60 Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment for biometric presentation attack detection'. In: 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6

61 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face anti-spoofing based on color texture analysis'. In: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830

63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145

64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA, ACM, 2016), pp. 9–15

66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9

67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863

70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park, J.I., Kim, J. (Eds.): 'Computer Vision - ACCV 2012 Workshops' (Berlin, Heidelberg, Springer, 2013), pp. 121–132

71 de Freitas Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG - Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75 Maatta, J., Hadid, A., Pietikainen, M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10

76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and Fisher score'. In: 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5

77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283

78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715

82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683

83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'MoBio_LivDet: mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors', Sensors, 2018

88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision', CoRR, 2018

90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018

91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014

92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34

94 Czajka, A., Bowyer, K.W., Krumdick, M., Vidal-Mata, R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196

95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018

97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879

99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using Haralick features'. In: 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724

101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786

102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli, M., Nixon, M.S. (Eds.): 'Advances in Biometrics' (Berlin, Heidelberg, Springer, 2009), pp. 1080–1090

103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The REPLAY-MOBILE face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87

107 Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: face and iris recognition for mobile engagement', Image Vision Comput., 2014, 32, pp. 1161–1172

108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'ImageNet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252

109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face, Gesture Recognition (FG 2017), 2017, pp. 612–618

110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet Iris 2017 - iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB 2017), Denver, CO, USA, October 1-4, 2017, pp. 733–741

111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR 2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116 Yambay, D., Doyle, J.S., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-Iris 2013 - iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics (IJCB 2014), Clearwater, FL, USA, September 29 - October 2, 2014, pp. 1–8

117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-Iris 2015 - iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22-24, 2017, pp. 1–6

118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 - mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (Eds.): 'Biometrics and Identity Management' (Berlin, Heidelberg, Springer, 2008), pp. 181–190

120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision', CoRR, 2015, abs/1512.00567

121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385

122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381

123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556

124 Kingma, D.P., Ba, J.: 'Adam: a method for stochastic optimization', CoRR, 2014, abs/1412.6980

125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909


Page 13: Deep convolutional neural networks for face and iris ...posed in literature for both face and iris recognition applications, along with the competitions and benchmark datasets. Then

on the input image for better understanding of where high frequencyis concentrated in bona-fide images vs attack images

The steps for obtaining the regions with high frequency in imageare depicted in Figure 6 Fourier transform was first applied to getfrequency components in the image High-pass-filter is then usedto remove most of the low frequencies After that inverse Fourier isused to obtain an image with only the high frequency regions visibleand finally these regions are overlayed on original image to highlightthose regions on the input image

Regarding the chosen filters to visualize the final layer beforeclassification is consisted of 1280 filter when an input image is ofsize 224times 224 the feature maps resulting from these final 1280 fil-ters have size 7times 7 From these 1280 filters we selected two filtersfor each biometric One that had highest average activation responseby real images we call it realF and the other filter that was highlyactivated by attack images called attackF We then visualize the7times 7 activation response of these filters for some bona-fide andattack samples belonging to the same subject

Figure 7 shows samples from the iris Warsaw dataset with theirhigh frequency regions highlighted in addition to activation responseof realF and attackF features The visualization clearly shows thatfor real iris samples high frequency areas are concentrated at thebright regions around the inner eye corner and the lash-line how-ever for attack presentations high frequency regions mostly focuson noise patterns scattered on the attack medium (paper) The trainednetwork could successfully recognize the real features highly rep-resented at the eye corner by realF giving high response at thoseregions in real image while failing to detect them in attack sam-ples AttackF also manages to detect paper noise patterns in attacksamples which are missing in the real images

Similar visualization for face images of Replay-Attack datasetis shown in Figure 8 The attack presentation in these samples isachieved using handheld mobile device showing a photo of the sub-ject We can notice the absence of high frequency response aroundthe face edges mouth and eyes which are present in real samplesand detected by the network through realF feature On the otherhand these attack images contain certain noise patterns on the clearbackground area which are detected successfully by the attackF ofthe network

By visualizing the high frequency regions in the images it wasproven that high frequency variations in case of real images focuson certain edges and changes in the 3d real biometric presentedWhile in case of attack presentations high frequency componentsrepresented noise either spread on paper in case of paper attackmedium or noise patterns resulting from the display monitor ofa mobile attack device Our trained CNN network managed tocorrectly localize such features and lead to successful classification

682 Gradient-weighted class activation maps Anotherapproach to analyze what the network learned is by using gradient-weighted class activation maps [10] grad-cam which highlights theareas that contributed most to the networkrsquos classification decisionfor an input image

We chose the trained MobilenetV2 network on Replay-Attackface dataset and MobbioFake gray iris dataset to analyze the acti-vation maps for some bona-fide and attack test images from the 6datasets Figure 9 shows the grad-cam for the bona-fide class givensome real test samples that were either correctly of incorrectly clas-sified and Figure 10 shows the same but for presentation attackimages

The visualizations in these figures clearly show that the network focuses on features of the face and iris centers to make its decision for a bona-fide presentation, and in cases where these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it could still focus on the important center parts of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the different face and iris presentation attack detection methods proposed in the literature, with focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches, starting from network weights pretrained on the ImageNet dataset. Experimental results show that InceptionV3 achieved state-of-the-art 0% intra-dataset error on six publicly available benchmark datasets when fine-tuning the whole network's weights. The problem of small dataset sizes with very deep networks was overcome by augmenting the original data with random flipping, slight rotation and translation, and by starting the network training from non-random weights already pretrained on ImageNet; both measures help prevent the weights from overfitting the small training datasets. For future work, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.
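The augmentation step above can be sketched in a few lines of numpy. This is our illustration, not the authors' exact pipeline; the flip probability, rotation range and shift range are assumptions:

```python
# Illustrative augmentation: random horizontal flip, slight rotation,
# and a small translation, on a single-channel image array.
import numpy as np

def rotate_slightly(img, degrees):
    """Nearest-neighbour rotation about the image centre."""
    h, w = img.shape
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Inverse mapping: sample each output pixel from the rotated source.
    ys = cy + (yy - cy) * np.cos(theta) + (xx - cx) * np.sin(theta)
    xs = cx - (yy - cy) * np.sin(theta) + (xx - cx) * np.cos(theta)
    ys = np.clip(np.rint(ys), 0, h - 1).astype(int)
    xs = np.clip(np.rint(xs), 0, w - 1).astype(int)
    return img[ys, xs]

def augment(img, rng):
    """One random augmented copy: flip, +/-10 deg rotation, +/-4 px shift."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                       # random horizontal flip
    img = rotate_slightly(img, rng.uniform(-10, 10))
    dy, dx = rng.integers(-4, 5, size=2)         # small translation
    return np.roll(img, (int(dy), int(dx)), axis=(0, 1))

rng = np.random.default_rng(42)
face = rng.random((224, 224))                    # stand-in for a face/iris crop
batch = [augment(face, rng) for _ in range(4)]
print(all(b.shape == face.shape for b in batch))  # True
```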

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-state-of-the-art results were reported for the face datasets. As far as we know, this is the first paper to include cross-dataset Equal-Error-Rate evaluation for the three iris PAD datasets used, for which we achieved close to 0% test EER.
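For reference, the Equal Error Rate used in this evaluation is the error at the score threshold where the false-acceptance and false-rejection rates coincide; a minimal sketch (our illustration, with toy scores) is:

```python
# Illustrative EER computation: scan candidate thresholds and return the
# error rate where FAR (attacks accepted) equals FRR (bona-fide rejected).
import numpy as np

def equal_error_rate(genuine_scores, attack_scores):
    """Scores are 'bona-fide likelihood'; higher means more likely real."""
    thresholds = np.sort(np.concatenate([genuine_scores, attack_scores]))
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    far = np.array([(attack_scores >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))  # threshold where the rates cross
    return (far[i] + frr[i]) / 2

# Perfectly separated scores give an EER of 0.
real = np.array([0.9, 0.8, 0.95])
fake = np.array([0.1, 0.2, 0.05])
print(equal_error_rate(real, fake))  # 0.0
```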

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training the PAD algorithm.

Finally, we provided an analysis of the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the center of the face or iris is the most important region driving the network towards a bona-fide classification decision. For future work, we would like to apply a patch-based CNN and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach through a region-based CNN method.


8 References

1 Jourabloo, A., Liu, Y., Liu, X.: 'Face de-spoofing: anti-spoofing via noise modeling', CoRR, 2018
2 Nagpal, C., Dubey, S.R.: 'A performance evaluation of convolutional neural networks for face anti-spoofing', CoRR, 2018
3 Nguyen, D., Rae Baek, N., Pham, T., Ryoung Park, K.: 'Presentation attack detection for iris recognition system using NIR camera sensor', Sensors, 2018, 18, pp. 1315
4 Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: 'Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on NIR camera sensor', Sensors, 2018, 18, (8)
5 Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al.: 'Counteracting presentation attacks in face, fingerprint, and iris recognition'. In: 'Deep Learning in Biometrics' (CRC Press, 2018), p. 49
6 Czajka, A., Bowyer, K.W.: 'Presentation attack detection for iris recognition: an assessment of the state-of-the-art', ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35
7 Ramachandra, R., Busch, C.: 'Presentation attack detection methods for face recognition systems: a comprehensive survey', ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37
8 Abdul Ghaffar, I., Norzali, M., Haji Mohd, M.N.: 'Presentation attack detection for face recognition on smartphones: a comprehensive review', Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9
9 Li, L., Correia, P.L., Hadid, A.: 'Face recognition under spoofing attacks: countermeasures and research directions', IET Biometrics, 2018, 7, pp. 3–14
10 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: 'Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization', CoRR, 2016. Available from: http://arxiv.org/abs/1610.02391
11 Daugman, J.: 'Iris recognition and anti-spoofing countermeasures'. 7th International Biometrics Conference, 2004

12 Galbally, J., Ortiz-Lopez, J., Fierrez, J., Ortega-Garcia, J.: 'Iris liveness detection based on quality related features'. 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 271–276
13 Pacut, A., Czajka, A.: 'Aliveness detection for iris biometrics'. 40th Annual IEEE International Carnahan Conference on Security Technology, 2006, pp. 122–129
14 Czajka, A.: 'Pupil dynamics for iris liveness detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735
15 Czajka, A.: 'Pupil dynamics for presentation attack detection in iris recognition'. International Biometric Performance Conference (IBPC), NIST, Gaithersburg, 2014, pp. 1–3
16 Bodade, R., Talbar, S.: 'Dynamic iris localisation: a novel approach suitable for fake iris detection'. 2009 International Conference on Ultra Modern Telecommunications Workshops, 2009, pp. 1–5
17 Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R.: 'An experimental study of pupil constriction for liveness detection'. 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 252–258
18 Kanematsu, M., Takano, H., Nakamura, K.: 'Highly reliable liveness detection method for iris recognition'. SICE Annual Conference 2007, 2007, pp. 361–364
19 Lee, E.C., Ryoung Park, K.: 'Fake iris detection based on 3D structure of iris pattern', International Journal of Imaging Systems and Technology, 2010, 20, pp. 162–166
20 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'Internal state measurement from facial stereo, thermal and visible sensors through SVM classification', ARPN J. Eng. Appl. Sci., 2015, 10, (18), pp. 8363–8371
21 Mohd, M.N.H., Kashima, M., Sato, K., Watanabe, M.: 'A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement', Lecture Notes in Computer Science, 2015, pp. 684–697
22 Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context', Telecommunication Systems, 2011, 47, (3), pp. 215–225

23 Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: reflections on replay attacks', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745
24 Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6
25 Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11
26 Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis', Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690
27 Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110
28 Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8
29 Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You, Z., Zhou, J., Wang, Y., et al. (Eds.): 'Biometric Recognition' (Cham: Springer International Publishing, 2016), pp. 611–619
30 Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3D projective invariants'. 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78
31 Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In: Delac, K., Grgic, M., Bartlett, M.S. (Eds.): 'Recent Advances in Face Recognition' (Rijeka: IntechOpen, 2008)
32 Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment', IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558
33 Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: attack of mechanical replicas'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

34 Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: liveness detection with eye movements', IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725
35 Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection'. IEEE International Joint Conference on Biometrics, 2014, pp. 1–8
36 Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks', Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326
37 Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches', Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14
38 Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Daniilidis, K., Maragos, P., Paragios, N. (Eds.): 'Computer Vision – ECCV 2010' (Berlin, Heidelberg: Springer, 2010), pp. 504–517
39 Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images', Image Vision Comput., 2009, 27, (3), pp. 233–244
40 Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. Image Analysis and Signal Processing, 2009, pp. 233–236
41 Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation'. 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040
42 da Silva Pinto, A., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228
43 Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing'. 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8
44 Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing'. ICB, 2013, pp. 1–6

45 Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3D structure recovered from a single camera'. ICB, 2013, pp. 1–6
46 Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6
47 Erdogmus, N., Marcel, S.: 'Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect'. 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6
48 Erdogmus, N., Marcel, S.: 'Spoofing 2D face recognition systems with 3D masks'. 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8
49 Li, J., Wang, Y., Tan, T., Jain, A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Jain, A.K., Ratha, N.K. (Eds.): 'Biometric Technology for Human Identification', vol. 5404, 2004, pp. 296–303
50 Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: 'A face antispoofing database with diverse attacks'. 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31
51 Lee, T., Ju, G., Liu, H., Wu, Y.: 'Liveness detection using frequency entropy of image sequences'. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370
52 Czajka, A.: 'Database of iris printouts and its application: development of liveness detection method for iris recognition'. 2013 18th International Conference on Methods and Models in Automation and Robotics (MMAR), 2013, pp. 28–33
53 Wei, Z., Qiu, X., Sun, Z., Tan, T.: 'Counterfeit iris detection based on texture analysis'. 2008 19th International Conference on Pattern Recognition (ICPR), 2009, pp. 1–4
54 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in mobile applications'. 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33
55 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008

56 Unnikrishnan, S., Eshack, A.: 'Face spoof detection using image distortion analysis and image quality assessment'. 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5
57 Wen, D., Han, H., Jain, A.K.: 'Face spoof detection with image distortion analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761
58 Galbally, J., Marcel, S.: 'Face anti-spoofing based on general image quality assessment'. 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178
59 Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing', IET Biometrics, 2018, 7, (4), pp. 314–324
60 Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment for biometric presentation attack detection'. 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6
61 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face anti-spoofing based on color texture analysis'. 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis', IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830
63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and Fisher vector encoding', IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145
64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696
65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16 (New York, NY, USA: ACM, 2016), pp. 9–15
66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing', Image Vision Comput., 2018, 77, pp. 1–9
67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: a public database and a baseline'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7
68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7
69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863
70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park, J.I., Kim, J. (Eds.): 'Computer Vision – ACCV 2012 Workshops' (Berlin, Heidelberg: Springer, 2013), pp. 121–132
71 de Freitas Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8
72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. 2012 BIOSIG – Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7
74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
75 Maatta, J., Hadid, A., Pietikainen, M.: 'Face spoofing detection from single images using texture and local shape analysis', IET Biometrics, 2012, 1, (1), pp. 3–10
76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and Fisher score'. 2015 3rd International Conference on Control, Engineering, Information Technology (CEIT), 2015, pp. 1–5
77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: spoof detection on smartphones', IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283
78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8
79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using LSTM-CNN architecture for face anti-spoofing'. 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145
80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560
81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715
82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using BSIF', IEEE Access, 2015, 3, pp. 1672–1683
83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'MoBio_LivDet: mobile biometric liveness detection'. 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: a neural network approach', Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460
86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6
87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors', Sensors, 2018
88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based CNNs'. 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328
89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: binary or auxiliary supervision', CoRR, 2018
90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing', CoRR, 2018
91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing', CoRR, 2014
92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3D convolutional neural network based on face anti-spoofing'. 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5
93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing', 2017, pp. 27–34
94 Czajka, A., Bowyer, K.W., Krumdick, M., Vidal Mata, R.G.: 'Recognition of image-orientation-based iris spoofing', IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196

95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164
96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection', CoRR, 2018
97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: toward cross-dataset and cross-sensor generalization'. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018
98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face, and fingerprint spoofing detection', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879
99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using Haralick features'. 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6
100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition', IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724
101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2D-detection based on Moiré-pattern analysis', IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786
102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli, M., Nixon, M.S. (Eds.): 'Advances in Biometrics' (Berlin, Heidelberg: Springer, 2009), pp. 1080–1090
103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted LBP'. 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282
104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The Replay-Mobile face presentation-attack database'. 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7
105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'MobBIO: a multimodal database captured with a portable handheld device'. 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors', Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87
107 Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'FIRME: face and iris recognition for mobile engagement', Image Vision Comput., 2014, 32, pp. 1161–1172
108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'ImageNet large scale visual recognition challenge', International Journal of Computer Vision, 2015, 115, (3), pp. 211–252
109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'OULU-NPU: a mobile face presentation attack database with real-world variations'. 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 612–618
110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'LivDet Iris 2017 – iris liveness detection competition 2017'. 2017 IEEE International Joint Conference on Biometrics, IJCB 2017, Denver, CO, USA, October 1–4, 2017, pp. 733–741
111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-D facial spoofing attacks'. 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6
112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2D face spoofing attacks'. 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6
113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928
114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR 2019'. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019
115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019
116 Yambay, D., Doyle, J.S., Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-Iris 2013 – iris liveness detection competition 2013'. IEEE International Joint Conference on Biometrics, IJCB 2014, Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8
117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-Iris 2015 – iris liveness detection competition 2015'. IEEE International Conference on Identity, Security and Behavior Analysis, ISBA 2017, New Delhi, India, February 22–24, 2017, pp. 1–6
118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 – mobile iris liveness detection competition'. Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (Eds.): 'Biometrics and Identity Management' (Berlin, Heidelberg: Springer, 2008), pp. 181–190
120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision', CoRR, 2015, abs/1512.00567
121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition', CoRR, 2015, abs/1512.03385
122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation', CoRR, 2018, abs/1801.04381
123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition', CoRR, 2014, abs/1409.1556
124 Kingma, D.P., Ba, J.: 'Adam: a method for stochastic optimization', CoRR, 2014, abs/1412.6980
125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012
126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. International Conference on Machine Learning (ICML), 2017
127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture', Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909

16 ccopy This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

Page 14: Deep convolutional neural networks for face and iris ...posed in literature for both face and iris recognition applications, along with the competitions and benchmark datasets. Then

8 References1 Jourabloo A Liu Y Liu X lsquoFace de-spoofing Anti-spoofing via noise

modelingrsquo CoRR 20182 Nagpal C Dubey SR lsquoA performance evaluation of convolutional neural

networks for face anti spoofingrsquo CoRR 20183 Nguyen D RaeBaek N Pham T RyoungPark K lsquoPresentation attack detec-

tion for iris recognition system using nir camera sensorrsquo Sensors 2018 18pp 1315

4 Nguyen DT Pham TD Lee YW Park KR lsquoDeep learning-based enhancedpresentation attack detection for iris recognition by combining features from localand global regions based on nir camera sensorrsquo Sensors 2018 18 (8)

5 Pinto A Pedrini H Krumdick M Becker B Czajka A Bowyer KW et alCounteracting Presentation Attacks in Face Fingerprint and Iris Recognition InlsquoDeep learning in biometricsrsquo (CRC Press 2018 p 49

6 Czajka A Bowyer KW lsquoPresentation attack detection for iris recognitionAn assessment of the state-of-the-artrsquo ACM Computing Surveys 2018 51 (4)pp 861ndash8635

7 Ramachandra R Busch C lsquoPresentation attack detection methods for facerecognition systems A comprehensive surveyrsquo ACM Computing Surveys 201750 (1) pp 81ndash837

8 AbdulGhaffar I Norzali M HajiMohd MN lsquoPresentation attack detec-tion for face recognition on smartphones A comprehensive reviewrsquo Journal ofTelecommunication Electronic and Computer Engineering 2017 9

9 Li L Correia PL Hadid A lsquoFace recognition under spoofing attackscountermeasures and research directionsrsquo IET Biometrics 2018 7 pp 3ndash14(11)

10 Selvaraju RR Das A Vedantam R Cogswell M Parikh D Batra DlsquoGrad-cam Why did you say that visual explanations from deep networks viagradient-based localizationrsquo CoRR 2016 Available from httparxivorgabs161002391

11 Daugman J lsquoIris recognition and anti-spoofing countermeasuresrsquo In 7thInternational Biometrics Conference ( 2004

12 Galbally J OrtizLopez J Fierrez J OrtegaGarcia J lsquoIris liveness detectionbased on quality related featuresrsquo In 2012 5th IAPR International Conferenceon Biometrics (ICB) ( 2012 pp 271ndash276

13 Pacut A Czajka A lsquoAliveness detection for iris biometricsrsquo 40th Annual IEEEInternational Carnahan Conferences Security Technology 2006 pp 122 ndash 129

14 Czajka A lsquoPupil dynamics for iris liveness detectionrsquo IEEE Transactions onInformation Forensics and Security 2015 10 (4) pp 726ndash735

15 Czajka A lsquoPupil dynamics for presentation attack detection in iris recognitionrsquoIn International Biometric Performance Conference (IBPC) NIST Gaithersburg( 2014 pp 1ndash3

16 Bodade R Talbar S lsquoDynamic iris localisation A novel approach suitablefor fake iris detectionrsquo In 2009 International Conference on Ultra ModernTelecommunications Workshops ( 2009 pp 1ndash5

17 Huang X Ti C Hou Q Tokuta A Yang R lsquoAn experimental study of pupilconstriction for liveness detectionrsquo In 2013 IEEE Workshop on Applications ofComputer Vision (WACV) ( 2013 pp 252ndash258

18 Kanematsu M Takano H Nakamura K lsquoHighly reliable liveness detectionmethod for iris recognitionrsquo In SICE Annual Conference 2007 ( 2007 pp 361ndash364

19 Lee EC RyoungPark K lsquoFake iris detection based on 3d structure of irispatternrsquo International Journal of Imaging Systems and Technology 2010 20pp 162 ndash 166

20 Mohd MNH Kashima M Sato K Watanabe M lsquoInternal state measure-ment from facial stereo thermal and visible sensors through svm classificationrsquoARPN J Eng Appl Sci 2015 10 (18) pp 8363ndash8371

21 Mohd M MNH K M S K W lsquoA non-invasive facial visual-infraredstereo vision based measurement as an alternative for physiological measure-mentrsquo Lecture Notes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics) 2015 10 (18)pp 684ndash697

22 Pan, G., Sun, L., Wu, Z., Wang, Y.: 'Monocular camera-based face liveness detection by combining eyeblink and scene context'. Telecommunication Systems, 2011, 47, (3), pp. 215–225

23 Smith, D.F., Wiliem, A., Lovell, B.C.: 'Face recognition on consumer devices: Reflections on replay attacks'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745

24 Kollreider, K., Fronthaler, H., Bigun, J.: 'Verifying liveness by multiple experts in face biometrics'. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008, pp. 1–6

25 Ali, A., Deravi, F., Hoque, S.: 'Directional sensitivity of gaze-collinearity features in liveness detection'. In: 2013 Fourth International Conference on Emerging Security Technologies, 2013, pp. 8–11

26 Wang, L., Ding, X., Fang, C.: 'Face live detection method based on physiological motion analysis'. Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690

27 Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R.: 'Computationally efficient face spoofing detection with motion magnification'. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 105–110

28 Sun, L., Wu, Z., Lao, S., Pan, G.: 'Eyeblink-based anti-spoofing in face recognition from a generic webcamera'. In: 2007 11th IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8

29 Patel, K., Han, H., Jain, A.K.: 'Cross-database face antispoofing with robust feature representation'. In: You, Z., Zhou, J., Wang, Y., Sun, Z., Shan, S., Zheng, W., et al. (Eds.): Biometric Recognition. (Cham: Springer International Publishing, 2016), pp. 611–619

30 Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J.: 'Moving face spoofing detection via 3d projective invariants'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 73–78

31 Pan, G., Wu, Z., Sun, L.: 'Liveness detection for face recognition'. In: Delac, K., Grgic, M., Bartlett, M.S. (Eds.): Recent Advances in Face Recognition. (Rijeka: IntechOpen, 2008)

32 Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: 'Real-time face detection and motion analysis with application in "liveness" assessment'. IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558

33 Komogortsev, O.V., Karpov, A.: 'Liveness detection via oculomotor plant characteristics: Attack of mechanical replicas'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

34 Komogortsev, O.V., Karpov, A., Holland, C.D.: 'Attack of mechanical replicas: Liveness detection with eye movements'. IEEE Trans. Inf. Forens. Security, 2015, 10, (4), pp. 716–725

35 Rigas, I., Komogortsev, O.V.: 'Gaze estimation as a framework for iris liveness detection'. IEEE International Joint Conference on Biometrics, 2014, pp. 1–8

36 Rigas, I., Komogortsev, O.V.: 'Eye movement-driven defense against iris print-attacks'. Pattern Recogn. Lett., 2015, 68, (P2), pp. 316–326

37 Akhtar, Z., Foresti, G.L.: 'Face spoof attack recognition using discriminative image patches'. Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14

38 Tan, X., Li, Y., Liu, J., Jiang, L.: 'Face liveness detection from a single image with sparse low rank bilinear discriminative model'. In: Daniilidis, K., Maragos, P., Paragios, N. (Eds.): Computer Vision – ECCV 2010. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2010), pp. 504–517

39 Kollreider, K., Fronthaler, H., Bigun, J.: 'Non-intrusive liveness detection by face images'. Image Vision Comput., 2009, 27, (3), pp. 233–244

40 Bao, W., Li, H., Li, N., Jiang, W.: 'A liveness detection method for face recognition based on optical flow field'. In: Image Analysis and Signal Processing, 2009, pp. 233–236

41 Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: 'Face anti-spoofing with multifeature videolet aggregation'. 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1035–1040

42 da Silva Pinto, A., Pedrini, H., Schwartz, W., Rocha, A.: 'Video-based face spoofing detection through visual rhythm analysis'. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 221–228

43 Komulainen, J., Hadid, A., Pietikäinen, M.: 'Context based face anti-spoofing'. 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–8

44 Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: 'Face liveness detection using variable focusing'. ICB, 2013, pp. 1–6

45 Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection using 3d structure recovered from a single camera'. ICB, 2013, pp. 1–6

46 Kose, N., Dugelay, J.: 'Reflectance analysis based countermeasure technique to detect face mask attacks'. In: 2013 18th International Conference on Digital Signal Processing (DSP), 2013, pp. 1–6

47 Erdogmus, N., Marcel, S.: 'Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect'. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp. 1–6

48 Erdogmus, N., Marcel, S.: 'Spoofing 2d face recognition systems with 3d masks'. In: 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG), 2013, pp. 1–8

49 Li, J., Wang, Y., Tan, T., Jain, A.K.: 'Live face detection based on the analysis of Fourier spectra'. In: Jain, A.K., Ratha, N.K. (Eds.): Biometric Technology for Human Identification, vol. 5404, 2004, pp. 296–303

50 Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: 'A face antispoofing database with diverse attacks'. In: 2012 5th IAPR International Conference on Biometrics (ICB), 2012, pp. 26–31

51 Lee, T., Ju, G., Liu, H., Wu, Y.: 'Liveness detection using frequency entropy of image sequences'. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 2367–2370

52 Czajka, A.: 'Database of iris printouts and its application: Development of liveness detection method for iris recognition'. In: 2013 18th International Conference on Methods Models in Automation Robotics (MMAR), 2013, pp. 28–33

53 Wei, Z., Qiu, X., Sun, Z., Tan, T.: 'Counterfeit iris detection based on texture analysis'. In: 2008 19th International Conference on Pattern Recognition (ICPR), 2009, pp. 1–4

54 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in mobile applications'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 22–33

55 Sequeira, A.F., Murari, J., Cardoso, J.S.: 'Iris liveness detection methods in the mobile biometrics scenario'. In: Proceedings of International Joint Conference on Neural Networks (IJCNN), 2014, pp. 3002–3008

56 Unnikrishnan, S., Eshack, A.: 'Face spoof detection using image distortion analysis and image quality assessment'. In: 2016 International Conference on Emerging Technological Trends (ICETT), 2016, pp. 1–5

57 Wen, D., Han, H., Jain, A.K.: 'Face spoof detection with image distortion analysis'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761

58 Galbally, J., Marcel, S.: 'Face anti-spoofing based on general image quality assessment'. In: 2014 22nd International Conference on Pattern Recognition, 2014, pp. 1173–1178

59 Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing'. IET Biometrics, 2018, 7, (4), pp. 314–324

60 Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: 'Non-reference image quality assessment for biometric presentation attack detection'. 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp. 1–6

61 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face anti-spoofing based on color texture analysis'. In: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 2636–2640


62 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face spoofing detection using colour texture analysis'. IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830

63 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'Face antispoofing using speeded-up robust features and fisher vector encoding'. IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145

64 Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al.: 'A competition on generalized software-based face presentation attack detection in mobile scenarios'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 688–696

65 Raja, K.B., Raghavendra, R., Busch, C.: 'Color adaptive quantized patterns for presentation attack detection in ocular biometric systems'. In: Proceedings of the 9th International Conference on Security of Information and Networks, SIN '16. (New York, NY, USA: ACM, 2016), pp. 9–15

66 Boulkenafet, Z., Komulainen, J., Hadid, A.: 'On the generalization of color texture-based face anti-spoofing'. Image Vision Comput., 2018, 77, pp. 1–9

67 Anjos, A., Marcel, S.: 'Counter-measures to photo attacks in face recognition: A public database and a baseline'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

68 Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S.: 'Complementary countermeasures for detecting scenic face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–7

69 Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: 'An investigation of local descriptors for biometric spoofing detection'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863

70 de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: 'LBP-TOP based countermeasure against face spoofing attacks'. In: Park, J.I., Kim, J. (Eds.): Computer Vision – ACCV 2012 Workshops. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2013), pp. 121–132

71 de Freitas Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S.: 'Can face anti-spoofing countermeasures work in a real world scenario?'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–8

72 Määttä, J., Hadid, A., Pietikäinen, M.: 'Face spoofing detection from single images using micro-texture analysis'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–7

73 Chingovska, I., Anjos, A., Marcel, S.: 'On the effectiveness of local binary patterns in face anti-spoofing'. In: 2012 BIOSIG - Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), 2012, pp. 1–7

74 Yang, J., Lei, Z., Liao, S., Li, S.Z.: 'Face liveness detection with component dependent descriptor'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

75 Maatta, J., Hadid, A., Pietikainen, M.: 'Face spoofing detection from single images using texture and local shape analysis'. IET Biometrics, 2012, 1, (1), pp. 3–10

76 Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb-Ahmed, A., Hadid, A.: 'Face spoofing detection using local binary patterns and fisher score'. In: 2015 3rd International Conference on Control, Engineering Information Technology (CEIT), 2015, pp. 1–5

77 Patel, K., Han, H., Jain, A.K.: 'Secure face unlock: Spoof detection on smartphones'. IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283

78 Schwartz, W.R., Rocha, A., Pedrini, H.: 'Face spoofing detection through partial least squares and low-level descriptors'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8

79 Xu, Z., Li, S., Deng, W.: 'Learning temporal features using lstm-cnn architecture for face anti-spoofing'. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 141–145

80 Peixoto, B., Michelassi, C., Rocha, A.: 'Face liveness detection under bad illumination conditions'. In: 2011 18th IEEE International Conference on Image Processing, 2011, pp. 3557–3560

81 Raghavendra, R., Busch, C.: 'Robust scheme for iris presentation attack detection using multiscale binarized statistical image features'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715

82 Doyle, J.S., Bowyer, K.W.: 'Robust detection of textured contact lenses in iris recognition using bsif'. IEEE Access, 2015, 3, pp. 1672–1683

83 Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L.: 'Mobio_livdet: Mobile biometric liveness detection'. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014, pp. 187–192

84 Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al.: 'Fusion of multiple clues for photo-attack detection in face recognition systems'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

85 Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: 'Integration of image quality and motion cues for face anti-spoofing: A neural network approach'. Journal of Visual Communication and Image Representation, 2016, 38, pp. 451–460

86 Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: 'An original face anti-spoofing approach using partial convolutional neural network'. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1–6

87 Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R.: 'Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors'. Sensors, 2018

88 Atoum, Y., Liu, Y., Jourabloo, A., Liu, X.: 'Face anti-spoofing using patch and depth-based cnns'. In: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 319–328

89 Liu, Y., Jourabloo, A., Liu, X.: 'Learning deep models for face anti-spoofing: Binary or auxiliary supervision'. CoRR, 2018

90 Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: 'Exploiting temporal and depth information for multi-frame face anti-spoofing'. CoRR, 2018

91 Yang, J., Lei, Z., Li, S.Z.: 'Learn convolutional neural network for face anti-spoofing'. CoRR, 2014

92 Gan, J., Li, S., Zhai, Y., Liu, C.: '3d convolutional neural network based on face anti-spoofing'. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), 2017, pp. 1–5

93 Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar Lotufo, R.: 'Transfer learning using convolutional neural networks for face anti-spoofing'. 2017, pp. 27–34

94 Czajka, A., Bowyer, K.W., Krumdick, M., Vidal-Mata, R.G.: 'Recognition of image-orientation-based iris spoofing'. IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196

95 Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D.: 'An approach to iris contact lens detection based on deep image representations'. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, 2015, pp. 157–164

96 Kuehlkamp, A., da Silva Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: 'Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection'. CoRR, 2018

97 Hoffman, S., Sharma, R., Ross, A.: 'Convolutional neural networks for iris presentation attack detection: Toward cross-dataset and cross-sensor generalization'. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1701–17018

98 Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: 'Deep representations for iris, face and fingerprint spoofing detection'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879

99 Agarwal, A., Singh, R., Vatsa, M.: 'Face anti-spoofing using haralick features'. 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp. 1–6

100 Galbally, J., Marcel, S., Fierrez, J.: 'Image quality assessment for fake biometric detection: Application to iris, fingerprint and face recognition'. IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724

101 Garcia, D.C., de Queiroz, R.L.: 'Face-spoofing 2d-detection based on moiré-pattern analysis'. IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786

102 He, Z., Sun, Z., Tan, T., Wei, Z.: 'Efficient iris spoof detection via boosted local binary patterns'. In: Tistarelli, M., Nixon, M.S. (Eds.): Advances in Biometrics. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2009), pp. 1080–1090

103 Zhang, H., Sun, Z., Tan, T.: 'Contact lens detection based on weighted lbp'. In: 2010 20th International Conference on Pattern Recognition, 2010, pp. 4279–4282

104 Costa-Pazo, A., Bhattacharjee, S., Vazquez-Fernandez, E., Marcel, S.: 'The replay-mobile face presentation-attack database'. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), 2016, pp. 1–7

105 Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P.: 'Mobbio: A multimodal database captured with a portable handheld device'. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 3, 2014, pp. 133–139

106 Gragnaniello, D., Sansone, C., Verdoliva, L.: 'Iris liveness detection for mobile devices based on local descriptors'. Pattern Recogn. Lett., 2015, 57, (C), pp. 81–87

107 Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: 'Firme: Face and iris recognition for mobile engagement'. Image Vision Comput., 2014, 32, pp. 1161–1172

108 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: 'Imagenet large scale visual recognition challenge'. International Journal of Computer Vision, 2015, 115, (3), pp. 211–252

109 Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: 'Oulu-npu: A mobile face presentation attack database with real-world variations'. In: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), 2017, pp. 612–618

110 Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al.: 'Livdet iris 2017 - iris liveness detection competition 2017'. In: 2017 IEEE International Joint Conference on Biometrics, IJCB 2017, Denver, CO, USA, October 1-4, 2017, 2017, pp. 733–741

111 Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al.: 'Competition on counter measures to 2-d facial spoofing attacks'. In: 2011 International Joint Conference on Biometrics (IJCB), 2011, pp. 1–6

112 Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al.: 'The 2nd competition on counter measures to 2d face spoofing attacks'. In: 2013 International Conference on Biometrics (ICB), 2013, pp. 1–6

113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at cvpr2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing'. 2019

116 Yambay, D., Doyle, J.S., Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'Livdet-iris 2013 - iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics, IJCB 2014, Clearwater, FL, USA, September 29 - October 2, 2014, 2014, pp. 1–8

117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'Livdet-iris 2015 - iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis, ISBA 2017, New Delhi, India, February 22-24, 2017, 2017, pp. 1–6

118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'Mobilive 2014 - mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (Eds.): Biometrics and Identity Management. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2008), pp. 181–190

120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the inception architecture for computer vision'. CoRR, 2015, abs/1512.00567

121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition'. CoRR, 2015, abs/1512.03385

122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation'. CoRR, 2018, abs/1801.04381

123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition'. CoRR, 2014, abs/1409.1556

124 Kingma, D.P., Ba, J.: 'Adam: A method for stochastic optimization'. CoRR, 2014, abs/1412.6980

125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture'. Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909

16 ccopy This paper is a preprint of a paper accepted by IET Biometrics and is subject to Institution of Engineering and Technology Copyright

Page 15: Deep convolutional neural networks for face and iris ...posed in literature for both face and iris recognition applications, along with the competitions and benchmark datasets. Then

62 Boulkenafet Z Komulainen J Hadid A lsquoFace spoofing detection using colourtexture analysisrsquo IEEE Transactions on Information Forensics and Security2016 11 (8) pp 1818ndash1830

63 Boulkenafet Z Komulainen J Hadid A lsquoFace antispoofing using speeded-up robust features and fisher vector encodingrsquo IEEE Signal Processing Letters2017 24 (2) pp 141ndash145

64 Boulkenafet Z Komulainen J Akhtar Z Benlamoudi A Samai DBekhouche SE et al lsquoA competition on generalized software-based face pre-sentation attack detection in mobile scenariosrsquo In 2017 IEEE International JointConference on Biometrics (IJCB) ( 2017 pp 688ndash696

65 Raja KB Raghavendra R Busch C lsquoColor adaptive quantized patterns forpresentation attack detection in ocular biometric systemsrsquo In Proceedings of the9th International Conference on Security of Information and Networks SIN rsquo16(New York NY USA ACM 2016 pp 9ndash15

66 Boulkenafet Z Komulainen J Hadid A lsquoOn the generalization of colortexture-based face anti-spoofingrsquo Image Vision Comput 2018 77 pp 1ndash9

67 Anjos A Marcel S lsquoCounter-measures to photo attacks in face recognitionA public database and a baselinersquo In 2011 International Joint Conference onBiometrics (IJCB) ( 2011 pp 1ndash7

68 Komulainen J Hadid A PietikAtildedrsquoinen M Anjos A Marcel S lsquoComple-mentary countermeasures for detecting scenic face spoofing attacksrsquo In 2013International Conference on Biometrics (ICB) ( 2013 pp 1ndash7

69 Gragnaniello D Poggi G Sansone C Verdoliva L lsquoAn investigation of localdescriptors for biometric spoofing detectionrsquo IEEE Transactions on InformationForensics and Security 2015 10 (4) pp 849ndash863

70 de FreitasPereira T Anjos A DeMartino JM Marcel S lsquoLbp acircLŠ top basedcountermeasure against face spoofing attacksrsquo In Park JI Kim J editorsComputer Vision - ACCV 2012 Workshops (Berlin Heidelberg Springer BerlinHeidelberg 2013 pp 121ndash132

71 de FreitasPereira T Anjos A Martino JMD Marcel S lsquoCan face anti-spoofing countermeasures work in a real world scenariorsquo In 2013 InternationalConference on Biometrics (ICB) ( 2013 pp 1ndash8

72 MAtildedrsquoAtildedrsquottAtildedrsquo J Hadid A PietikAtildedrsquoinen M lsquoFace spoofing detectionfrom single images using micro-texture analysisrsquo In 2011 International JointConference on Biometrics (IJCB) ( 2011 pp 1ndash7

73 Chingovska I Anjos A Marcel S lsquoOn the effectiveness of local binary pat-terns in face anti-spoofingrsquo In 2012 BIOSIG - Proceedings of the InternationalConference of Biometrics Special Interest Group (BIOSIG) ( 2012 pp 1ndash7

74 Yang J Lei Z Liao S Li SZ lsquoFace liveness detection with componentdependent descriptorrsquo In 2013 International Conference on Biometrics (ICB) (2013 pp 1ndash6

75 Maatta J Hadid A Pietikainen M lsquoFace spoofing detection from singleimages using texture and local shape analysisrsquo IET Biometrics 2012 1 (1)pp 3ndash10

76 Benlamoudi A Samai D Ouafi A Bekhouche SE TalebAhmed A HadidA lsquoFace spoofing detection using local binary patterns and fisher scorersquo In 20153rd International Conference on Control Engineering Information Technology(CEIT) ( 2015 pp 1ndash5

77 Patel K Han H Jain AK lsquoSecure face unlock Spoof detection on smart-phonesrsquo IEEE Transactions on Information Forensics and Security 2016 11(10) pp 2268ndash2283

78 Schwartz WR Rocha A Pedrini H lsquoFace spoofing detection through partialleast squares and low-level descriptorsrsquo In 2011 International Joint Conferenceon Biometrics (IJCB) ( 2011 pp 1ndash8

79 Xu Z Li S Deng W lsquoLearning temporal features using lstm-cnn architec-ture for face anti-spoofingrsquo In 2015 3rd IAPR Asian Conference on PatternRecognition (ACPR) ( 2015 pp 141ndash145

80 Peixoto B Michelassi C Rocha A lsquoFace liveness detection under bad illu-mination conditionsrsquo In 2011 18th IEEE International Conference on ImageProcessing ( 2011 pp 3557ndash3560

81 Raghavendra R Busch C lsquoRobust scheme for iris presentation attack detec-tion using multiscale binarized statistical image featuresrsquo IEEE Transactions onInformation Forensics and Security 2015 10 (4) pp 703ndash715

82 Doyle JS Bowyer KW lsquoRobust detection of textured contact lenses in irisrecognition using bsifrsquo IEEE Access 2015 3 pp 1672ndash1683

83 Akhtar Z Micheloni C Piciarelli C Foresti GL lsquoMobio_livdet Mobilebiometric liveness detectionrsquo In 2014 11th IEEE International Conference onAdvanced Video and Signal Based Surveillance (AVSS) ( 2014 pp 187ndash192

84 Tronci R Muntoni D Fadda G Pili M Sirena N Murgia G et al lsquoFusionof multiple clues for photo-attack detection in face recognition systemsrsquo In 2011International Joint Conference on Biometrics (IJCB) ( 2011 pp 1ndash6

85 Feng L Po LM Li Y Xu X Yuan F Cheung TCH et al lsquoIntegra-tion of image quality and motion cues for face anti-spoofing A neural networkapproachrsquo Journal of Visual Communication and Image Representation 201638 pp 451 ndash 460

86 Li L Feng X Boulkenafet Z Xia Z Li M Hadid A lsquoAn original face anti-spoofing approach using partial convolutional neural networkrsquo In 2016 SixthInternational Conference on Image Processing Theory Tools and Applications(IPTA) ( 2016 pp 1ndash6

87 Nguyen TD Pham TD Baek NR Park KR lsquoCombining deep andhandcrafted image features for presentation attack detection in face recognitionsystems using visible-light camera sensorsrsquo In Sensors ( 2018

88 Atoum Y Liu Y Jourabloo A Liu X lsquoFace anti-spoofing using patch anddepth-based cnnsrsquo In 2017 IEEE International Joint Conference on Biometrics(IJCB) ( 2017 pp 319ndash328

89 Liu Y Jourabloo A Liu X lsquoLearning deep models for face anti-spoofingBinary or auxiliary supervisionrsquo CoRR 2018

90 Wang Z Zhao C Qin Y Zhou Q Lei Z lsquoExploiting temporal and depthinformation for multi-frame face anti-spoofingrsquo CoRR 2018

91 Yang J Lei Z Li SZ lsquoLearn convolutional neural network for face anti-spoofingrsquo CoRR 2014

92 Gan J Li S Zhai Y Liu C lsquo3d convolutional neural network based on faceanti-spoofingrsquo In 2017 2nd International Conference on Multimedia and ImageProcessing (ICMIP) ( 2017 pp 1ndash5

93 Lucena O Junior A Moia V Souza R Valle E de AlencarLotufo R InlsquoTransfer learning using convolutional neural networks for face anti-spoofingrsquo (2017 pp 27ndash34

94 Czajka A Bowyer KW Krumdick M VidalMata RG lsquoRecognition ofimage-orientation-based iris spoofingrsquo IEEE Transactions on Information Foren-sics and Security 2017 12 (9) pp 2184ndash2196

95 Silva P Luz E Baeta R Pedrini H FalcAtildeco A Menotti D lsquoAn approachto iris contact lens detection based on deep image representationsrsquo In 2015 28thSIBGRAPI Conference on Graphics Patterns and Images ( 2015 pp 157ndash164

96 Kuehlkamp A da SilvaPinto A Rocha A Bowyer KW Czajka AlsquoEnsemble of multi-view learning classifiers for cross-domain iris presentationattack detectionrsquo CoRR 2018

97 Hoffman S Sharma R Ross A lsquoConvolutional neural networks for iris pre-sentation attack detection Toward cross-dataset and cross-sensor generalizationrsquoIn 2018 IEEECVF Conference on Computer Vision and Pattern RecognitionWorkshops (CVPRW) ( 2018 pp 1701ndash17018

98 Menotti D Chiachia G Pinto A Schwartz WR Pedrini H FalcAtildeco AXet al lsquoDeep representations for iris face and fingerprint spoofing detectionrsquoIEEE Transactions on Information Forensics and Security 2015 10 (4) pp 864ndash879

99 Agarwal A Singh R Vatsa M lsquoFace anti-spoofing using haralick featuresrsquo2016 IEEE 8th International Conference on Biometrics Theory Applications andSystems (BTAS) 2016 pp 1ndash6

100 Galbally J Marcel S Fierrez J lsquoImage quality assessment for fake bio-metric detection Application to iris fingerprint and face recognitionrsquo IEEETransactions on Image Processing 2014 23 (2) pp 710ndash724

101 Garcia DC de Queiroz RL lsquoFace-spoofing 2d-detection based on moirAtildelrsquo-pattern analysisrsquo IEEE Transactions on Information Forensics and Security2015 10 (4) pp 778ndash786

102 He Z Sun Z Tan T Wei Z lsquoEfficient iris spoof detection via boosted localbinary patternsrsquo In Tistarelli M Nixon MS editors Advances in Biometrics(Berlin Heidelberg Springer Berlin Heidelberg 2009 pp 1080ndash1090

103 Zhang H Sun Z Tan T lsquoContact lens detection based on weighted lbprsquo In2010 20th International Conference on Pattern Recognition ( 2010 pp 4279ndash4282

104 CostaPazo A Bhattacharjee S VazquezFernandez E Marcel S lsquoThe replay-mobile face presentation-attack databasersquo In 2016 International Conference ofthe Biometrics Special Interest Group (BIOSIG) ( 2016 pp 1ndash7

105 Sequeira AF Monteiro JC Rebelo A Oliveira HP lsquoMobbio A multimodaldatabase captured with a portable handheld devicersquo In 2014 International Con-ference on Computer Vision Theory and Applications (VISAPP) vol 3 ( 2014pp 133ndash139

106 Gragnaniello D Sansone C Verdoliva L lsquoIris liveness detection for mobiledevices based on local descriptorsrsquo Pattern Recogn Lett 2015 57 (C) pp 81ndash87

107 Marsico MD Galdi C Nappi M Riccio D lsquoFirme Face and iris recognitionfor mobile engagementrsquo Image Vision Comput 2014 32 pp 1161ndash1172

108 Russakovsky O Deng J Su H Krause J Satheesh S Ma S et allsquoImagenet large scale visual recognition challengersquo International Journal ofComputer Vision 2015 115 (3) pp 211ndash252

109 Boulkenafet Z Komulainen J Li L Feng X Hadid A lsquoOulu-npu A mobileface presentation attack database with real-world variationsrsquo In 2017 12th IEEEInternational Conference on Automatic Face Gesture Recognition (FG 2017) (2017 pp 612ndash618

110 Yambay D Becker B Kohli N Yadav D Czajka A Bowyer KW et allsquoLivdet iris 2017 - iris liveness detection competition 2017rsquo In 2017 IEEE Inter-national Joint Conference on Biometrics IJCB 2017 Denver CO USA October1-4 2017 ( 2017 pp 733ndash741

111 Chakka MM Anjos A Marcel S Tronci R Muntoni D Fadda G et allsquoCompetition on counter measures to 2-d facial spoofing attacksrsquo In 2011International Joint Conference on Biometrics (IJCB) ( 2011 pp 1ndash6

112 Chingovska I Yang J Lei Z Yi D Li SZ Kahm O et al lsquoThe 2nd com-petition on counter measures to 2d face spoofing attacksrsquo In 2013 InternationalConference on Biometrics (ICB) ( 2013 pp 1ndash6

113 Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al.: 'A dataset and benchmark for large-scale multi-modal face anti-spoofing'. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 919–928

114 Liu, A., Wan, J., Escalera, S., Jair Escalante, H., Tan, Z., Yuan, Q., et al.: 'Multi-modal face anti-spoofing attack detection challenge at CVPR 2019'. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

115 Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.: 'Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing', 2019

116 Yambay, D., Doyle, J.S. Jr., Bowyer, K.W., Czajka, A., Schuckers, S.: 'LivDet-iris 2013 – iris liveness detection competition 2013'. In: IEEE International Joint Conference on Biometrics (IJCB 2014), Clearwater, FL, USA, September 29 – October 2, 2014, pp. 1–8

117 Yambay, D., Walczak, B., Schuckers, S., Czajka, A.: 'LivDet-iris 2015 – iris liveness detection competition 2015'. In: IEEE International Conference on Identity, Security and Behavior Analysis (ISBA 2017), New Delhi, India, February 22–24, 2017, pp. 1–6

118 Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S.: 'MobILive 2014 – mobile iris liveness detection competition'. In: Proceedings of the International Joint Conference on Biometrics (IJCB), 2014, pp. 1–6


119 Ruiz-Albacete, V., Tome-Gonzalez, P., Alonso-Fernandez, F., Galbally, J., Fierrez, J., Ortega-Garcia, J.: 'Direct attacks using fake images in iris verification'. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (Eds.): 'Biometrics and Identity Management' (Springer, Berlin Heidelberg, 2008), pp. 181–190

120 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: 'Rethinking the Inception architecture for computer vision'. CoRR, 2015, abs/1512.00567

121 He, K., Zhang, X., Ren, S., Sun, J.: 'Deep residual learning for image recognition'. CoRR, 2015, abs/1512.03385

122 Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: 'Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation'. CoRR, 2018, abs/1801.04381

123 Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition'. CoRR, 2014, abs/1409.1556

124 Kingma, D.P., Ba, J.: 'Adam: a method for stochastic optimization'. CoRR, 2014, abs/1412.6980

125 Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S.: 'Bob: a free signal processing and machine learning toolbox for researchers'. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, 2012

126 Anjos, A., Günther, M., de Freitas Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S.: 'Continuously reproducing toolchains in pattern recognition and machine learning experiments'. In: International Conference on Machine Learning (ICML), 2017

127 Peng, F., Qin, L., Long, M.: 'Face presentation attack detection using guided scale texture'. Multimedia Tools and Applications, 2018, 77 (7), pp. 8883–8909
