

Accuracy evaluation of handwritten signature verification: Rethinking the random-skilled forgeries dichotomy

Javier Galbally
European Commission, Joint Research Center (DG JRC), Italy

[email protected]

Marta Gomez-Barrero
Hochschule Darmstadt, da/sec - Biometrics and Internet Security Research Group, Germany

[email protected]

Arun Ross
Michigan State University, i-Probe Lab, USA

[email protected]

Abstract

Traditionally, the accuracy of signature verification systems has been evaluated following a protocol that considers two independent impostor scenarios: random forgeries and skilled forgeries. Although such an approach is not necessarily incorrect, it can lead to a misinterpretation of the results of the assessment process. Furthermore, such a full separation between both types of impostors may be unrealistic in many operational real-world applications. The current article discusses the soundness of the random-skilled impostor dichotomy and proposes complementary approaches to report the accuracy of signature verification systems, discussing their advantages and limitations.

1. Introduction

Among the various biometric modalities that have been considered so far for automatic personal recognition, handwritten signature has been given a significant amount of attention, with an expansive literature extending from the pioneering studies over 30 years ago to date [10, 12, 15, 16]. This strong interest is explained by the fact that, for centuries, handwritten signature has been used extensively to certify the authorship of documents. As such, nowadays, signatures are generally recognized as a legal means of verifying an individual's identity in many administrative and financial interactions. Furthermore, signature acquisition is a fast and simple process with which users are extremely familiar, either through the traditional 'ink and paper' method or the more recent electronic process using existing pointer-based devices (e.g., touch screens, pen tablets or mobile phones).

Many of the advantages of handwritten signature as a means for personal recognition arise from the fact that it is a biometric characteristic that we learn to produce. However, such a behavioural dimension also has a downside: it implies that signature is more vulnerable to forgery than physical characteristics (i.e., biometric characteristics that are implicit, like fingerprint or iris). Through training, an attacker can also learn to produce the signature of a different individual. Therefore, in the context of signature verification, two different impostor scenarios have been traditionally distinguished to evaluate system accuracy [5]:

• Random impostors: also known as zero-effort impostors or intrinsic failure. This refers to the case in which the attacker does not possess any knowledge about the genuine signature and presents her own signature while claiming the genuine subject's identity.

• Skilled impostors: In this scenario, the attacker has some knowledge of the genuine biometric characteristic and presents a signature that imitates it. Traditionally, this scenario has been considered only in the context of behavioural biometric characteristics. Such "skilled forgeries" are usually more similar to the genuine user's signature than random forgeries, thereby impacting the recognition accuracy and posing a real challenge to signature-based biometric systems.

It should be noted that, depending on the knowledge level that the attacker has of the original signature, different skilled impostor scenarios can be distinguished, for instance [2]: 1) only the name of the subject is known; 2) only the image of the signature is known; 3) both the image and the dynamics of the signature are known. In the present work, following the usual evaluation protocol of the vast majority of the signature literature, no distinction will be made among skilled impostor access attempts, regardless of the knowledge level involved.

2017 IEEE International Joint Conference on Biometrics (IJCB), 978-1-5386-1124-1/17/$31.00 © 2017 IEEE

Generally, the two impostor scenarios described above have been treated and evaluated as independent cases; that is, the performance of a system is reported either in the random scenario or in the skilled scenario. Furthermore, system comparison is undertaken in a single scenario, without considering the other. This has also been the case in the competitions organized thus far in the context of signature recognition [9, 17].

This evaluation methodology, considering two independent scenarios, assumes that the biometric application knows a priori the type of impostors (i.e., random or skilled) it will be confronted with. Such an approach inevitably leads to a situation where systems tend to optimize their accuracy for just one of the two possible scenarios, often resulting in poor performance in the other. For instance, two different decision thresholds may be selected, one for each impostor class. Similarly, the feature extraction module may be specifically designed to optimize performance for either of the two impostor cases. The question that follows from this practice is: does this evaluation methodology realistically represent all possible use-cases?

It may be argued that, under many operational conditions, it is not possible to know beforehand the type of signature (i.e., genuine, random forgery or skilled forgery) that will be used to access the system. As a consequence, scenario-specific optimization may not be feasible. This raises several questions that are difficult to address following the usual evaluation methodology based on two independent impostor scenarios, for instance: What is the accuracy of a system if it cannot be known in advance whether it will deal with random or skilled forgeries? In this case, what decision threshold should be selected?

Given the discussion above, we feel that the independent evaluation of the two impostor scenarios is insufficient to assess the efficacy of signature verification systems deployed in real-world applications. Accordingly, there is a need to rethink the traditional evaluation scheme and to propose alternative approaches that can complement it.

Following this main objective, the present article discusses different possibilities to report accuracy results in handwritten signature verification, inspired by recent works and research carried out in the active field of biometric presentation attack detection (PAD) [8, 3, 6]. In fact, the skilled forgeries scenario in signature recognition can be understood as a particular case of biometric presentation attack (PA) performed against a behavioural biometric characteristic. Such a behavioural PA has in some cases been referred to as mimicry. Accordingly, it seems reasonable to apply the lessons learned in the evaluation of vulnerabilities to presentation attacks to the case of signature verification.

The rest of the article is organized as follows. Sect. 2 establishes the link between presentation attacks on physical biometric characteristics and the particular case of behavioural characteristics (e.g., signature). In Sect. 3 we present three different possibilities to report accuracy results in signature verification and discuss their advantages and limitations. Finally, in Sect. 4 we present a summary of the proposed methodologies and some general best practices.

2. Link to presentation attacks

Over the last decade, plenty of attention has been given to the analysis of threats to biometric systems and the ways in which the resulting vulnerabilities should be evaluated [6]. Among these threats, extensive research has been conducted in the field of presentation attacks (PA). In the specialized literature [1], PAs are widely classified as impostor or concealer attacks. In the impostor case, a synthetic forged version of a genuine characteristic (e.g., gummy finger, face mask, printed iris image) is presented to the sensor of a biometric system in order to impersonate a subject.

The skilled impostors scenario in signature verification can be understood as a particular case of impostor PAs, in some cases referred to as mimicry, in the context of behavioural biometric characteristics. In fact, there are several similarities between the skilled impostors scenario and physical biometric presentation attacks: 1) the attack is performed at the sensor level, outside the digital domain; 2) the attacker tries to access the system by copying the genuine subject's characteristic; 3) it does not involve any manipulation, overriding or hacking of the system.

Given that skilled signature forgeries and impostor PAs can be considered within the same class of attacks, it seems natural to apply equivalent methodologies to the evaluation of both threats and to the reporting of the vulnerability results. For this reason, in the following sections we will follow the naming convention that is becoming standard in the field of biometric presentation attack detection (PAD) [1], as we think that a unified nomenclature and evaluation approach for presentation attacks on both physical and behavioural characteristics (i.e., mimicry) can be beneficial for the two areas. This way, following [1], the classical random impostor scenario in signature recognition will be referred to as the bona fide (BF) scenario, while the skilled impostor scenario will be referred to as the presentation attack (PA) scenario.

Figure 1. Diagram showing the differences to detect presentation attacks between physical and behavioural biometric systems.

Although a formal comparison between physical presentation attacks and mimicry falls outside the scope of the present article, it is worth highlighting here one key point that distinguishes them: while a traditional presentation attack involves the use of some physical artifact, in the case of mimicry the interaction with the sensor is exactly the same as in a bona fide access attempt, as shown in Fig. 1. Due to this factor, mimicry poses a higher risk than traditional physical PAs:

• On the one hand, since no artifact is required, mimicry attacks are very simple to carry out and difficult to identify even for human supervisors.

• On the other hand, the artifact used in a physical PA can potentially be detected at the sensor level or at the feature level by means of some of the many automatic PAD techniques that have been proposed in the literature [6] (also referred to as anti-spoofing or liveness detection methods).

This is not the case for mimicry, where the development of this type of protection approach is a great challenge. Given the peculiarities of mimicry, where no physical artefact is needed, only feature-level PAD methods are feasible (see Fig. 1). In fact, their potential implementation and performance have only been studied in a very preliminary manner [7].

This way, in the case of signature, the detection of presentation attacks (i.e., skilled forgeries) fully depends on the capabilities of the standard modules present in the recognition system (i.e., feature extractor and comparator). That is, the decision on whether or not the submitted signature is a forgery is based solely on the comparison score. In contrast, in biometric applications based on physical characteristics (e.g., fingerprint, iris, face), a specific PAD module is usually added to enhance the protection capabilities of the overall system. The output of this PAD module is used, on its own or in combination with the comparison score, to take a decision on the authenticity of the input sample.
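The contrast between the two decision pipelines can be sketched in a few lines of code. This is only an illustration: the function names, threshold values and the simple AND-rule fusion are assumptions introduced here, not mechanisms described in the article.

```python
# Sketch: decision logic with and without a dedicated PAD module.
# The AND-rule fusion and all threshold values are illustrative
# assumptions, not prescribed by the article.

def decide_signature(comparison_score: float, threshold: float) -> bool:
    """Behavioural case (e.g., signature): no PAD module is available,
    so detecting a skilled forgery rests entirely on the comparator."""
    return comparison_score >= threshold

def decide_physical(comparison_score: float, threshold: float,
                    pad_score: float, pad_threshold: float) -> bool:
    """Physical case (e.g., fingerprint): a PAD score is fused with the
    comparison score; here a simple AND rule accepts only samples that
    both match the reference and look bona fide to the PAD module."""
    is_bona_fide = pad_score >= pad_threshold
    return is_bona_fide and comparison_score >= threshold

print(decide_signature(0.82, 0.70))             # accepted on the score alone
print(decide_physical(0.82, 0.70, 0.30, 0.50))  # rejected: PAD flags the sample
```

The same high-scoring sample is accepted by the signature pipeline but can still be rejected by the physical pipeline when the PAD module flags it, which is exactly the safety net that mimicry attacks lack.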

Consequently, mimicry attacks are usually taken into account in the protocols followed for standard accuracy evaluations. Physical PAs, on the contrary, are usually assessed in the framework of specific vulnerability studies. This is why, in signature even more than in other biometric modalities, it is especially important to have clear evaluation protocols regarding skilled and random forgeries, including proper methodologies for reporting the final results.

3. Methods for reporting accuracy results in signature verification

In traditional accuracy evaluations of biometric systems, two sets of scores are considered [13]:


1. Bona fide (BF) mated scores, also referred to in the literature as genuine scores. They are produced by a genuine access attempt, that is, the genuine user tries to access the system using his own biometric characteristic.

2. Bona fide (BF) non-mated scores, also referred to as zero-effort impostor scores in the general biometric literature, or random impostor scores in the specific field of signature recognition. These scores are the result of a zero-effort (i.e., random) impostor access attempt, that is, the impostor tries to access the system using his own biometric characteristic while claiming to be a different subject.

The addition of the presentation attack dimension brings another variable to the assessment of biometric systems' performance. In this case, a third set of scores must be considered:

3. Presentation attack (PA) mated scores, referred to as skilled impostor scores in the field of signature recognition. These scores are produced by PA impostor access attempts (i.e., skilled impostor access attempts), that is, the impostor tries to access the system by applying a presentation attack (e.g., in the particular case of signature recognition, by actively trying to forge the genuine subject's signature).

The question to be addressed when the PA dimension is introduced is: how should these three sets of scores be linked in order to report results in the most meaningful manner, so that systems may be compared in a fair and objective way? The answer to that problem is not straightforward. As a consequence, different methodologies and metrics have been proposed over the last years to assess the "spoofability" of biometric systems [3].

The next subsections discuss three possible methods of reporting accuracy results for the particular case of signature verification, their advantages and limitations, and the scenarios in which they can best be applied.

All the results presented in the following sections as illustrative examples were obtained using a state-of-the-art on-line signature verification system based on Dynamic Time Warping, described in [14]. Comparison scores were obtained on the signature subcorpus of the BiosecurID DB [4].
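For readers unfamiliar with the comparator just mentioned, the core alignment idea behind Dynamic Time Warping can be sketched in a few lines. This is a rough, self-contained sketch only: the actual system of [14] operates on multi-dimensional, preprocessed time functions and applies score normalization, none of which is reproduced here.

```python
# Minimal dynamic time warping (DTW) distance between two 1-D sequences,
# the core of many on-line signature comparators. Real systems such as
# the one in [14] work on richer, preprocessed time functions; this
# sketch only conveys the basic elastic-alignment idea.
import math

def dtw_distance(a, b):
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

genuine = [0.0, 1.0, 2.0, 1.0, 0.0]
attempt = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]  # same shape, slightly slower
print(dtw_distance(genuine, attempt))  # 0.0: the warping absorbs the extra sample
```

The elastic alignment is what makes DTW attractive for signatures: two repetitions of the same gesture produced at slightly different speeds still obtain a small distance.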

3.1. Method 1: Independent BF-PA scenarios

As mentioned in the introduction, this is currently the most widespread method of reporting results in the field of handwritten signature recognition, including recent competitions [9, 17]. It distinguishes two independent evaluation scenarios:

• Bona fide (BF) scenario. This is the scenario considered in classic accuracy evaluations [13]. It only takes into account bona fide scores (either mated or non-mated). In this scenario, accuracy is typically reported in terms of the FNMR (False Non-Match Rate, the ratio of BF mated access attempts wrongly rejected) and the FMR (False Match Rate, the ratio of BF non-mated access attempts wrongly accepted). The working point where the FNMR and the FMR take the same value is the Equal Error Rate (EER), generally accepted as a good estimation of the overall accuracy of the system. The decision threshold at the EER will be referred to here as δEER.

• Presentation attack (PA) scenario. In this scenario, access attempts are either BF mated or PA mated. The two metrics widely used for reporting results in this scenario in the PA-related literature [11, 3] are the FNMR (defined as in the bona fide scenario) and the IAPMR (Impostor Attack Presentation Match Rate, corresponding to the ratio of presentation attacks wrongly accepted). The point where the FNMR is equal to the IAPMR is generally referred to as the Presentation Attack Equal Error Rate (PAEER). The decision threshold at the PAEER will be named here δPAEER.

The method used to report results in this case is to consider two independent plots, one for each scenario: 1) one figure showing the FNMR-FMR pair and 2) a different figure where the FNMR-IAPMR pair is depicted. An example is shown in the top chart of Fig. 2. It is important to notice that δEER ≠ δPAEER. That is, the IAPMR corresponding to δEER is not the PAEER, which is a very common mistake in the interpretation of the results, induced by the way in which they are presented.
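This distinction can be made concrete with a small numerical sketch. The scores below are synthetic Gaussian placeholders (not taken from the BiosecurID experiments) and the threshold search is a plain sweep, both assumptions of this example; only the qualitative effect matters.

```python
# Sketch: the IAPMR at the bona fide threshold δ_EER is not the PAEER.
# All score distributions are synthetic placeholders.
import random

random.seed(0)
bf_mated     = [random.gauss(0.80, 0.06) for _ in range(500)]  # genuine attempts
bf_non_mated = [random.gauss(0.30, 0.08) for _ in range(500)]  # random forgeries
pa_mated     = [random.gauss(0.60, 0.10) for _ in range(500)]  # skilled forgeries

def fnmr(mated, t):       # mated attempts wrongly rejected
    return sum(s < t for s in mated) / len(mated)

def match_rate(imp, t):   # impostor attempts wrongly accepted (FMR or IAPMR)
    return sum(s >= t for s in imp) / len(imp)

def eer_threshold(mated, impostors):
    # crude sweep: candidate threshold where FNMR and FMR/IAPMR are closest
    return min(sorted(mated + impostors),
               key=lambda t: abs(fnmr(mated, t) - match_rate(impostors, t)))

d_eer   = eer_threshold(bf_mated, bf_non_mated)  # set in the BF scenario
d_paeer = eer_threshold(bf_mated, pa_mated)      # set in the PA scenario

paeer         = match_rate(pa_mated, d_paeer)
iapmr_at_deer = match_rate(pa_mated, d_eer)
# With a single threshold fixed in the BF scenario, the accepted skilled
# forgeries (IAPMR at δ_EER) far exceed the PAEER:
print(round(paeer, 3), round(iapmr_at_deer, 3))
```

Because skilled-forgery scores sit much closer to the genuine distribution, δEER lies well below δPAEER, and reading the PAEER as "the IAPMR of the deployed system" badly understates the vulnerability.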

This is not an incorrect way of reporting results, since the main information about the system accuracy may be extracted from the two plots (i.e., FNMR-FMR and FNMR-IAPMR). In fact, this method is fully acceptable for those cases in which it is known beforehand whether the system will have to process random (i.e., zero-effort) or skilled (i.e., PA) forgeries. For instance, this is the case of systems designed to operate in the forensic field, where forged documents (e.g., checks) are the most usual use-case. In this particular context, it is known a priori whether impostors are skilled imitations (i.e., presentation attacks) or random imitations (i.e., zero-effort attacks). In such cases, since two different settings (e.g., decision thresholds) may be selected for the system depending on the forgeries being considered, results should be reported as two independent error rate tuples or plots: FNMR-FMR (BF scenario) and FNMR-IAPMR (PA scenario).


Figure 2. Score distributions, FNMR, FMR, IAPMR and JFMR curves corresponding to the three methods considered in the present work for reporting results in the accuracy evaluation of signature verification systems: independent method (top), joint method (middle) and linked method (bottom). The JFMR curve has been plotted for probabilities [Pzam = 0.5, Ppam = 0.5]. All score distributions were produced with a DTW-based on-line signature verification system.

Below we present the main advantages and limitations of this method of reporting results considering independent bona fide and presentation attack scenarios:

• Advantages:

- Two independent plots, one for each scenario, make the visual comparison among several systems easier for each of those two scenarios.
- This is the method that should be followed in those specific contexts in which different system settings (e.g., decision threshold) may be selected depending on the impostor scenario (i.e., zero-effort or PA).

• Limitations:

- Following this method, it is not straightforward to compare systems' performance in a situation where both types of impostors, zero-effort and PA, are present at the same time.
- With this method of reporting results, it is not trivial to find the vulnerability of a system to presentation attacks (i.e., IAPMR) once a decision threshold has been fixed in the bona fide scenario, or, the other way around, to find the FMR when the threshold has been fixed in the PA scenario.
- It may lead to the wrong assumption that, in most cases, a different setting of the system can be selected at verification time depending on the impostor scenario (i.e., zero-effort or PA).
- Similarly, when used within a competition, this method of presenting results will usually lead to the design of systems specifically adjusted to give optimal accuracy in one of the scenarios, regardless of their accuracy in the other one.

To address these shortcomings, two other methods, described next, may be used to report results.

3.2. Method 2: Joint Impostor Scenario

In many real applications, it is not possible to distinguish a priori between BF non-mated and PA mated access attempts. In these cases, the method assuming independent scenarios described in Sect. 3.1 does not accurately represent the practical operating conditions.

For these contexts, an additional possibility to report results in signature verification is to make no distinction between the two types of impostors, i.e., zero-effort and PA. In essence, this means that the two impostor score distributions, BF non-mated and PA mated, are merged into one unique distribution that will be referred to as joint impostor scores. Compared to the previous method, where three different score distributions were considered (i.e., BF mated, BF non-mated and PA mated), in this case only two are computed: BF mated and joint impostors.

It follows that only one tuple of error rates, FNMR-JFMR, is computed, where JFMR stands for Joint False Match Rate and accounts for both zero-effort and PA access attempts wrongly accepted by the system. A graphical example of the two score distributions (i.e., genuine and joint impostors) together with the FNMR-JFMR curves is shown in the middle chart of Fig. 2.
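Forming the joint impostor distribution amounts to merging the two impostor score sets and discarding their labels. A minimal sketch with synthetic placeholder scores (the threshold and score model are assumptions of this example):

```python
# Sketch: forming the joint impostor distribution of Method 2. With
# equal-sized sets, the resulting JFMR is exactly the average of FMR
# and IAPMR (the [0.5, 0.5] case plotted in Fig. 2).
import random

random.seed(1)
bf_non_mated = [random.gauss(0.30, 0.08) for _ in range(400)]  # zero-effort
pa_mated     = [random.gauss(0.60, 0.10) for _ in range(400)]  # skilled

# Method 2 discards the impostor labels and merges the two score sets:
joint_impostors = bf_non_mated + pa_mated

def accept_rate(scores, t):
    return sum(s >= t for s in scores) / len(scores)

t = 0.55  # an arbitrary decision threshold, for illustration only
fmr, iapmr = accept_rate(bf_non_mated, t), accept_rate(pa_mated, t)
jfmr = accept_rate(joint_impostors, t)
print(round(jfmr, 4), round(0.5 * fmr + 0.5 * iapmr, 4))  # identical values
```

The equality of the two printed values is exact whenever the two sets have the same size, which anticipates the weighted relation between JFMR, FMR and IAPMR discussed below.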

Although this may seem the most realistic way to present results, it fails to report some valuable information. Especially at the development stage, it is useful to know the number of zero-effort and PA impostors that are wrongly accepted, as this allows one to: 1) understand the weak points of a system in order to improve it; 2) select the decision threshold depending on the expected number of access attempts of each type. For instance, if the total number of PA access attempts is expected to be much lower than that of zero-effort access attempts, we may select a different operating point than in a scenario where the two types of attacks are comparable in number.
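The threshold-selection point can be illustrated with a small sketch. The equal-cost weighted error model, the priors and the synthetic score distributions are all assumptions introduced purely for this example, not a procedure taken from the article.

```python
# Sketch: how the expected mix of attempt types can move the operating
# point. Scores, priors and the equal-cost error model are all assumed.
import random

random.seed(2)
bf_mated     = [random.gauss(0.80, 0.06) for _ in range(500)]
bf_non_mated = [random.gauss(0.30, 0.08) for _ in range(500)]
pa_mated     = [random.gauss(0.60, 0.10) for _ in range(500)]

def reject_rate(scores, t):
    return sum(s < t for s in scores) / len(scores)

def expected_error(t, p_gen, p_zero, p_pa):
    fnmr  = reject_rate(bf_mated, t)
    fmr   = 1.0 - reject_rate(bf_non_mated, t)
    iapmr = 1.0 - reject_rate(pa_mated, t)
    return p_gen * fnmr + p_zero * fmr + p_pa * iapmr

candidates = sorted(bf_mated + bf_non_mated + pa_mated)
t_few_pa  = min(candidates, key=lambda t: expected_error(t, 0.5, 0.45, 0.05))
t_many_pa = min(candidates, key=lambda t: expected_error(t, 0.5, 0.05, 0.45))
# A PA-heavy attempt mix pushes the optimal threshold upward (stricter):
print(round(t_few_pa, 3), round(t_many_pa, 3))
```

Since skilled-forgery scores lie much closer to the genuine ones, a larger expected share of PA attempts drives the minimum of the expected error toward a stricter threshold, which is exactly the information the joint distribution hides.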

Similarly, as mentioned for the independent BF-PA scenarios described in Sect. 3.1, in the case of a multiple-system evaluation it is helpful to have information regarding the accuracy of the tested algorithms in the two impostor scenarios, that is, to have access to both the FMR and the IAPMR (instead of only the JFMR). This can lead to a more accurate comparison among systems in order to select one of them depending on the final environment where it will be installed (e.g., forensic field vs. access control).

It is also worth noting that, in a scenario where the two values FMR-IAPMR are available, the JFMR may be directly computed from them. Let us assume that the probability that a score from the total JFMR distribution comes from a BF non-mated access attempt (i.e., zero-effort impostor) is Pzam. Similarly, Ppam denotes the probability that a given impostor access attempt is a presentation attack (where Pzam + Ppam = 1). Then, the JFMR at a given operating point may be computed simply as JFMR = FMR · Pzam + IAPMR · Ppam. As such, the FMR of a system may be understood as a special case of the JFMR for [Pzam = 1, Ppam = 0]. Similarly, IAPMR = JFMR for [Pzam = 0, Ppam = 1].
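The weighted relation and its two degenerate cases can be written down directly. The FMR and IAPMR values below are hypothetical numbers chosen for illustration, not measurements from the article:

```python
# Sketch of the weighted relation JFMR = FMR * P_zam + IAPMR * P_pam,
# including the two degenerate cases mentioned in the text. The error
# rates below are illustrative numbers, not measurements.
def jfmr(fmr: float, iapmr: float, p_zam: float, p_pam: float) -> float:
    assert abs(p_zam + p_pam - 1.0) < 1e-9, "priors must sum to one"
    return fmr * p_zam + iapmr * p_pam

fmr, iapmr = 0.02, 0.25  # hypothetical operating point

print(jfmr(fmr, iapmr, 1.0, 0.0))  # = FMR when only zero-effort impostors occur
print(jfmr(fmr, iapmr, 0.0, 1.0))  # = IAPMR when only PA impostors occur
print(jfmr(fmr, iapmr, 0.5, 0.5))  # the balanced mix used in Fig. 2
```

This also makes the limitation explicit: the same FMR-IAPMR pair yields a different JFMR for every choice of [Pzam, Ppam], so a JFMR value is meaningless without its accompanying priors.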

The previous reasoning introduces another limitation of presenting the JFMR of the system as the unique error metric against impostor access attempts: a single JFMR curve is only valid for a given pair of values [Pzam, Ppam]. Therefore, the JFMR should always be reported together with these two probability values. Should the ratio of zero-effort and PA access attempts change, the JFMR would have to be recomputed. In the particular case of Fig. 2 (center), the JFMR curve was plotted for probabilities [0.5, 0.5].

Following the discussion given above, the main advantages and limitations of the Joint Impostor Scenario for reporting results in signature verification are:

• Advantages:

- It is the closest method to a real-world environment, where the type of access attempts that the system will have to face is not known. Usually, designers only have an estimation of the expected Pzam and Ppam in that specific context. Therefore, the JFMR can provide a good prediction of how the accuracy in laboratory conditions may translate to a real context.

• Limitations:

- Since BF non-mated and PA mated access attempts (produced by zero-effort and PA impostors) are not distinguishable, the Joint Impostor Scenario makes it more difficult to analyze and compare the performance of several systems and to extract valuable conclusions to improve them.
- It is also more difficult to dynamically select a proper decision threshold depending on the expected number of BF non-mated and PA mated access attempts.
- The JFMR is specific to a certain ratio of BF non-mated vs. PA mated access attempts, therefore lacking generality. For each different ratio, a different JFMR curve must be produced. This is time consuming and cumbersome for the fast comparison of different systems.

The bottom line of the discussion above is that, although it is probably the most realistic method, reporting results from both impostor scenarios jointly can conceal valuable information for the development of a system or for the comparison of different algorithms. However, as explained in Sect. 3.1, presenting results of both zero-effort and PA scenarios independently may lead to biased evaluations and to unrealistic accuracy values. For these reasons, a third complementary method to report results in signature verification is presented next.

3.3. Method 3: Linked BF-PA scenarios

This method is inspired by similar approaches proposed to report vulnerabilities in the field of presentation attacks on physical characteristics. Different works related to presentation attacks have discussed the doubtful advisability of considering only PA access attempts to report the vulnerability of systems to presentation attacks [3]. The sensible argument put forward in those studies in order to propose alternative ways to measure "spoofability" is that, as already addressed in the present work, biometric systems are normally adjusted to give top performance in the bona fide scenario. Furthermore, in most cases it is not possible to select a different threshold for the bona fide and presentation attack scenarios. Therefore, a more realistic method to estimate their resilience to presentation attacks is to give the IAPMR value at specific operating points fixed in the bona fide scenario.

Directly comparing the EER and the PAEER of a system to determine its accuracy degradation between both scenarios entails a change in the operating point from δEER to δPAEER. We should not forget that this threshold is one of the key boundary conditions to ensure a fair comparison among systems and should not be modified. As argued in the previous sections, in most cases it is not possible to have one decision threshold for zero-effort forgeries and a different one to deal with PA forgeries, because that would entail knowing beforehand whether a given subject will sign with his own genuine signature or will try to forge a different one. As a consequence, one single threshold has to be used for both scenarios. Assuming that such threshold is δEER, the question is: how many skilled forgeries will be wrongly accepted? The answer to that question is certainly not the PAEER (as is wrongly assumed in many cases).

As an illustrative example, let us take the specific signature verification system depicted in Fig. 2, where EER = 1.8% and PAEER = 6.8%. Stating that the accuracy degradation between the bona fide and presentation attack scenarios is 100 × (PAEER − EER)/EER = 500% is at least very arguable (as the system is not being compared at the same operating point). Rather, a better practice is to compare the FMR and the IAPMR at a fixed threshold, for instance δEER (see Fig. 2, bottom). In this case the degradation would be 100 × (IAPMR_EER − FMR_EER)/FMR_EER = 3,540%.
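The single-threshold argument above can be sketched numerically. The snippet below is a minimal illustration with synthetic score distributions (all names, distributions and numbers are hypothetical, not taken from Fig. 2): the decision threshold is fixed at δ_EER using only the bona fide data, and the FMR and IAPMR are then read at that same threshold.

```python
import numpy as np

def rates_at_threshold(genuine, impostor, threshold):
    """FNMR over genuine scores and match rate of impostor scores at one
    common threshold (higher score = better match; accept if score >= t)."""
    fnmr = float(np.mean(genuine < threshold))
    mr = float(np.mean(impostor >= threshold))
    return fnmr, mr

def eer_threshold(genuine, zero_effort):
    """Scan candidate thresholds and return the one where FNMR is closest
    to FMR in the bona fide (zero-effort impostor) scenario."""
    candidates = np.sort(np.concatenate([genuine, zero_effort]))
    diffs = [abs(np.mean(genuine < t) - np.mean(zero_effort >= t))
             for t in candidates]
    return float(candidates[int(np.argmin(diffs))])

# Hypothetical similarity scores: skilled-forgery scores sit closer to
# the genuine ones than random-impostor scores do.
rng = np.random.default_rng(0)
genuine = rng.normal(0.80, 0.08, 2000)
zero_effort = rng.normal(0.30, 0.10, 2000)  # random impostors (BF scenario)
skilled = rng.normal(0.60, 0.12, 2000)      # presentation attacks

t_eer = eer_threshold(genuine, zero_effort)
_, fmr = rates_at_threshold(genuine, zero_effort, t_eer)
_, iapmr = rates_at_threshold(genuine, skilled, t_eer)
print(f"delta_EER = {t_eer:.3f}  FMR = {fmr:.2%}  IAPMR = {iapmr:.2%}")
```

At the operating point δ_EER fixed in the bona fide scenario, the IAPMR is far larger than the FMR, which is exactly the gap that a direct EER-vs-PAEER comparison hides.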

The previous argumentation, together with the drawbacks of the methods presented in Sects. 3.1 and 3.2, leads to a third methodology to report results in signature verification, where the BF and PA scenarios are linked together.

In this case, errors are presented by plotting all three FNMR, FMR and IAPMR curves in the same figure, as shown in Fig. 3 (left). Similarly, in a DET plot, the curves for both scenarios are presented together (see Fig. 3, right). This way, it is clearly highlighted that the decision threshold is unique and, at the same time, it allows a faster and more accurate comparison between the two scenarios for one specific system.

Following the discussion given above, the main advantages and limitations of the Linked BF-PA scenarios method for reporting results in signature verification are:

• Advantages:

- It allows for a clearer and easier comparison between the accuracy in the BF and PA scenarios for a specific system.
- It presents comprehensive information from both BF and PA access attempts separately, thereby aiding the performance analysis in both cases. This information can be very valuable for the development and improvement of systems.
- If needed, this method still allows the JFMR to be computed. For this purpose, an estimation of the expected [Pzam, Ppam] should be known.
- By showing the accuracy of both scenarios together, the comparison of different configurations of a given system for each type of impostor is avoided (e.g., it is not possible to select a different decision threshold for each of the impostor scenarios).

• Limitations:

- The visual comparison between several systems can be more difficult than in the previous two reporting methods, as both DET curves of all systems are plotted in the same figure. A table such as the one presented in Table 1 can be a good tool to complement the visual plots, in order to provide an easier benchmark across systems.

4. Summary and recommendations

The present work has discussed the method usually followed to report accuracy results in signature verification, has pointed out its limitations, and has proposed other approaches that can complement the current trend.



Figure 3. Example of visual plots for the "Linked BF-PA scenarios" method described in Sect. 3.3: (left panel) score distributions and FNMR, FMR and IAPMR curves; (right panel) corresponding DET curves. The EER, PAEER and IAPMR_EER are highlighted in both panels.

Table 1. FMR, IAPMR and JFMR for different fixed operating points defined in terms of their FNMR. All the values correspond to the real on-line signature verification system shown in Fig. 2. The JFMR is computed for a probability of BF non-mated and PA mated attempts equal to [Pzam = 0.9, Ppam = 0.1].

DTW-based system - Results

FNMR            0.1%    0.5%    1%      5%      10%
FMR             35.5%   26.9%   16.3%   0.0%    0.0%
IAPMR           53.4%   49.1%   46.9%   10.6%   2.4%
JFMR [0.9-0.1]  37.3%   29.1%   19.3%   1.1%    0.2%
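For illustration, the JFMR column of Table 1 can be reproduced from the FMR and IAPMR columns under the assumption, consistent with the table within rounding, that the JFMR is the prior-weighted mixture JFMR = Pzam · FMR + Ppam · IAPMR:

```python
# Reproduce the JFMR column of Table 1 as the prior-weighted mixture of
# the two impostor match rates, assuming JFMR = Pzam*FMR + Ppam*IAPMR
# with [Pzam, Ppam] = [0.9, 0.1] (an interpretation suggested by the
# table, not an official definition from the paper).
P_ZAM, P_PAM = 0.9, 0.1

fmr   = [35.5, 26.9, 16.3, 0.0, 0.0]   # % at FNMR = 0.1, 0.5, 1, 5, 10 %
iapmr = [53.4, 49.1, 46.9, 10.6, 2.4]  # %

jfmr = [round(P_ZAM * f + P_PAM * i, 1) for f, i in zip(fmr, iapmr)]
print(jfmr)  # matches the reported [37.3, 29.1, 19.3, 1.1, 0.2] to within rounding
```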

Most of the problems encountered in the task of correctly evaluating, reporting and comparing the accuracy of signature verification algorithms derive from the inclusion of the presentation attack dimension. Presentation attack is a critical parameter, especially in the case of behavioural biometrics, due to their intrinsically forgeable nature. Therefore, any method for accuracy evaluation in signature recognition should unavoidably consider the case of PA access attempts.

The argumentation developed in the article has shown that there is no unique perfect solution to present accuracy assessment results and that the methodology selected is highly dependent on the specific context in which the evaluation takes place. Accordingly, it is of the utmost importance to define: 1) the extent and purpose of the evaluation, 2) the way in which results are presented, and 3) how they should be interpreted.

Some possible general guidelines on best practices to report performance results in signature verification are as follows:

• Use the method "Linked BF-PA scenarios": to find the accuracy degradation between the bona fide and presentation attack scenarios for one system.

• Use the method "Independent BF-PA scenarios": to visually compare the accuracy of several systems in either the bona fide scenario or the presentation attack scenario, without making any link between them.

• Use the method "Joint impostor scenario": to compare the accuracy of one or several systems in a close-to-operational context.

For any of the three contexts defined above, it is important to highlight that the metric common to all possible reporting alternatives is the FNMR (as the genuine scores do not vary between the BF and PA scenarios). Therefore, it is advisable that, for the evaluated systems, all other error rates (i.e., FMR, IAPMR and JFMR) are computed at fixed operating points defined in terms of the FNMR, e.g., FNMR = [0.1%, 0.5%, 1%, 5%, 10%]. Numeric results can then be presented in a comparative table such as Table 1. This type of table is a powerful aiding tool to report results in any of the contexts described above, complementing the usual visual plots shown in Figs. 2 and 3.
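As a sketch of this recommendation (with synthetic scores and hypothetical helper names, not the system of Fig. 2), the thresholds can be fixed from the genuine score distribution alone at the target FNMR values, and all other error rates then read at those thresholds:

```python
import numpy as np

def threshold_at_fnmr(genuine, target_fnmr):
    """Pick the decision threshold from the genuine scores alone so that
    FNMR = P(genuine < t) hits the requested target (higher score = match)."""
    return float(np.quantile(genuine, target_fnmr))

# Synthetic similarity scores for the three populations.
rng = np.random.default_rng(1)
genuine = rng.normal(0.80, 0.08, 5000)
zero_effort = rng.normal(0.30, 0.10, 5000)  # random impostors
skilled = rng.normal(0.60, 0.12, 5000)      # presentation attacks
p_zam, p_pam = 0.9, 0.1                     # assumed priors, as in Table 1

# One row per fixed FNMR operating point, as in Table 1.
for fnmr in [0.001, 0.005, 0.01, 0.05, 0.10]:
    t = threshold_at_fnmr(genuine, fnmr)
    fmr = float(np.mean(zero_effort >= t))
    iapmr = float(np.mean(skilled >= t))
    jfmr = p_zam * fmr + p_pam * iapmr
    print(f"FNMR={fnmr:5.1%}  FMR={fmr:6.2%}  IAPMR={iapmr:6.2%}  JFMR={jfmr:6.2%}")
```

Because every column is anchored to the same FNMR values, the resulting table compares systems at genuinely equivalent operating points, which is the property the text argues for.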

Furthermore, this complementary table is of special relevance when the objective of the evaluation is to fairly compare a number of systems considering both impostor scenarios at the same time, as such a comparison is not easy to achieve visually following the linked-scenarios method.

As a wrap-up conclusion, we may state that, independently of the approach used, probably the most important factor when reporting accuracy experiments in signature verification is to be aware of the limitations of each strategy and to be able to correctly interpret the results being shown. This will help extract meaningful conclusions, especially in the framework of competitive evaluations where different systems are compared.



5. Acknowledgements

M. G-B. is supported by the German Federal Ministry of Education and Research (BMBF) as well as by the Hessen State Ministry for Higher Education, Research and the Arts (HMWK) within the Center for Research in Security and Privacy (CRISP, www.crisp-da.de).

References

[1] ISO/IEC 30107-1:2016. Information technology - Biometric presentation attack detection - Part 1: Framework.

[2] F. Alonso-Fernandez, J. Fierrez, et al. Robustness of signature verification systems to imitators with increasing skills. In Proc. ICDAR, pages 728–732, 2009.

[3] I. Chingovska, A. Anjos, and S. Marcel. Biometrics evaluation under spoofing attacks. IEEE TIFS, 9:2264–2276, 2014.

[4] J. Fierrez, J. Galbally, et al. BiosecurID: a multimodal biometric database. Pattern Analysis and Applications, 13:235–246, 2009.

[5] J. Fierrez and J. Ortega-Garcia. Handbook of Biometrics, chapter On-line signature verification, pages 189–209. Springer, 2008.

[6] J. Galbally, S. Marcel, and J. Fierrez. Biometric anti-spoofing methods: A survey in face recognition. IEEE Access, 2:1530–1552, 2014.

[7] M. Gomez-Barrero, J. Galbally, et al. Enhanced on-line signature verification based on skilled forgery detection using sigma-lognormal features. In Proc. ICB, pages 501–506, 2015.

[8] A. Hadid, N. Evans, et al. Biometrics systems under spoofing attack. IEEE Signal Processing Magazine, 32:20–30, 2015.

[9] N. Houmani, A. Mayoue, et al. BioSecure signature evaluation campaign (BSEC-2009): Evaluating online signature algorithms depending on the quality of signatures. Pattern Recognition, 45:993–1003, 2012.

[10] D. Impedovo and G. Pirlo. Automatic signature verification: The state of the art. IEEE TSMC, Part C: Applications and Reviews, 38(5):609–635, 2008.

[11] E. M. Johnson et al. Increase the security of multibiometric systems by incorporating a spoofing detection algorithm in the fusion mechanism. In Proc. MCS, pages 309–318, 2011.

[12] F. Leclerc and R. Plamondon. Automatic signature verification: the state of the art, 1989–1993. IJPRAI, 8:643–660, 1994.

[13] A. Mansfield and J. Wayman. Best practices in testing and reporting performance of biometric devices. Technical report, CESG Biometrics Working Group, 2002.

[14] M. Martinez-Diaz, J. Fierrez, et al. Mobile signature verification: Feature robustness and performance comparison. IET Biometrics, 3:267–277, 2014.

[15] R. Plamondon and G. Lorette. Automatic signature verification and writer identification - the state of the art. Pattern Recognition, 22:107–131, 1989.

[16] R. Plamondon and S. N. Srihari. On-line and off-line handwriting recognition: A comprehensive survey. IEEE TPAMI, 22:63–84, 2000.

[17] D. Y. Yeung, H. Chang, et al. SVC2004: First International Signature Verification Competition. In Proc. ICBA, pages 16–22, 2004.
