sabah jassim university of buckingham, uk

BioSecure & COST 2101 – Smart Cards and Biometric – Lausanne, 2007

Sabah Jassim, University of Buckingham, UK

SecurePhone: A Multi-Modal Biometric Verifier for Constrained Devices

BioSecure + COST 2101 - March 2007

Outline
• The SecurePhone project
• Fusion approaches to biometric-based identification
• SecurePhone multi-modal biometric verifier
  • PDA implementation constraints
  • Modalities
  • Fusion strategy
• Performance: Match on Host (MoH) & Match on Card (MoC)
• Challenges and potential solutions
• Conclusion


The SecurePhone Project
Aims to produce a prototype of a new mobile communication system enabling biometrically authenticated users to deal legally binding m-contracts during a mobile phone call in an easy yet highly dependable and secure way, using a biometric recogniser that fuses face, voice and handwritten signature.

The SP consortium

http://www.atosorigin.com/

http://www.informacro.info/

http://www.secure-phone.info/www.int-evry.fr

http://www.secure-phone.info/www.nergal.it

http://www.telefonicamoviles.com/

http://www.coli.uni-saarland.de/


SecurePhone aim 1: secure exchange
Secure PKI (Public Key Infrastructure)
Deal secure m-contracts during a mobile phone call
• secure: private key stored on SIM card
• user-friendly: intuitive, non-intrusive
• flexible: legally binding text/audio transactions
• dynamic: mobile e-signing “on the fly”


Project aim 2: biometric verification
[Diagram: face, voice and signature inputs each pass through preprocessing and modelling; fusion uses client and impostor joint-score models; the outcome is either "accept user, release private key" or "reject user".]
Zero-Knowledge Authentication.


Implementation constraints
• The PDA's main processor is much slower than a PC's, so verification must be very efficient even on the PDA.
• The device's own applications give an inadequate audio-visual sample rate (only 8 kHz audio and 10 fps video). This was improved: current SP sampling and real-time pre-processing handle 22 kHz audio and 20 fps video.
• Only data on the SIM is secure, so the biometric models/templates must be stored and processed on the SIM. Yet the SIM has very limited computational resources and processing support.
  SIM model storage is limited to 40 K, hence text-dependent prompts.
  Note: text-independent prompts or varied text-dependent prompts are more secure, but would require 200-400 K.
• Enrolment should be based on a short session (acceptability).


Voice verification (SU / GET ENST)
• Fixed 5-digit prompt: conceptually neutral, easily extendable, requires few Gaussians
• 22 kHz sampling
• Online energy-based non-speech frame removal
• MFCCs with online CMS and first time difference features: slow to compute, but fixed point is faster than floating point
• Features modelled by a 100-Gaussian GMM pdf, with a UBM for model initialisation and score normalisation
• Training on data from 2 indoor and 2 outdoor recordings from one session; testing on similar data from another session
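The online CMS and first time difference steps listed above can be sketched as follows. This is a minimal illustration, not the SecurePhone front-end: it assumes CMS is done with a running mean over the frames seen so far, and that the delta of the first frame is zero. The function names are hypothetical.

```python
import numpy as np

def online_cms(cepstra):
    """Cepstral mean subtraction using only the frames seen so far (assumed scheme)."""
    cepstra = np.asarray(cepstra, dtype=float)
    out = np.empty_like(cepstra)
    mean = np.zeros(cepstra.shape[1])
    for t, frame in enumerate(cepstra):
        mean += (frame - mean) / (t + 1)   # incremental update of the running mean
        out[t] = frame - mean
    return out

def add_deltas(feats):
    """Append first time differences to each frame (first frame gets zeros)."""
    feats = np.asarray(feats, dtype=float)
    deltas = np.diff(feats, axis=0, prepend=feats[:1])
    return np.hstack([feats, deltas])
```

A running mean needs no look-ahead, which suits a device that must process frames as they arrive.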


Signature verification (GET INT)
• 2D coordinates (100 Hz) augmented by time difference features, curvature, etc.: 19 features in total
  Note: no pressure or angles available, since the data come from the PDA’s touch screen, not from a writing pad
• Shift normalisation, but no rotation or scaling
• Features modelled by a 100-Gaussian GMM pdf; UBM used for model initialisation and score normalisation
• Fast to compute
• Training and testing on data from one session


Face Wavelet feature Representation (BU)

The Discrete Wavelet Transform (DWT) decomposes an image into a set of different frequency subbands with different resolutions, each consisting of

At a resolution depth of k, the pyramidal scheme decomposes an image I into 3k + 1 subbands: (LLk, HLk, LHk, HHk, ..., HL1, LH1, HH1). The lowest-pass subband LLk represents the k-level resolution approximation of the image I. The subbands HL1, LH1, and HH1 contain the finest-scale wavelet coefficients, and the coefficients get coarser as k increases, LLk being the coarsest.

Each subband of DWT-decomposed face image represents the person’s face at different frequency ranges and different scales (i.e. a distinct stream for face recognition with varying accuracy rates that can be fused for improved accuracy).
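The pyramidal scheme can be illustrated with a small Haar decomposition. This is a sketch under stated assumptions: it uses the orthonormal 2x2 Haar filters, assumes even image dimensions at every level, and follows one common convention for naming HL and LH (conventions differ between texts). The function names and the random test image are hypothetical.

```python
import numpy as np

def haar_step(img):
    """One level of a 2D orthonormal Haar DWT; returns (LL, HL, LH, HH)."""
    a = img[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]   # top-right
    c = img[1::2, 0::2]   # bottom-left
    d = img[1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2.0   # low-pass in both directions
    hl = (a - b + c - d) / 2.0   # high-pass along rows (naming is a convention)
    lh = (a + b - c - d) / 2.0   # high-pass along columns
    hh = (a - b - c + d) / 2.0   # high-pass in both directions
    return ll, hl, lh, hh

def haar_pyramid(img, k):
    """Decompose img to depth k and return a dict holding 3k + 1 subbands."""
    subbands = {}
    ll = np.asarray(img, dtype=float)
    for level in range(1, k + 1):
        ll, hl, lh, hh = haar_step(ll)
        subbands[f"HL{level}"] = hl
        subbands[f"LH{level}"] = lh
        subbands[f"HH{level}"] = hh
    subbands[f"LL{k}"] = ll   # only the coarsest approximation is kept
    return subbands

# Stand-in for a 160x192 face area, stored as 192 rows by 160 columns.
face = np.random.default_rng(0).random((192, 160))
bands = haar_pyramid(face, 4)
```

With k = 4 a 160x192 face area shrinks to a 10x12 LL4 subband, i.e. 120 coefficients per image.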


Face verification (BU)
• Static face recognition: 10 grey-scale images selected at random from a video, face area 160x192 pixels
• Histogram equalisation and z-score standardisation of features are applied as a simple, fast lighting normalisation.
• Haar wavelet low-low-4 (or low-high) subband as feature vectors
  Other wavelet filters were tested, but Haar is the fastest to compute
• Features modelled by only a 4-Gaussian GMM pdf; UBM used for model initialisation and score normalisation
• Training on data from 2 indoor and 2 outdoor recordings from one session, testing on similar data from another session
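The two lighting normalisation steps named above can be sketched as follows. This is an illustration only: it assumes 8-bit grey-scale input and assumes the z-score is taken within each feature vector (the slides do not say whether standardisation is per vector or per feature across the training set). The function names are hypothetical.

```python
import numpy as np

def equalise_histogram(img):
    """Histogram-equalise an 8-bit grey-scale image given as a uint8 array."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[hist > 0][0]            # CDF value at the darkest level present
    denom = max(img.size - cdf_min, 1)    # guard against a constant image
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                       # map every pixel through the lookup table

def z_score(features):
    """Standardise one feature vector to zero mean and unit variance."""
    features = np.asarray(features, dtype=float)
    std = features.std()
    return (features - features.mean()) / (std if std > 0 else 1.0)
```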


Fusion (GET INT)
• For each modality i: S(i) = log p(Xi|C) - log p(Xi|I)
• Score fusion was tested by:
  • Optimal linear weighted sum:
    Fused-score = Σ w(i) * S(i), where the sum is taken over the 3 modalities
  • GMM score modelling, i.e. modelling both client and impostor joint-score pdfs by diagonal covariance GMMs:
    Fused-score = log p(S|C) - log p(S|I)
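Both fusion rules can be written down in a few lines. The sketch below assumes each joint-score GMM is given as a (weights, means, variances) triple with diagonal covariances; the function names and that data layout are hypothetical, and training the models or choosing the weights w(i) is not shown.

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Log density of one vector x under a diagonal-covariance GMM."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    log_comp = (
        np.log(weights)
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
        - 0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    )
    m = log_comp.max()
    return float(m + np.log(np.sum(np.exp(log_comp - m))))   # log-sum-exp

def weighted_sum_fusion(scores, w):
    """Fused-score = sum over modalities of w(i) * S(i)."""
    return float(np.dot(w, scores))

def gmm_fusion(scores, client_gmm, impostor_gmm):
    """Fused-score = log p(S|C) - log p(S|I) for the joint score vector S."""
    return gmm_logpdf(scores, *client_gmm) - gmm_logpdf(scores, *impostor_gmm)
```

The weighted sum needs only three multiplications per trial, while the GMM variant can model dependence between modality scores at the cost of evaluating two mixture densities.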


User verification system

• User requests PDA to verify their identity
• PDA requests user to
  • read prompt (face in box)
  • sign signature
• Feature processing applied to each modality [silence removal, histogram equalisation, MFCC or Haar wavelets, online CMS, delta features, etc.]
• For each modality: S(i) = log p(Xi|C) - log p(Xi|I)
• If S(i) < θ(i) for any i: please repeat
  else fused-score = log p(S|C) - log p(S|I)
• If fused-score > φ: user accepted
  else user rejected

[Screenshot: PDA prompt screen showing the digits "7 9 8 5 1" and a "Press to start/stop speaking" control]
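The two-stage decision in the steps above (a per-modality check against θ(i), then a fused-score check against φ) can be sketched as below. The function name, the dictionary layout and the example numbers are hypothetical, and a plain sum stands in for the fusion model.

```python
def verify(scores, thetas, fuse, phi):
    """Two-stage decision: per-modality check first, then the fused score.

    scores, thetas: dicts keyed by modality name.
    fuse: callable mapping the dict of scores to one fused score.
    phi: global accept threshold.
    """
    if any(scores[m] < thetas[m] for m in scores):
        return "repeat"   # at least one modality scored below its own threshold
    return "accept" if fuse(scores) > phi else "reject"

# Hypothetical numbers; the sum is only a placeholder for the real fusion rule.
example_scores = {"voice": 1.2, "face": 0.3, "signature": 2.0}
example_thetas = {"voice": 0.0, "face": 0.0, "signature": 0.0}
decision = verify(example_scores, example_thetas, lambda s: sum(s.values()), phi=3.0)
```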


Speaking face & Forgery (GET ENST)
• Investigated possible attacks and forgery scenarios:
  • Using synthesised voice and face: difficult to create because of synchronisation problems
  • Replay attacks: devised a successful attack that uses the client's voice and face images, but not taken from the same video.
    Using a coupled HMM for voice and face greatly reduced the effect of this attack.


PDA Database (PDAtabase)
• Initial development used many databases [TIMIT(V), CSLU(V), BANCA(V,F), ORL(F), BIOMET(V,F,S), NIST(V)]
• A CSLU/BANCA-like database was then recorded on a Qtek2020 PDA for realistic conditions (sensors, environment)
• 60 English subjects: 24 for UBM, 18 for g1, 18 for g2. Accept/reject threshold optimised on g1 and evaluated on g2, and vice versa
• Video (voice + face): 18 prompts (5-digit, 10-digit and phrase); 3 sessions, with 2 inside and 2 outside recordings per session
• Signatures in one session, 20 expert impostorisations for each
• Virtual couplings of audio-visual with signature data (independent)
• Automatic test script allows many possible configurations to be tested
• User just provides executables for feature modelling, score generation and score fusion
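The g1/g2 protocol (tune the accept/reject threshold on one group, evaluate on the other, then swap) can be sketched as follows. Two points are assumptions rather than statements from the slides: the threshold is taken where false acceptance and false rejection rates are closest, and the reported figure is the mean of the two half total error rates. All function names are hypothetical.

```python
import numpy as np

def far_frr(client_scores, impostor_scores, threshold):
    """Error rates when a trial is accepted if its score exceeds the threshold."""
    far = np.mean(np.asarray(impostor_scores) > threshold)   # impostors accepted
    frr = np.mean(np.asarray(client_scores) <= threshold)    # clients rejected
    return float(far), float(frr)

def eer_threshold(client_scores, impostor_scores):
    """Observed score at which FAR and FRR are closest to each other."""
    candidates = np.sort(np.concatenate([client_scores, impostor_scores]))
    gaps = []
    for t in candidates:
        far, frr = far_frr(client_scores, impostor_scores, t)
        gaps.append(abs(far - frr))
    return float(candidates[int(np.argmin(gaps))])

def cross_group_error(g1, g2):
    """g1, g2: (client_scores, impostor_scores). Tune on one group, test on the other."""
    errors = []
    for tune, test in ((g1, g2), (g2, g1)):
        t = eer_threshold(*tune)
        far, frr = far_frr(test[0], test[1], t)
        errors.append((far + frr) / 2.0)   # half total error rate on the held-out group
    return float(np.mean(errors))
```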


Match on Host (MoH): complementarity of modalities

Modality        5 digits   10 digits
Voice (V)          6.1        3.4
Face (F)          28.6       29.9
Signature (S)      6.2        6.2
V + F              4.8        3.0
V + S              1.1        0.7
S + F              4.8        4.7
V + F + S          0.9        0.6

Result table with improved results for 5-digit and 10-digit prompts in PDAtabase (SPIE 2006)
For LL subband. Already have improved results for LH subband!


Match on Card (MoC)
• Implementation of the MoH system on the SIM card (MoC)
  • No problem in terms of storage
  • But not feasible because of verification time (matching plus host/SIM communication = one hour)
• A reduction of the verification time can be attained by
  • reducing the vector size
  • reducing the frame rate
  • reducing the number of Gaussians of the client and background models
• Matching time was still not acceptable


MoC bottleneck
• Not in preprocessing, since this is still all done on the PDA, as in the MoH system.
• Not in face: although the feature vectors are large, only a few (10) of them are used in testing and only 4 Gaussians are needed (client model and UBM)
• Bottleneck caused by voice and signature data: vectors are relatively small, but there is a large number of frames and a large number of Gaussians


MoC solution
Only a drastic measure can solve the problem: globalised features
• Features represent the whole signature: a single vector of 41 parameters representing correlation and variation in x-y coordinates, velocity and acceleration parameters
• Idea generalised to voice: use of means (cf. Long-Term Average Spectrum) and standard deviations per vector parameter across all frames
• Works well for signature
• Improvement: use up to four equal subparts of the signature/voice signal
• Implementation: 2 equal subparts
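For voice, the globalisation step replaces a long frame sequence by per-parameter means (and optionally standard deviations) over a few subparts. A minimal sketch, assuming the frames arrive as a (frames x parameters) array and that subparts are near-equal contiguous blocks; the function name is hypothetical. The 41-parameter signature vector uses different statistics and is not reproduced here.

```python
import numpy as np

def globalised_features(frames, n_parts=2, use_std=True):
    """Collapse a frame sequence into one vector of per-part means (and stds)."""
    frames = np.asarray(frames, dtype=float)
    stats = []
    for part in np.array_split(frames, n_parts, axis=0):   # contiguous subparts
        stats.append(part.mean(axis=0))
        if use_std:
            stats.append(part.std(axis=0))
    return np.concatenate(stats)
```

With 2 subparts and both statistics, a sequence of D-dimensional frames becomes a single vector of 4 x D values, so the card scores one vector instead of every frame.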


MoC-emulated results
EER (percent) for globalised means (columns 2-5) and means plus standard deviations (columns 6-9), for voice and signature divided into two equal subparts

Global feat.   Means only                      Means + sd
#Gauss.        1      2      4      8          1      2      4      8
Voice          22.13  21.09  20.87  21.86      20.88  19.72  17.68  18.49
Face           32.26  31.78  29.06  29.19      32.26  31.78  29.06  29.19
Signature      38.29  27.58  22.58  17.86      28.14  22.16  17.59  16.45
Fused          12.89  12.48  10.49   9.32      12.56  10.48   8.28   9.15


Solving the capacity problem
Possible options for improving performance of the SecurePhone:
• Use match-on-server (MoS): raises security and privacy concerns.
• Implement the biometric recogniser and encryption on a chip (more costly than the current solution).
• Build a secure PDA with sufficient storage and processing power (a dedicated device that would be more costly and less ubiquitous).
• Split matching (hybrid MoC/MoH): considered but not implemented. Initial work is being done and results are encouraging. Promising implications for security and privacy of biometric data (templates/models) without cryptography.


Conclusion and Future Work
• Natural, non-intrusive biometrics guarantee high user acceptance
• Biometric data never leave the SIM card: high security
• Fusion of multiple streams of a single trait can lead to improved performance (a pilot for face was tested but not implemented in SP)
• MoH is efficient with high accuracy, but vulnerable.
• MoC is secure, but efficiency and high accuracy cannot be achieved together!
Future work includes:
• Designing hybrid mixed client-server matching.
• Investigating the privacy and security of biometric data, using cancellable biometrics, especially for “Match on Server”
• Improving performance of single modalities through multi-classifier & multi-stream strategies, e.g. face by mixing a larger number of subbands at different depths


Acknowledgement
Thanks to the EU for funding this research through the SecurePhone (IST-2002-506883) project.
