1 chapter 15 system errors revisited ali erol 10/19/2005

1

Chapter 15: System Errors Revisited

Ali Erol

10/19/2005

2

System Errors Revisited

• Quantify the accuracy of FAR and FRR estimates.

• Confidence Intervals, a well known technique used in statistical analysis.

• See references [22],[23].

• The algorithm of the first three authors [23] has been experimentally demonstrated to provide better confidence interval estimates.

3

FAR/FRR

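• Definition:

$$\mathrm{FRR}(x) = \mathrm{Prob}(s_m \le x \mid H_0) = F(x)$$

$$\mathrm{FAR}(y) = \mathrm{Prob}(s_n > y \mid H_a) = 1 - \mathrm{Prob}(s_n \le y \mid H_a) = 1 - G(y)$$

• We need
– F(x) = Dist(x): genuine (matching) score DF
– G(y) = Dist(y): imposter (non-matching) score DF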

4

FAR/FRR

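• Instead we have
– Set of genuine scores $X = \{X_1, X_2, \ldots, X_M\}$
– Set of imposter scores $Y = \{Y_1, Y_2, \ldots, Y_N\}$

• We estimate

$$\widehat{\mathrm{FRR}}(x) = \frac{1}{M}\,\#(X_i \le x)$$

$$\widehat{\mathrm{FAR}}(y) = \frac{1}{N}\,\#(Y_i > y)$$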
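The two estimates are plain counting. A minimal sketch, assuming similarity scores where higher means a better match; the function names and the score values are made up for illustration:

```python
def frr_hat(genuine_scores, x):
    """Fraction of genuine scores at or below threshold x (false rejects)."""
    return sum(1 for s in genuine_scores if s <= x) / len(genuine_scores)

def far_hat(imposter_scores, y):
    """Fraction of imposter scores above threshold y (false accepts)."""
    return sum(1 for s in imposter_scores if s > y) / len(imposter_scores)

# Made-up scores, for illustration only.
X = [0.91, 0.85, 0.40, 0.77, 0.95]   # genuine (matching) scores, M = 5
Y = [0.10, 0.35, 0.52, 0.20]         # imposter (non-matching) scores, N = 4

print(frr_hat(X, 0.5), far_hat(Y, 0.5))   # prints 0.2 0.25
```

With so few scores a single sample moves the estimate by 1/M or 1/N, which is the accuracy question the next slide raises.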

5

Problem

• What is the accuracy of these error rates?
– The number of biometric samples
– The quality of the samples

• Data collection procedure (e.g. 10 consecutive samples)

• Subjects involved, the acquisition device etc.

6

An Estimation Problem

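Given
– x: a random variable (F(x) denotes Dist(x))
– $X = \{X_1, X_2, \ldots, X_M\}$: sample set

Estimate $\theta = E(x)$

Solution (unbiased estimator*)

$$\hat{\theta} = \bar{X} = \frac{1}{M}\sum_{i=1}^{M} X_i$$

Error

$$r = \hat{\theta} - \theta$$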

7

Biased/Unbiased Estimators

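• For an unbiased estimator we have

$$E(\hat{\theta} \mid \theta) = \theta$$

• Example: Gaussian model. Estimate mean $\theta_1$ and variance $\theta_2$ using the maximum likelihood criterion, i.e. maximize $\mathrm{Prob}(X \mid \theta_1, \theta_2)$

$$\hat{\theta}_1 = \frac{1}{M}\sum_{i=1}^{M} X_i \quad \text{(unbiased estimator)}$$

$$\hat{\theta}_2 = \frac{1}{M}\sum_{i=1}^{M} (X_i - \hat{\theta}_1)^2 \quad \text{(biased estimator)}, \qquad E(\hat{\theta}_2 \mid \theta) = \frac{M-1}{M}\,\theta_2$$

$$\hat{\theta}_2 = \frac{1}{M-1}\sum_{i=1}^{M} (X_i - \hat{\theta}_1)^2 \quad \text{(unbiased estimator)}$$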
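The (M-1)/M bias factor can be checked exactly on a toy case. The sketch below does not use the Gaussian model of the slide; it assumes x is uniform on three made-up values and enumerates every equally likely sample of size M, so the expectations come out as exact fractions:

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 2]   # assumption: x is uniform on these values (illustrative)
M = 3                # sample size

mean = Fraction(sum(values), len(values))
true_var = sum((v - mean) ** 2 for v in values) / len(values)

def var_estimates(sample):
    """Return (1/M estimate, 1/(M-1) estimate) of the variance."""
    m = Fraction(sum(sample), len(sample))
    ss = sum((s - m) ** 2 for s in sample)
    return ss / len(sample), ss / (len(sample) - 1)

# All len(values)**M i.i.d. samples are equally likely, so a plain average
# over them is the exact expectation of each estimator.
samples = list(product(values, repeat=M))
exp_biased = sum(var_estimates(s)[0] for s in samples) / len(samples)
exp_unbiased = sum(var_estimates(s)[1] for s in samples) / len(samples)

print(true_var, exp_biased, exp_unbiased)
```

The 1/M estimator averages to (M-1)/M times the true variance, while the 1/(M-1) estimator averages to the true variance itself.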

8

Confidence Interval

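• Assume F(x) is given; then Dist(r) can be calculated
– r is a function of $\hat{\theta}$, which is a function of x

• Calculate, with $(1-\alpha)\,100\%$ certainty (next slide), $r \in [\eta_1(\alpha, X), \eta_2(\alpha, X)]$

• Which leads to a $(1-\alpha)\,100\%$ confidence interval for $\theta$ given by

$$[\hat{\theta} - \eta_2(\alpha, X),\ \hat{\theta} - \eta_1(\alpha, X)]$$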

9

Confidence Interval

• Example
– Discard $\alpha/2$ on the lower and higher ends
– Find the r values corresponding to the interval boundaries (called quantiles)

[Figure: Dist(r) plotted against r]

$$\mathrm{Prob}\big(q(\alpha/2) \le r \le q(1-\alpha/2)\big) = 1 - \alpha$$

10

Confidence Interval

• Interpretation:
– Generate sample sets X from F(x)
– Calculate a confidence interval for each X
– $(1-\alpha)\,100\%$ of these intervals contain $\theta$

11

Parametric Method

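• $X_i$ identically distributed

• Assume $X_i$ are independent (not true in general)

• Then $\mathrm{Dist}(\hat{\theta})$ can be taken to be a normal distribution using the central limit theorem (large M), with

$$\hat{\theta} = \bar{X} = \frac{1}{M}\sum_{i=1}^{M} X_i, \qquad E(\hat{\theta}) = E(x), \qquad \mathrm{Var}(\hat{\theta}) = \frac{\mathrm{Var}(x)}{M}$$

• Result:

$$\bar{X} \pm z\,\frac{\mathrm{stddev}(x)}{\sqrt{M}}$$

• E.g. for 95% confidence z = 1.96

• Smaller interval with increasing M and $\alpha$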
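The parametric interval needs only the sample mean, the sample standard deviation, and a normal quantile. A minimal sketch, assuming independent samples and using the sample standard deviation in place of the unknown stddev(x); the function name is an illustrative choice:

```python
import math
from statistics import NormalDist, mean, stdev

def parametric_ci(sample, alpha=0.05):
    """Normal-approximation (1 - alpha) interval for the mean."""
    M = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    half_width = z * stdev(sample) / math.sqrt(M)
    xbar = mean(sample)
    return xbar - half_width, xbar + half_width

# Made-up scores, for illustration only.
lo, hi = parametric_ci([1, 2, 3, 4, 5])
print(lo, hi)
```

The half-width shrinks like 1/sqrt(M), and a larger alpha gives a smaller z and therefore a narrower interval.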

12

Non-Parametric Method

• Assume F(x) is available.

• Sample set: X

• Additional sample sets: $X^*_i,\ i \in [1 \ldots B]$

[Figure: density f(x), and the density of the random variable $\bar{X}^*$]

13

Non-Parametric Method

• FACT: For large B we have

$$E(\bar{X}^*) = E(x)$$

• Define the error to be

$$r = \bar{X}^* - \bar{X}$$

• Calculate Dist(r)

• Solution:

$$\mathrm{Dist}(r) = \frac{1}{B}\,\#\big((\bar{X}^*_i - \bar{X}) \le r\big)$$

14

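Non-Parametric Method

• Interval calculation: sorting and counting

[Figure: Dist(r) plotted against r, with $\alpha/2$ discarded in each tail; built from $\bar{X}^*_i,\ i \in [1 \ldots B]$]

$$\bar{X}^*_1 \le \bar{X}^*_2 \le \cdots \le \bar{X}^*_B$$

$$k_1 = (\alpha/2)\,B \quad \text{and} \quad k_2 = (1 - \alpha/2)\,B$$

$$q(\alpha/2) = \bar{X}^*_{k_1} \quad \text{and} \quad q(1 - \alpha/2) = \bar{X}^*_{k_2}$$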

15

Bootstrap Method

• F(x) is not available; all we have is X

• How do we generate $X^*_i$?

• Solution (i.e. Bootstrap method): Sampling with replacement from X.

• Put the samples in a bag; draw one, record it, and put it back.

• Draw M samples from X, B times. Some samples $X_i$ may not appear in a given set.
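Sampling with replacement followed by sorting and counting can be sketched in a few lines. This is a minimal illustration for the mean, assuming the interval is read directly off the sorted bootstrap means at positions (α/2)B and (1-α/2)B; the function name, the seed, and the data are hypothetical:

```python
import random

def bootstrap_ci_mean(X, B=1000, alpha=0.05, seed=0):
    """Bootstrap interval for the mean by sorting and counting."""
    rng = random.Random(seed)
    M = len(X)
    # B bootstrap sets, each M draws from X with replacement.
    means = sorted(sum(rng.choices(X, k=M)) / M for _ in range(B))
    # 1-based positions (alpha/2)B and (1 - alpha/2)B, as 0-based indices.
    k1 = max(round((alpha / 2) * B) - 1, 0)
    k2 = min(round((1 - alpha / 2) * B) - 1, B - 1)
    return means[k1], means[k2]

# Made-up scores, for illustration only.
X = list(range(1, 21))
lo, hi = bootstrap_ci_mean(X)
print(lo, hi)
```

Fixing the seed only makes the sketch repeatable; in practice B is chosen large enough that the interval is stable across seeds.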

16

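Bootstrap Method (Imperfections)

• $X_i$ are not independent.
– In sampling with replacement (SR) the dependence between samples is not replicated.

• Effect of treating dependent samples as independent
– Variance of $\bar{X}^*$ is smaller
– Leads to smaller CIs

• Example: for zero-mean, identically distributed $X_1, X_2$

$$E\big((X_1 + X_2)^2\big) = 2\,E(X_1^2) \quad \text{when independent}$$

$$E\big((X_1 + X_2)^2\big) = 4\,E(X_1^2) \quad \text{when } X_1 = X_2$$

[Figure: Dist(r) with $\alpha/2$ discarded in each tail; built from $\bar{X}^*_i,\ i \in [1 \ldots B]$]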

17

Subset Bootstrap

• Potential sources of dependency
– All samples from the same person (e.g. multiple fingers)
– All samples from the same biometric (e.g. finger)

• Partition X into independent subsets

• Apply SR on subsets.
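Applying SR on subsets means drawing whole subsets with replacement and pooling their scores, so the dependence inside a subset travels with it. A minimal sketch for the mean, with a hypothetical function name and made-up data in which every score of a "finger" is identical (an extreme case of dependence):

```python
import random

def subset_bootstrap_ci_mean(subsets, B=1000, alpha=0.05, seed=0):
    """Interval for the mean, resampling whole subsets with replacement."""
    rng = random.Random(seed)
    n = len(subsets)
    means = []
    for _ in range(B):
        drawn = rng.choices(subsets, k=n)          # SR over subsets, not scores
        scores = [s for subset in drawn for s in subset]
        means.append(sum(scores) / len(scores))
    means.sort()
    k1 = max(round((alpha / 2) * B) - 1, 0)
    k2 = min(round((1 - alpha / 2) * B) - 1, B - 1)
    return means[k1], means[k2]

# Made-up data: 10 "fingers", each with 10 identical scores.
by_finger = [[float(v)] * 10 for v in range(10)]
# Same scores, one per subset: equivalent to the plain bootstrap.
one_per_subset = [[s] for subset in by_finger for s in subset]

lo_sub, hi_sub = subset_bootstrap_ci_mean(by_finger)
lo_iid, hi_iid = subset_bootstrap_ci_mean(one_per_subset)
print(hi_sub - lo_sub, hi_iid - lo_iid)
```

On this data the subset interval comes out clearly wider than the plain one, which is the direction the previous slide predicts when dependence is ignored.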

18

Subset Bootstrap (An example)

• Fingerprint database
– P persons
– c fingers per person, D = cP fingers
– d samples per finger
– DB size = cPd

• Matching pairs
– d(d-1) per finger
– cd(d-1) per person
– cPd(d-1) = Dd(d-1) total

• Using a symmetric or an asymmetric matcher does not make any difference [23].

19

Subset Bootstrap (An Example)

• $X_1 \subset X_2$

• $X_1$: P = 10, c = 2, D = 20, d = 8, M = 1120

• $X_2$: P = 50, c = 2, D = 100, d = 8, M = 5600

• Finger-based partition: the subsets are the samples from the same finger (i.e. D subsets of d(d-1) matching scores)

• Person-based partition: the subsets are the samples from the same person (i.e. P subsets of cd(d-1) matching scores)
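The two values of M follow from the total matching-pair count Dd(d-1) with D = cP. A quick arithmetic check, where the function name is only an illustrative choice:

```python
def matching_pairs(P, c, d):
    """Total matching pairs: D fingers, d(d-1) ordered pairs per finger."""
    D = c * P
    return D * d * (d - 1)

print(matching_pairs(P=10, c=2, d=8))   # prints 1120
print(matching_pairs(P=50, c=2, d=8))   # prints 5600
```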

20

Subset Bootstrap (An Example)

• We expect
– CI1 (light gray) to be larger than CI2 (dark gray)
  • Because $X_1$ has a smaller number of samples
– CI2 (dark gray) to be contained in CI1 (light gray)
  • Because $X_1 \subset X_2$

• The intervals are larger for person-based partitioning
– There is dependency between fingers of the same person

21

CIs for FAR/FRR

• Calculate CIs for each threshold $T = t_0$ and a given $\alpha$

$$\widehat{\mathrm{FRR}}(t_0) = \frac{1}{M}\,\#(X_i \le t_0)$$

$$\widehat{\mathrm{FAR}}(t_0) = \frac{1}{N}\,\#(Y_i > t_0)$$

22

CI for FRR

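• Given genuine score set X
– Generate $X^*_i,\ i \in [1 \ldots B]$
– Calculate $\mathrm{FRR}^*_i(t_0)$
– Sort and count

$$k_1 = (\alpha/2)\,B \quad \text{and} \quad k_2 = (1 - \alpha/2)\,B$$

$$\mathrm{FRR}(t_0; \alpha) \in [\mathrm{FRR}^*_{k_1}, \mathrm{FRR}^*_{k_2}]$$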

23

CI for FAR

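• Given imposter score set Y
– Generate $Y^*_i,\ i \in [1 \ldots B]$
– Calculate $\mathrm{FAR}^*_i(t_0)$
– Sort and count

$$k_1 = (\alpha/2)\,B \quad \text{and} \quad k_2 = (1 - \alpha/2)\,B$$

$$\mathrm{FAR}(t_0; \alpha) \in [\mathrm{FAR}^*_{k_1}, \mathrm{FAR}^*_{k_2}]$$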
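The generate, calculate, sort-and-count recipe is the same for both error rates; only the counting rule differs. A minimal sketch using the plain bootstrap (no subsets), with a hypothetical function name and made-up scores:

```python
import random

def error_rate_ci(scores, is_error, B=1000, alpha=0.1, seed=0):
    """Bootstrap interval for an error rate at a fixed threshold."""
    rng = random.Random(seed)
    n = len(scores)
    rates = sorted(
        sum(1 for s in rng.choices(scores, k=n) if is_error(s)) / n
        for _ in range(B)
    )
    k1 = max(round((alpha / 2) * B) - 1, 0)
    k2 = min(round((1 - alpha / 2) * B) - 1, B - 1)
    return rates[k1], rates[k2]

t0 = 0.5
# Made-up scores, for illustration only.
X = [i / 100 for i in range(30, 100)]   # genuine scores, 0.30 .. 0.99
Y = [i / 100 for i in range(0, 60)]     # imposter scores, 0.00 .. 0.59

frr_lo, frr_hi = error_rate_ci(X, lambda s: s <= t0)   # false rejects
far_lo, far_hi = error_rate_ci(Y, lambda s: s > t0)    # false accepts
print((frr_lo, frr_hi), (far_lo, far_hi))
```

Repeating this for each threshold of interest gives a confidence band around the estimated FRR and FAR curves.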

24

Subset Bootstrap for FAR

• Imposter scores Y are not independent

• We are using multiple impressions of the same finger.

• Let $I_{xk}$ be the kth finger impression from subject x; then sim($I_{a1}$, $I_{b1}$), sim($I_{a1}$, $I_{b2}$), sim($I_{a2}$, $I_{b3}$) are not statistically independent

• Use a finger only once; for D fingers we have only D/2 such pairs

• There is actually dependency between X and Y

25


• Fingerprint database
– P persons
– c fingers per person, D = cP fingers
– d samples per finger
– DB size = cPd

• Non-matching pairs
– $N = d^2 D(D-1) = P[(dc)^2(P-1) + d^2 c(c-1)]$
– $d^2(D-1)$ per finger
– $(dc)^2(P-1) + d^2 c(c-1)$ per person
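The per-finger and per-person counts should both add up to the same total N. A quick check of that identity for one example database size (P = 10, c = 2, d = 8, chosen only for illustration; the function name is hypothetical):

```python
def non_matching_pairs(P, c, d):
    """Return (total, per finger, per person) non-matching pair counts."""
    D = c * P
    total = d**2 * D * (D - 1)
    per_finger = d**2 * (D - 1)
    per_person = (d * c) ** 2 * (P - 1) + d**2 * c * (c - 1)
    return total, per_finger, per_person

P, c, d = 10, 2, 8
total, per_finger, per_person = non_matching_pairs(P, c, d)
print(total, (c * P) * per_finger, P * per_person)   # prints 24320 24320 24320
```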

26


[Figure: DB partition into $I_1, \ldots, I_i, \ldots, I_N$; matching $I_i$ against the other partitions gives $Y_1 = I_i \times I_1, \ldots, Y_{N-1} = I_i \times I_N$]

• Finger (N = D): Take $I_i$ (d elements) and match it against $I_k$, $k \ne i$ ($d^2$ pairs each); then we have $d^2(D-1)$ pairs. Repeat with all $I_i$ to construct the subsets $Y_k$

• Person (N = P): Take $I_i$ (cd elements) and match it against $I_k$, $k \ne i$ ($(dc)^2$ pairs each); then we have $(dc)^2(P-1)$ pairs. Inside $I_i$ we have $d^2 c(c-1)$ pairs. Repeat with all $I_i$ to construct the subsets $Y_k$

• Not completely independent: we use $I_i$ many times.

27

Subset Bootstrap for FRR

• Person subset is a better estimate

28

How good are the CIs?

• There exists a true confidence interval (At the beginning we assumed F(x) is known)

• The CI we calculate is just one estimate.

• How accurate is that estimate?

29


• We estimate E(x)

• Ideal Test: Assume F(x) is available
– Generate $X_k,\ k \in [1 \ldots K]$
– Calculate $\bar{X}_k$
– Assume $E(x) = \bar{X}$ and test if $\bar{X}_k \in \mathrm{CI}$

30


• Practical Test (for comparison)
1. Randomly split X into two subsets $X_a$ and $X_b$
2. Calculate $\bar{X}_b$ and $\mathrm{CI}_a$
3. Test $\bar{X}_b \in \mathrm{CI}_a$
4. Repeat 1-3 many times and count the number of hits, i.e. estimate the probability of $\bar{X}_b$ falling into $\mathrm{CI}_a$

• Hit rate is not equal to the confidence. Assume $\bar{X}_a, \bar{X}_b$ have normal distributions.

• The higher the hit rate, the better the estimates are.
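One way to see why the hit rate differs from the confidence: if the two halves have equal size and are independent, and both means are normal with the same standard deviation σ, then the difference of the two means has standard deviation √2·σ, while $\mathrm{CI}_a$ has half-width zσ. The sketch below works out the resulting hit probability under exactly these assumptions, which are added here for illustration and are not spelled out on the slide:

```python
from statistics import NormalDist

def expected_hit_rate(alpha):
    """P(mean of X_b falls in CI_a) for equal, independent, normal halves."""
    z = NormalDist().inv_cdf(1 - alpha / 2)     # CI_a half-width in units of sigma
    # The difference of the two means has std sqrt(2) * sigma,
    # so a hit means |Z| <= z / sqrt(2).
    return 2 * NormalDist().cdf(z / 2 ** 0.5) - 1

print(expected_hit_rate(0.1))   # about 0.755
```

Under these assumptions the expected hit rate for a 90% interval is well below 90%, so an observed hit rate should be compared against this lower figure rather than against 1-α.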

31


• $\alpha = 0.1$

• Person-based partitioning provides more accurate confidence intervals

• 73.10% is very close to the expected value
