1
Chapter 15: System Errors Revisited
Ali Erol, 10/19/2005
2
System Errors Revisited
• Quantify the accuracy of FAR and FRR estimates.
• Confidence intervals, a well-known technique used in statistical analysis.
• See references [22],[23].
• The algorithm of the first three authors [23] has been experimentally demonstrated to provide better confidence interval estimates.
3
FAR/FRR
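• Definition:
  $\mathrm{FRR}(x) = \mathrm{Prob}(s_m \le x \mid H_0) = F(x)$
  $\mathrm{FAR}(y) = \mathrm{Prob}(s_n > y \mid H_a) = 1 - \mathrm{Prob}(s_n \le y \mid H_a) = 1 - G(y)$
• We need
– $F(x) = \mathrm{Dist}(x)$: genuine (matching) score distribution function (DF)
– $G(y) = \mathrm{Dist}(y)$: imposter (non-matching) score DF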
4
FAR/FRR
• Instead we have
– Set of genuine scores $X = \{X_1, X_2, \ldots, X_M\}$
– Set of imposter scores $Y = \{Y_1, Y_2, \ldots, Y_N\}$
• We estimate
  $\widehat{\mathrm{FRR}}(x) = \frac{1}{M}\,\#(X_i \le x)$
  $\widehat{\mathrm{FAR}}(y) = \frac{1}{N}\,\#(Y_i > y)$
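The two estimates above are plain counting ratios. A minimal sketch in Python, assuming similarity scores where a higher value means a better match; the function names and the score values are made up for illustration:

```python
def frr_hat(genuine_scores, x):
    """Fraction of genuine scores at or below threshold x (false rejects)."""
    return sum(1 for s in genuine_scores if s <= x) / len(genuine_scores)


def far_hat(imposter_scores, y):
    """Fraction of imposter scores above threshold y (false accepts)."""
    return sum(1 for s in imposter_scores if s > y) / len(imposter_scores)


# Made-up similarity scores, for illustration only.
genuine = [0.91, 0.85, 0.40, 0.77, 0.95]
imposter = [0.10, 0.35, 0.62, 0.20]

print(frr_hat(genuine, 0.5))   # one of five genuine scores is <= 0.5
print(far_hat(imposter, 0.5))  # one of four imposter scores is > 0.5
```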
5
Problem
• What is the accuracy of these error rates? It depends on
– The number of biometric samples
– The quality of the samples
• Data collection procedure (e.g. 10 consecutive samples)
• Subjects involved, the acquisition device etc.
6
An Estimation Problem
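Given
– $x$: a random variable ($F(x)$ denotes $\mathrm{Dist}(x)$)
– $X = \{X_1, X_2, \ldots, X_M\}$: sample set
Estimate $\theta = E(x)$
Solution: $\hat{\theta} = \bar{X} = \frac{1}{M} \sum_{i=1}^{M} X_i$ (unbiased estimator*)
Error: $r = \hat{\theta} - \theta$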
7
Biased/Unbiased Estimators
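• For an unbiased estimator we have $E(\hat{\theta} \mid \theta) = \theta$
• Example: Gaussian model. Estimate mean $\theta_1$ and variance $\theta_2$ using the maximum likelihood criterion, i.e. maximize $\mathrm{Prob}(X \mid \theta_1, \theta_2)$
  $\hat{\theta}_1 = \frac{1}{M} \sum_{i=1}^{M} X_i$ (unbiased estimator)
  $\hat{\theta}_2 = \frac{1}{M} \sum_{i=1}^{M} (X_i - \hat{\theta}_1)^2$ (biased estimator), since $E(\hat{\theta}_2 \mid \theta) = \frac{M-1}{M}\,\theta_2$
  $\hat{\theta}_2 = \frac{1}{M-1} \sum_{i=1}^{M} (X_i - \hat{\theta}_1)^2$ (unbiased estimator)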
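The two variance estimators differ only in the divisor, so the maximum likelihood value is always $(M-1)/M$ times the unbiased one. A small sketch showing this on a made-up sample (the function names and data are illustrative assumptions):

```python
def ml_variance(xs):
    """Maximum likelihood variance estimate: divides by M (biased)."""
    m = len(xs)
    mean = sum(xs) / m
    return sum((x - mean) ** 2 for x in xs) / m


def unbiased_variance(xs):
    """Variance estimate that divides by M - 1 (unbiased)."""
    m = len(xs)
    mean = sum(xs) / m
    return sum((x - mean) ** 2 for x in xs) / (m - 1)


# Made-up sample with mean 5 and squared deviations summing to 32.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m = len(sample)

print(ml_variance(sample))        # 32 / 8
print(unbiased_variance(sample))  # 32 / 7
# The ratio of the two estimates is (M - 1) / M.
print(ml_variance(sample) / unbiased_variance(sample), (m - 1) / m)
```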
8
Confidence Interval
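• Assume $F(x)$ is given; then $\mathrm{Dist}(r)$ can be calculated
– $r$ is a function of $\hat{\theta}$, which is a function of $x$
• Calculate, with $(1-\alpha)100\%$ certainty (next slide), $r \in [\eta_1(\alpha, X), \eta_2(\alpha, X)]$
• Which leads to a $(1-\alpha)100\%$ confidence interval for $\theta$ given by
  $[\hat{\theta} - \eta_2(\alpha, X),\ \hat{\theta} - \eta_1(\alpha, X)]$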
9
Confidence Interval
• Example
– Discard $\alpha/2$ on lower and higher ends
– Find the $r$ values corresponding to the interval boundary (called quantile)
[Figure: $\mathrm{Dist}(r)$ plotted against $r$]
$\mathrm{Prob}(q(\alpha/2) \le r \le q(1-\alpha/2)) = 1 - \alpha$
10
Confidence Interval
• Interpretation:
– Generate sample sets $X$ from $F(x)$
– Calculate confidence intervals for each $X$
– $(1-\alpha)100\%$ of these intervals contain $\theta$.
11
Parametric Method
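• $X_i$ identically distributed
• Assume $X_i$ are independent (not true in general)
• Then $\mathrm{Dist}(\hat{\theta})$ can be taken to be a normal distribution using the central limit theorem (large $M$), with
  $\hat{\theta} = \bar{X} = \frac{1}{M} \sum_{i=1}^{M} X_i$, $E(\hat{\theta}) = E(x)$, $\mathrm{Var}(\hat{\theta}) = \frac{\mathrm{Var}(x)}{M}$
• Result: $\bar{X} \pm z \cdot \frac{\mathrm{stddev}(x)}{\sqrt{M}}$
• E.g. for 95% confidence $z = 1.96$
• Smaller interval with increasing $M$ and decreasing $\mathrm{Var}(x)$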
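The parametric result can be sketched in a few lines. This is a minimal illustration, assuming the unknown standard deviation is replaced by the sample standard deviation; `parametric_ci` and the data are made up for the example:

```python
import math
import statistics


def parametric_ci(xs, z=1.96):
    """Normal-approximation interval: mean +/- z * stddev / sqrt(M)."""
    m = len(xs)
    mean = statistics.mean(xs)
    # Assumption: the sample standard deviation stands in for stddev(x).
    half_width = z * statistics.stdev(xs) / math.sqrt(m)
    return mean - half_width, mean + half_width


# Made-up sample; repeating it four times keeps the mean but raises M.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
lo, hi = parametric_ci(sample)
lo_big, hi_big = parametric_ci(sample * 4)

print(lo, hi)          # interval for M = 8
print(lo_big, hi_big)  # narrower interval for M = 32
```

The second interval is narrower only because the independence assumption is taken at face value; repeated samples are exactly the kind of dependent data for which that assumption fails.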
12
Non-Parametric Method
• Assume $F(x)$ is available.
– Sample set: $X$
– Additional sample sets: $X^*_i$, $i \in [1 \ldots B]$
– Random variable: $\bar{X}^*$
[Figure: $f(x)$ and the density of $\bar{X}^*$]
13
Non-Parametric Method
• FACT: For large $B$ we have $E(\bar{X}^*) = E(x)$
• Define error to be $r = \bar{X}^* - \bar{X}$
• Calculate $\mathrm{Dist}(r)$
• Solution: $\mathrm{Dist}(r) = \frac{1}{B}\,\#\big((\bar{X}^*_i - \bar{X}) \le r\big)$
14
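Non-Parametric Method
• Interval calculation: Sorting and counting
[Figure: $\mathrm{Dist}(r)$ plotted against $r$, with $\alpha/2$ cut off at each tail]
– Sample sets: $X^*_i$, $i \in [1 \ldots B]$
– Sort the means: $\bar{X}^*_{(1)} \le \bar{X}^*_{(2)} \le \ldots \le \bar{X}^*_{(B)}$
– Count: $k_1 = (\alpha/2)B$ and $k_2 = (1 - \alpha/2)B$
– Quantiles: $q(\alpha/2) = \bar{X}^*_{(k_1)}$ and $q(1 - \alpha/2) = \bar{X}^*_{(k_2)}$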
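The sorting-and-counting step can be sketched as follows. The rounding of $k_1$ and $k_2$ to whole indices is an assumption (the slide does not say how non-integer values are handled), and `percentile_interval` is a name chosen for the example:

```python
def percentile_interval(values, alpha=0.05):
    """Return the k1-th and k2-th smallest values (1-based ranks)."""
    ordered = sorted(values)
    b = len(ordered)
    # Assumption: round to the nearest rank and clamp to the range 1..B.
    k1 = max(1, round((alpha / 2) * b))
    k2 = min(b, round((1 - alpha / 2) * b))
    return ordered[k1 - 1], ordered[k2 - 1]


# Stand-in for B = 100 resampled means: the integers 1..100 in reverse order.
fake_means = list(range(100, 0, -1))
print(percentile_interval(fake_means, alpha=0.1))  # ranks 5 and 95
```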
15
Bootstrap Method
• $F(x)$ is not available; all we have is $X$
• How do we generate $X^*_i$?
• Solution (i.e. bootstrap method): Sampling with replacement (SR) from $X$.
• Put the samples in a bag, draw one, record it and put it back.
• Draw $M$ samples from $X$, $B$ times. Some samples $X_i$ may not appear in a given set.
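Sampling with replacement is a one-liner with Python's standard library. A minimal sketch, where `bootstrap_sets` and the fixed seed are illustrative choices:

```python
import random


def bootstrap_sets(xs, b, seed=0):
    """Draw B sets of M samples each from xs, with replacement."""
    rng = random.Random(seed)
    m = len(xs)
    # random.choices samples with replacement, so repeats are expected
    # and some original samples may be missing from a given set.
    return [rng.choices(xs, k=m) for _ in range(b)]


# Made-up scores, for illustration only.
scores = [0.2, 0.4, 0.6, 0.8, 1.0]
sets = bootstrap_sets(scores, b=3)
for s in sets:
    print(s, sum(s) / len(s))  # each resampled set and its mean
```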
16
Bootstrap Method (Imperfections)
• $X_i$ are not independent.
– In sampling with replacement (SR) the dependence between samples is not replicated.
• Effect of treating dependent samples as independent
– Variance of $\bar{X}^*$ is smaller
– Leads to smaller CIs
• Example: $X_1, X_2$ zero mean, identically distributed
  $E\big((X_1 + X_2)^2\big) = 2E(X_1^2)$ when independent
  $E\big((X_1 + X_2)^2\big) = 4E(X_1^2)$ when $X_1 = X_2$
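The factor of two between the independent and the fully dependent case follows from expanding the square. A short derivation, using only the zero-mean and identical-distribution assumptions stated on the slide:

```latex
\begin{align*}
E\big((X_1 + X_2)^2\big) &= E(X_1^2) + E(X_2^2) + 2E(X_1 X_2)
                          = 2E(X_1^2) + 2E(X_1 X_2) \\
\text{independent:}\quad E(X_1 X_2) &= E(X_1)\,E(X_2) = 0
  \;\Rightarrow\; E\big((X_1 + X_2)^2\big) = 2E(X_1^2) \\
X_1 = X_2:\quad E(X_1 X_2) &= E(X_1^2)
  \;\Rightarrow\; E\big((X_1 + X_2)^2\big) = 4E(X_1^2)
\end{align*}
```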
17
Subset Bootstrap
• Potential sources of dependency
– All samples from the same person (e.g. multiple fingers)
– All samples from the same biometric (e.g. finger)
• Partition X into independent subsets
• Apply SR on subsets.
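A minimal sketch of the idea, assuming the scores have already been grouped into subsets that are treated as independent of each other (for example one list per finger or per person). Whole subsets are resampled with replacement instead of individual scores; the function name and data are illustrative:

```python
import random


def subset_bootstrap_means(subsets, b, seed=0):
    """Resample whole subsets with replacement and return B pooled means."""
    rng = random.Random(seed)
    means = []
    for _ in range(b):
        # Draw as many subsets as there are, with replacement.
        drawn = rng.choices(subsets, k=len(subsets))
        # Pool all scores of the drawn subsets into one bootstrap set.
        pooled = [score for subset in drawn for score in subset]
        means.append(sum(pooled) / len(pooled))
    return means


# Made-up genuine scores grouped per finger, for illustration only.
per_finger = [[0.9, 0.8], [0.4, 0.5], [0.7, 0.6]]
means = subset_bootstrap_means(per_finger, b=50)
print(min(means), max(means))
```

Because scores inside one subset always travel together, the dependence within a subset is carried over into every bootstrap set.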
18
Subset Bootstrap (An example)
• Fingerprint database
– $P$ persons
– $c$ fingers per person, $D = cP$ fingers
– $d$ samples per finger
– DB size $= cPd$
• Matching pairs
– $d(d-1)$ per finger
– $cd(d-1)$ per person
– $cPd(d-1) = Dd(d-1)$ total
• Using a symmetric or an asymmetric matcher does not make any difference [23].
19
Subset Bootstrap (An Example)
• $X_1 \subset X_2$
• $X_1$: $P=10$, $c=2$, $D=20$, $d=8$, $M=1120$
• $X_2$: $P=50$, $c=2$, $D=100$, $d=8$, $M=5600$
• Finger based partition: Set subsets to be the samples from the same finger (i.e. $D$ subsets of $d(d-1)$ matching scores)
• Person based partition: Set subsets to be the samples from the same person (i.e. $P$ subsets of $cd(d-1)$ matching scores)
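The set sizes quoted for $X_1$ and $X_2$ follow directly from the pair-count formulas. A short check, with function names chosen for the example:

```python
def matching_pairs(p, c, d):
    """Total matching pairs: D * d * (d - 1) with D = c * P fingers."""
    return c * p * d * (d - 1)


def partition_sizes(p, c, d):
    """(number of subsets, scores per subset) for both partitions."""
    finger_based = (c * p, d * (d - 1))
    person_based = (p, c * d * (d - 1))
    return finger_based, person_based


print(matching_pairs(10, 2, 8))   # M for X1
print(matching_pairs(50, 2, 8))   # M for X2
print(partition_sizes(10, 2, 8))  # finger based and person based, for X1
```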
20
Subset Bootstrap (An Example)
• We expect
– $\mathrm{CI}_1$ (light gray) to be larger than $\mathrm{CI}_2$ (dark gray), because $X_1$ has a smaller number of samples
– $\mathrm{CI}_2$ (dark gray) to be contained in $\mathrm{CI}_1$ (light gray), because $X_1 \subset X_2$
• The intervals are larger for person based partitioning
– There is dependency between fingers of the same person
21
CIs for FAR/FRR
• Calculate CIs for each threshold $T = t_0$ and a given $\alpha$
  $\widehat{\mathrm{FRR}}(t_0) = \frac{1}{M}\,\#(X_i \le t_0)$
  $\widehat{\mathrm{FAR}}(t_0) = \frac{1}{N}\,\#(Y_i > t_0)$
22
CI for FRR
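• Given genuine score set $X$
– Generate $X^*_i$, $i \in [1 \ldots B]$
– Calculate $\mathrm{FRR}^*_i(t_0)$
– Sort and count: $k_1 = (\alpha/2)B$ and $k_2 = (1 - \alpha/2)B$
  $\mathrm{FRR}(t_0; \alpha) \in [\mathrm{FRR}^*_{(k_1)}, \mathrm{FRR}^*_{(k_2)}]$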
23
CI for FAR
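• Given imposter score set $Y$
– Generate $Y^*_i$, $i \in [1 \ldots B]$
– Calculate $\mathrm{FAR}^*_i(t_0)$
– Sort and count: $k_1 = (\alpha/2)B$ and $k_2 = (1 - \alpha/2)B$
  $\mathrm{FAR}(t_0; \alpha) \in [\mathrm{FAR}^*_{(k_1)}, \mathrm{FAR}^*_{(k_2)}]$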
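The generate, calculate, sort-and-count recipe is the same for both error rates, so one helper can serve both. A minimal sketch using plain sampling with replacement (not the subset variant); the names, the rounding of $k_1$ and $k_2$, the seed and the scores are all assumptions made for illustration:

```python
import random


def frr_at(genuine, t0):
    """Estimated FRR at threshold t0: share of genuine scores <= t0."""
    return sum(1 for s in genuine if s <= t0) / len(genuine)


def far_at(imposter, t0):
    """Estimated FAR at threshold t0: share of imposter scores > t0."""
    return sum(1 for s in imposter if s > t0) / len(imposter)


def bootstrap_rate_ci(scores, rate_fn, t0, b=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for rate_fn(scores, t0)."""
    rng = random.Random(seed)
    m = len(scores)
    # Generate B resampled sets and evaluate the error rate on each one.
    estimates = sorted(rate_fn(rng.choices(scores, k=m), t0) for _ in range(b))
    # Sort and count: take the k1-th and k2-th smallest estimates (1-based).
    k1 = max(1, round((alpha / 2) * b))
    k2 = min(b, round((1 - alpha / 2) * b))
    return estimates[k1 - 1], estimates[k2 - 1]


# Made-up scores: 4 of 20 genuine scores <= 0.5, 2 of 20 imposter scores > 0.5.
genuine = [0.30, 0.45, 0.48, 0.49] + [0.60 + 0.02 * i for i in range(16)]
imposter = [0.70, 0.65] + [0.05 + 0.02 * i for i in range(18)]

frr_ci = bootstrap_rate_ci(genuine, frr_at, t0=0.5)
far_ci = bootstrap_rate_ci(imposter, far_at, t0=0.5)
print(frr_at(genuine, 0.5), frr_ci)
print(far_at(imposter, 0.5), far_ci)
```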
24
Subset Bootstrap for FAR
• Imposter scores $Y$ are not independent
• We are using multiple impressions of the same finger.
• Let $I_{xk}$ be the $k$th finger impression from subject $x$; then $\mathrm{sim}(I_{a1}, I_{b1})$, $\mathrm{sim}(I_{a1}, I_{b2})$, $\mathrm{sim}(I_{a2}, I_{b3})$ are not statistically independent
• Use a finger only once; for D fingers we have only D/2 such pairs
• There is actually dependency between X and Y
25
Subset Bootstrap for FAR
• Fingerprint database
– $P$ persons
– $c$ fingers per person, $D = cP$ fingers
– $d$ samples per finger
– DB size $= cPd$
• Non-matching pairs
– $N = d^2 D(D-1) = P[(dc)^2(P-1) + d^2 c(c-1)]$
– $d^2(D-1)$ per finger
– $(dc)^2(P-1) + d^2 c(c-1)$ per person
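The per-finger, per-person and total counts can be cross-checked numerically. A small sketch that reuses the database sizes $P=10$, $c=2$, $d=8$ from the matching-pair example; the function name is made up:

```python
def non_matching_counts(p, c, d):
    """Return (total, per finger, per person) non-matching pair counts."""
    big_d = c * p                      # D = cP fingers
    per_finger = d**2 * (big_d - 1)    # d^2 (D - 1)
    per_person = (d * c) ** 2 * (p - 1) + d**2 * c * (c - 1)
    total = d**2 * big_d * (big_d - 1)
    return total, per_finger, per_person


total, per_finger, per_person = non_matching_counts(10, 2, 8)
print(total, per_finger, per_person)
# Both ways of summing up must give the same N.
print(total == 20 * per_finger, total == 10 * per_person)
```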
26
Subset Bootstrap for FAR
[Figure: DB partition into $I_1, \ldots, I_i, \ldots, I_N$; matching $I_i$ against the other blocks gives $Y_1 = I_i \times I_1, \ldots, Y_{N-1} = I_i \times I_N$]
• Finger ($N = D$): Take $I_i$ ($d$ elements) and match it against each $I_k$, $k \ne i$ ($d^2$ pairs each), which gives $d^2(D-1)$ pairs. Repeat with all $I_i$ to construct subsets $Y_k$.
• Person ($N = P$): Take $I_i$ ($cd$ elements) and match it against each $I_k$, $k \ne i$ ($(dc)^2$ pairs each), which gives $(dc)^2(P-1)$ pairs. Inside $I_i$ we have $d^2 c(c-1)$ pairs. Repeat with all $I_i$ to construct subsets $Y_k$.
• Not completely independent: We use $I_i$ many times.
28
How good are the CIs?
• There exists a true confidence interval (At the beginning we assumed F(x) is known)
• The CI we calculate is just one estimate.
• How accurate is that estimate?
29
How good are the CIs?
• We estimate E(x)
• Ideal test: Assume $F(x)$ is available
– Generate $X_k$, $k \in [1 \ldots K]$
– Calculate $\bar{X}_k$
– Assume $E(x) = \bar{X}$ and test if $\bar{X}_k \in \mathrm{CI}$
30
How good are the CIs?
• Practical test (for comparison)
1. Randomly split $X$ into two subsets $X_a$ and $X_b$
2. Calculate $\bar{X}_b$ and $\mathrm{CI}_a$
3. Test $\bar{X}_b \in \mathrm{CI}_a$
4. Repeat 1-3 many times and count the number of hits, i.e. the probability of $\bar{X}_b$ falling into $\mathrm{CI}_a$
• Hit rate is not equal to the confidence. Assume $\bar{X}_a, \bar{X}_b$ have normal distribution.
• The higher the hit rate is, the better the estimates are.
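The four steps of the practical test can be sketched as a loop. For brevity this version computes $\mathrm{CI}_a$ with the parametric (normal-approximation) interval; that choice, the function name, the number of trials and the data are assumptions, and a bootstrap interval could be substituted for step 2:

```python
import math
import random
import statistics


def hit_rate(xs, trials=200, z=1.96, seed=0):
    """Share of random splits where the mean of X_b falls inside CI_a."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Step 1: random split into two halves X_a and X_b.
        shuffled = list(xs)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        xa, xb = shuffled[:half], shuffled[half:]
        # Step 2: mean of X_b and a (parametric) interval from X_a.
        mean_a = statistics.mean(xa)
        half_width = z * statistics.stdev(xa) / math.sqrt(len(xa))
        mean_b = statistics.mean(xb)
        # Step 3: count a hit when the mean of X_b lies inside CI_a.
        if mean_a - half_width <= mean_b <= mean_a + half_width:
            hits += 1
    # Step 4: the hit rate over all repetitions.
    return hits / trials


# Made-up scores, for illustration only.
scores = [0.1 * i for i in range(1, 41)]
rate = hit_rate(scores)
print(rate)
```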