Keystroke Biometric Authentication System

Spring 2009

Spring '09 Team Members

Alpha Amatya

James Aliperti

Thomas Mariutto

Ankoor Shah

Michael Warren

Spring 2009 Focus

• Web-based authentication: continued development of the test-taker authentication application

• Development of new concepts: weighted & unweighted top n choices; strong vs. weak enrollment

• Modify existing & write new programs to simulate various scoring procedures

• Run experiments to produce various scenarios

Reasons for Study

Keystroke biometrics is one of the least studied biometric applications used for user authentication.

Most studies use short input such as passwords and user names; this study focuses on long text input (free text and copy text).

Typing characteristics are said to be:

1) Unique to an individual

2) Difficult to duplicate

Keystroke authentication is very important for online test-taking systems and for overall computer system security.

No special equipment is needed.

Contents of System

A PHP website registers the user.

A modified Java applet captures 300 keystrokes and produces two files: a raw data file and a text file.

A Java program, BioFeature++, extracts 230 feature measurements.

A Java program, Biometric Authentication System (BAS), performs authentication tests.

4-Quadrant Data Collection

36 subjects, 4 quadrants, 5 samples per quadrant

Types of Data Collected

Entry Mode     Copy Text      Free Text
Desktop        Desktop Copy   Desktop Free
Laptop         Laptop Copy    Laptop Free

Keystroke Biometric Authentication System (Data Flow)

Raw Data Sample

Deliverables

1) Recreate the authentication experiment from the Keystroke Book Chapter

2) Rewrite the user and technical manuals

3) Modify the classifier program to produce the top n Within/Between choices and distances

4) Create 1st, 3rd and 5th Nearest Neighbor output tables

5) Create an output file of the top 3 choices from the Classifier program and obtain FRR, FAR and Performance

6) Create ROC curves for each of the 4 quadrant data samples

7) Run two small-training, strong-enrollment authentication experiments

8) Run big-training, strong-enrollment authentication experiments, incrementally increasing the training sizes

9) Write detailed descriptions of the data formats

10) Investigate the discrepancy between the 230 and 239 Linguistic-model features

Hierarchical Fallback Models

• Touch-type Model: based on the keys struck by touch typists; 254 distinctive measurements

• Linguistic Model: based on language and the most frequently used keys; 230 distinctive measurements

Better performance was obtained with the Linguistic Model.

Experimental Recreation

Condition   Intra-Inter Class Sizes    FRR     FAR    Performance
            Train       Test
DeskCopy    180-3825    180-3825       11.1%   6.0%   93.8%
LapCopy     180-3825    180-3825       7.8%    4.4%   95.5%
DeskFree    171-3570    176-3740       28.4%   1.4%   97.4%
LapFree     180-3825    180-3825       15.6%   3.7%   95.7%
Average                                15.7%   3.7%   95.6%

Condition   Intra-Inter Class Sizes    FRR     FAR    Performance
            Train       Test
DeskCopy    180-3825    180-3825       2.78%   2.1%   97.9%
LapCopy     180-3825    180-3825       3.3%    4.0%   96.0%
DeskFree    171-3570    165-3576       21.0%   1.1%   98.0%
LapFree     180-3825    180-3825       10.0%   3.3%   96.4%
Average                                9.3%    2.6%   97.1%

Experimental Recreation, limiting sample size to 500

Condition   Intra-Inter Class Sizes    FRR     FAR    Performance
            Train       Test
DeskCopy    180-500     180-500        2.78%   4.6%   95.9%
LapCopy     180-500     180-500        2.2%    7.2%   94.1%
DeskFree    176-500     165-500        10.2%   2.2%   95.7%
LapFree     180-500     180-500        6.7%    7.0%   93.1%
Average                                5.5%    2.6%   94.7%

Top n=10 W/B Choices and Distances

The implementation compared each sample from the dichotomized test data with every sample from the dichotomized training data.

The 10 shortest Euclidean distances were kept (n = 10 choices).

For each choice, the distance and the choice class, Within (W) or Between (B), were recorded.
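The comparison step can be sketched in Python as follows. This is a minimal illustration, not the team's Java classifier: it assumes that a "dichotomized" sample is the feature-by-feature absolute difference between two samples' feature vectors, labeled W when both samples come from the same subject and B otherwise. The names `dichotomize`, `euclidean` and `top_n_choices` are hypothetical.

```python
import math


def dichotomize(features_a, features_b):
    """Feature-by-feature absolute difference of two samples (assumed dichotomy)."""
    return [abs(x - y) for x, y in zip(features_a, features_b)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_n_choices(test_vector, train_vectors, train_labels, n=10):
    """Return the n nearest training vectors as (distance, label) pairs.

    train_labels holds 'W' (within-class) or 'B' (between-class) for each
    dichotomized training vector. Results are sorted by ascending distance.
    """
    scored = [(euclidean(test_vector, vec), label)
              for vec, label in zip(train_vectors, train_labels)]
    scored.sort(key=lambda pair: pair[0])
    return scored[:n]


# Example: three dichotomized training vectors, keep the 2 nearest.
train = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
print(top_n_choices([0.0, 0.0], train, ["W", "B", "W"], n=2))
# -> [(0.0, 'W'), (1.0, 'W')]
```

Running `top_n_choices` once per dichotomized test vector with n=10 yields one row of W/B choices and distances per test pair.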

This program was run for all four quadrants. Each output contained 180 Within + 3825 Between = 4005 choice tables.
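The 4005 total follows from counting sample pairs. The sketch below assumes that each test set contains 18 subjects with 5 samples each (an inference from the 36-subject, 5-samples-per-quadrant collection, not stated on this slide) and that every unordered pair of samples yields one dichotomized vector. `pair_counts` is a hypothetical helper.

```python
from math import comb


def pair_counts(subjects, samples_per_subject):
    """Return (within, between) counts of unordered sample pairs.

    Within pairs: both samples come from the same subject.
    Between pairs: the samples come from two different subjects.
    """
    within = subjects * comb(samples_per_subject, 2)
    total = comb(subjects * samples_per_subject, 2)
    return within, total - within


print(pair_counts(18, 5))  # -> (180, 3825), total 4005
```

Under the same assumption, `pair_counts(36, 5)` gives (360, 15750), which matches the class sizes used in the small-training experiments.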

Cross Section of Output File

Overall Accuracy for the n=10 Output File Using the 1st, 3rd & 5th Nearest Neighbors

Implemented a program to check overall accuracy on the outputs created in Deliverable 3.

Calculated FRR, FAR and performance for all the experiments in the 4 quadrants.

The 1-Nearest Neighbor results matched the Deliverable 1 outputs exactly, confirming that our implementation reproduces the original experiment.

As expected, using 3 and 5 nearest neighbors resulted in a slight improvement.
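A sketch of how 1st, 3rd and 5th nearest-neighbor decisions can be derived from one row of the n=10 output file. It assumes a simple majority vote among the first k choices; the slides do not state the exact voting rule, and `knn_decision` is a hypothetical name.

```python
from collections import Counter


def knn_decision(choices, k):
    """Return 'W' or 'B' by majority vote among the k nearest choices.

    choices: 'W'/'B' labels from one row of the output file, nearest first.
    k must be odd (1, 3 or 5) so that a two-class vote cannot tie.
    """
    if k % 2 == 0 or k > len(choices):
        raise ValueError("k must be odd and no larger than the number of choices")
    votes = Counter(choices[:k])
    return "W" if votes["W"] > votes["B"] else "B"


row = ["B", "W", "W", "B", "B", "W", "W", "W", "B", "B"]
print([knn_decision(row, k) for k in (1, 3, 5)])  # -> ['B', 'W', 'B']
```

Because only the labels are needed, the same n=10 file can be re-scored for any odd k without recomputing distances.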

1st and 3rd Nearest Neighbor Choice

(W = Within, B = Between)

1st Nearest Neighbor
Condition      W Correct  W Wrong  W Total  B Correct  B Wrong  B Total  FRR     FAR     Performance
Desktop Copy   175        5        180      3744       81       3825     2.78%   2.12%   97.85%
Laptop Copy    174        6        180      3671       154      3825     3.33%   4.03%   96.00%
Desktop Free   139        37       176      3699       41       3740     21.02%  1.10%   98.01%
Laptop Free    162        18       180      3699       126      3825     10.0%   3.29%   96.40%

3rd Nearest Neighbor
Condition      W Correct  W Wrong  W Total  B Correct  B Wrong  B Total  FRR     FAR     Performance
Desktop Copy   173        7        180      3776       49       3825     3.89%   1.28%   98.60%
Laptop Copy    172        8        180      3720       105      3825     4.44%   2.75%   97.18%
Desktop Free   127        49       176      3731       9        3740     27.84%  0.24%   98.52%
Laptop Free    156        24       180      3735       90       3825     13.33%  2.35%   97.15%
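The FRR, FAR and Performance columns can be recomputed from the count columns. The sketch below treats performance as overall accuracy (all correct decisions over all decisions), a definition inferred from the table rather than stated on the slides; `error_rates` is a hypothetical helper.

```python
def error_rates(within_correct, within_wrong, between_correct, between_wrong):
    """Return (FRR, FAR, performance) as percentages.

    FRR: within-class pairs wrongly classified as Between / all within pairs.
    FAR: between-class pairs wrongly classified as Within / all between pairs.
    Performance: all correct decisions / all decisions (overall accuracy).
    """
    within_total = within_correct + within_wrong
    between_total = between_correct + between_wrong
    frr = 100.0 * within_wrong / within_total
    far = 100.0 * between_wrong / between_total
    performance = 100.0 * (within_correct + between_correct) / (within_total + between_total)
    return frr, far, performance


# Desktop Copy row of the first table above.
frr, far, perf = error_rates(175, 5, 3744, 81)
print(f"{frr:.2f}% {far:.2f}% {perf:.2f}%")  # -> 2.78% 2.12% 97.85%
```

Because Between pairs outnumber Within pairs by roughly 20 to 1, performance is dominated by FAR, which is why Desktop Free can show the highest performance despite the highest FRR.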

1st, 3rd and 5th Nearest Neighbor: Within and Between Choices Derived Using Euclidean Distance

[Figure: percent accuracy (95 to 100) for 1-NN, 3-NN and 5-NN in each condition: DeskCopy, LapCopy, DeskFree, LapFree]

Receiver Operating Characteristic (ROC) Curve

A graphical representation of the trade-off between FAR and FRR:

• FAR (False Acceptance Rate): the rate at which an impostor is authenticated

• FRR (False Rejection Rate): the rate at which a valid user is rejected

Top n Nearest Neighbor Responses

• Unweighted: each output choice is counted equally

• Weighted: the first output choice is the most valuable and is scored highest, with later choices scored progressively lower

Receiver Operating Characteristic (ROC) Curve Implementation: Weighted

Taking the n=10 W/B choice output file as input, a user was authenticated if one or more of the 10 choices was Within (W).

Each Within match was scored using the formula score += (10 - j + 1), where j = 1 to 10 is the position of the choice.

The score therefore ranges from 0 (no Within choices) to a maximum of 55 (all 10 choices Within). FRR and FAR were calculated for each threshold i = 0 to 55, and the ROC curve was plotted.
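The weighted formula score += (10 - j + 1) can be sketched as a small Python function. `choice_score` is a hypothetical name, and using `len(choices)` in place of the fixed 10 is an assumption that reduces to the slide's formula when a row holds 10 choices; the `weighted=False` branch gives the unweighted score += 1 variant.

```python
def choice_score(choices, weighted=True):
    """Score one row of W/B choices (nearest first).

    Weighted: choice j (1-based) adds n - j + 1 when it is 'W', so with
    n = 10 the first choice adds 10 and the tenth adds 1 (maximum 55).
    Unweighted: every 'W' choice adds 1 (maximum n).
    """
    n = len(choices)
    score = 0
    for j, choice in enumerate(choices, start=1):
        if choice == "W":
            score += (n - j + 1) if weighted else 1
    return score


row = ["W", "B", "W", "B", "B", "B", "B", "B", "B", "B"]
print(choice_score(row), choice_score(row, weighted=False))  # -> 18 2
print(choice_score(["W"] * 10))  # -> 55
```

The maximum of 55 is simply 10 + 9 + ... + 1, the weighted score of a row whose 10 choices are all Within.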

Receiver Operating Characteristic (ROC) Curve Implementation: Unweighted

Taking the n=10 W/B choice output file as input, a user was authenticated if one or more of the 10 choices was Within (W).

Each Within match was scored using the formula score += 1, for j = 1 to 10.

The score therefore ranges from 0 to a maximum of 10. FRR and FAR were calculated for each threshold i = 0 to 10, and the ROC curve was plotted.
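Given per-row scores for genuine (Within) and impostor (Between) test pairs, the threshold sweep can be sketched as below. The acceptance rule "score >= i" is an assumption; the slides state only that FRR and FAR were computed for each i. Under this rule, the threshold i = 1 is exactly the "one or more Within choices" criterion for both the weighted and unweighted scores. `roc_points` is a hypothetical helper.

```python
def roc_points(within_scores, between_scores, max_score):
    """Return (threshold, FRR %, FAR %) for every threshold 0..max_score.

    Assumed rule: a test pair is accepted as Within when score >= threshold.
    FRR counts genuine (within) pairs scoring below the threshold;
    FAR counts impostor (between) pairs scoring at or above it.
    """
    points = []
    for i in range(max_score + 1):
        rejected = sum(1 for s in within_scores if s < i)
        accepted = sum(1 for s in between_scores if s >= i)
        frr = 100.0 * rejected / len(within_scores)
        far = 100.0 * accepted / len(between_scores)
        points.append((i, frr, far))
    return points


# Toy unweighted scores (0..10) for four genuine and four impostor pairs.
for point in roc_points([10, 8, 0, 5], [0, 0, 3, 1], max_score=10)[:3]:
    print(point)
```

Plotting FAR against FRR for all thresholds (0 to 55 weighted, 0 to 10 unweighted) traces the ROC curve; the finer weighted scale gives more operating points.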

Laptop Copy ROC Curve

[Figure: FAR (%) against FRR (%), both axes 0 to 20, with unweighted and weighted curves]

Desktop Copy ROC Curve

[Figure: FAR (%) against FRR (%), both axes 0 to 20, with unweighted and weighted curves]

Laptop Free ROC Curve

[Figure: FAR (%) against FRR (%), both axes 0 to 20, with unweighted and weighted curves]

Desktop Free ROC Curve

[Figure: FAR (%) against FRR (%), both axes 0 to 20, with unweighted and weighted curves]

Two big-training, strong-enrollment authentication experiments

Train on 36 subjects and test on 18 subjects

Test        Test Size   Train       Train Size   FRR     FAR     Performance
Lap Free    180-3825    Lap Copy    180-3825     5.6%    15.0%   85.4%
Lap Free    180-3825    Lap Copy    180-3825     3.9%    30.4%   70.8%
Desk Free   176-3825    Desk Copy   180-3825     13.1%   1.4%    98.1%
Desk Free   165-3576    Desk Copy   180-3825     5.5%    3.0%    96.9%
Average                                          7.0%    9.4%    87.8%

Increased Sample Size Experiments

Test        Test Size   Train                  Train Size   FRR     FAR     Performance
Lap Free    180-2000    Lap Copy / Desk Free   1571-2000    1.1%    33.0%   69.6%
Desk Free   180-2000    Desk Copy / Lap Free   1620-2000    10.0%   5.1%    94.5%
Lap Free    180-2000    All 4 Data Sets        2000-2000    3.9%    78.4%   27.8%
Desk Free   180-2000    All 4 Data Sets        2000-2000    9.4%    45.6%   57.4%
Average                                                     6.1%    40.5%   62.3%



Test        Test Size   Train                  Train Size   FRR     FAR     Performance
Lap Free    180-3825    Lap Copy / Desk Free   1571-4000    0.0%    31.0%   70.4%
Desk Free   180-3825    Desk Copy / Lap Free   1620-4000    13.9%   2.8%    96.7%
Lap Free    180-3825    All 4 Data Sets        3330-4000    10.6%   77.3%   25.9%
Desk Free   180-3825    All 4 Data Sets        3330-4000    10.6%   50.0%   51.8%
Average                                                     7.4%    40.3%   61.2%



Test        Test Size   Train                  Train Size   FRR     FAR     Performance
Lap Free    180-3825    Lap Copy / Desk Free   1571-6000    0.0%    24.5%   76.6%
Desk Free   180-3825    Desk Copy / Lap Free   1620-6000    12.8%   2.2%    97.3%
Lap Free    180-3825    All 4 Data Sets        3330-6000    6.1%    63.7%   38.9%
Desk Free   180-3825    All 4 Data Sets        3330-6000    15.6%   38.6%   62.4%
Average                                                     8.6%    32.3%   68.8%



Test        Test Size   Train                  Train Size   FRR     FAR     Performance
Lap Free    180-3825    Lap Copy / Desk Free   1571-8000    1.7%    26.2%   74.9%
Desk Free   180-3825    Desk Copy / Lap Free   1620-8000    15.6%   1.7%    97.7%
Lap Free    180-3825    All 4 Data Sets        3330-8000    7.2%    52.8%   49.2%
Desk Free   180-3825    All 4 Data Sets        3330-8000    14.4%   31.8%   98.9%
Average                                                     9.7%    28.2%   80.2%

Linguistic Features: Identifying Discrepancies

Feature Measurements

Duration: the average time a key is held down (from press to release) and its standard deviation

Transition: the time between successive keystrokes, divided into two types:

• Type I (short transition): the time between the release of one key and the press of the next

• Type II (long transition): the time between the press of one key and the press of the next

Percentage: the number of occurrences divided by the total number of keystrokes
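The three measurement types can be sketched on a toy list of key events. This is not BioFeature++: the event format `(key, press_ms, release_ms)` is a hypothetical simplification of the applet's raw data file, and the features are computed over all keys at once rather than per key or key group as in the 230-feature Linguistic Model. `keystroke_features` is a hypothetical name.

```python
from statistics import mean, pstdev


def keystroke_features(events):
    """Compute duration, transition and percentage measurements.

    events: list of (key, press_ms, release_ms) tuples in typing order;
    at least two events are needed for the transition measurements.
    """
    durations = [release - press for _, press, release in events]
    pairs = list(zip(events, events[1:]))
    # Type I (short) transition: release of one key to press of the next.
    type1 = [nxt[1] - cur[2] for cur, nxt in pairs]
    # Type II (long) transition: press of one key to press of the next.
    type2 = [nxt[1] - cur[1] for cur, nxt in pairs]
    counts = {}
    for key, _, _ in events:
        counts[key] = counts.get(key, 0) + 1
    return {
        "duration_mean": mean(durations),
        "duration_std": pstdev(durations),
        "type1_mean": mean(type1),
        "type2_mean": mean(type2),
        # Percentage: occurrences of each key over total keystrokes.
        "percentage": {key: c / len(events) for key, c in counts.items()},
    }


sample = [("t", 0, 100), ("h", 160, 240), ("t", 300, 420)]
features = keystroke_features(sample)
print(features["type1_mean"], features["type2_mean"])
```

A Type I transition can be negative when the next key is pressed before the previous one is released (overlapping keystrokes), which is common among fast typists; Type II is always non-negative for events in typing order.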

Discrepancy between the 239 and 230 feature measurements

• Addition of a feature group for 6 other least-frequent consonants (q, v, j, x, z, k)

• Removal of the 15-feature long-transition (Type II) group for the digraphs th, st, nd, an, in, er, es, on, at, en, or, he, re, ti, ea

Net effect: 239 + 6 - 15 = 230 features.

Small-training, strong-enrollment authentication experiments

Biometric   Test        Test Size   Train       Train Size   FRR      FAR      Performance
Keystroke   Lap Free    360-15750   Lap Copy    360-15750    5.28%    14.56%   85.65%
Keystroke   Desk Free   341-15059   Desk Copy   360-15750    17.30%   1.28%    98.36%
Average                                                      11.3%    8.0%     92.1%

Big-training, strong-enrollment authentication experiments: 5000, 10000 and 20000 inter-class samples

Biometric   Test        Test Size   Train       Train Size   FRR      FAR     Performance
Keystroke   Lap Free    180-3825    Lap Copy    180-3825     10.00%   3.29%   96.40%
Keystroke   Desk Free   176-3740    Desk Copy   165-3576     21.02%   1.10%   98.01%
Keystroke   Lap Free    180-3825    Lap Copy    180-3825     10.00%   3.29%   96.40%
Keystroke   Desk Free   176-3740    Desk Copy   165-3576     21.02%   1.10%   98.01%

Documentation Creation

• User Manual

• Technical Manual

http://utopia.csis.pace.edu/cs691/2008-2009/team4/

Future Work

Experiment to determine why laptop input is less consistent than desktop input. Possible reasons:

• Different keyboard layouts

• Different body positioning during typing

• Desktops are more fixed and consistent

Data should be stored in a database rather than in individual files.

Older code should be refactored so that it runs more efficiently.

Combine last semester's and this semester's work into one project.

Conclusion

Thank you for your time and attention
