Keystroke Biometric Authentication System
Spring 2009
Spring '09 Team Members
Alpha Amatya
James Aliperti
Thomas Mariutto
Ankoor Shah
Michael Warren
Spring 2009 Focus
Web-Based Authentication
Continued development of the test-taker authentication application
Development of New Concepts
• Weighted and unweighted top n choices
• Strong vs. weak enrollment
Modify Existing and Write New Programs
• To simulate various scoring procedures
Run Experiments
• To produce various scenarios
Reasons for Study
Keystroke biometrics is one of the least studied biometrics used for user authentication
Most studies use short input: passwords and user names
This study focuses on long-text input (free text and copy text)
Typing characteristics are said to be:
1) Unique to an individual
2) Difficult to duplicate
Very important for online test-taking systems
Important for overall Computer System Security
No special equipment is needed
Contents of System
A PHP website registers the user.
A modified Java applet captures 300 keystrokes and produces two files: a raw data file and a text file.
A Java program, BioFeature++, extracts 230 feature measurements.
A Java program, Biometric Authentication System (BAS), performs authentication tests.
4-Quadrant Data Collection
36 subjects, 4 quadrants, 5 samples per subject in each quadrant
Types of Data Collected
Entry modes: Desktop, Laptop
Text types: Copy Text, Free Text

             Desktop        Laptop
Copy Text    Desktop Copy   Laptop Copy
Free Text    Desktop Free   Laptop Free
Keystroke Biometric Authentication System (Data Flow)
Raw Data Sample
Deliverables
1. Recreate the authentication experiment from the Keystroke Book Chapter
2. Rewrite the user and technical manuals
3. Modify the classifier program to produce the top n Within/Between choices and distances
4. Create 1st, 3rd and 5th Nearest Neighbor output tables
5. Create an output file of the top 3 choices from the Classifier program and obtain FRR, FAR and Performance
6. Create ROC curves for each of the 4 quadrant data samples
7. Run two small-training, strong-enrollment authentication experiments
8. Run big-training, strong-enrollment authentication experiments, incrementally increasing training sizes
9. Write detailed descriptions of the data formats
10. Investigate the discrepancy between the 230 and 239 Linguistic-model features
Hierarchical Fallback Models
Touch-Type Model: based on the keys struck by touch typists
• 254 distinctive measurements
Linguistic Model: based on language and the most frequently used keys
• 230 distinctive measurements
The Linguistic Model produced better performance results.
Experimental Recreation

Condition  Intra-Inter Class Sizes FRR    FAR   Performance
           Train       Test
DeskCopy   180-3825    180-3825    11.1%  6.0%  93.8%
LapCopy    180-3825    180-3825    7.8%   4.4%  95.5%
DeskFree   171-3570    176-3740    28.4%  1.4%  97.4%
LapFree    180-3825    180-3825    15.6%  3.7%  95.7%
Average                            15.7%  3.7%  95.6%

Condition  Intra-Inter Class Sizes FRR    FAR   Performance
           Train       Test
DeskCopy   180-3825    180-3825    2.78%  2.1%  97.9%
LapCopy    180-3825    180-3825    3.3%   4.0%  96.0%
DeskFree   171-3570    165-3576    21.0%  1.1%  98.0%
LapFree    180-3825    180-3825    10.0%  3.3%  96.4%
Average                            9.3%   2.6%  97.1%
Experimental Recreation, Limiting Sample Size to 500

Condition  Intra-Inter Class Sizes FRR    FAR   Performance
           Train       Test
DeskCopy   180-500     180-500     2.78%  4.6%  95.9%
LapCopy    180-500     180-500     2.2%   7.2%  94.1%
DeskFree   176-500     165-500     10.2%  2.2%  95.7%
LapFree    180-500     180-500     6.7%   7.0%  93.1%
Average                            5.5%   2.6%  94.7%
Top n=10 W/B Choices and Distances
The implementation compared each sample from the dichotomized test data with every sample from the dichotomized training data.
The n=10 shortest Euclidean distances were kept as the top 10 choices.
For each choice, the distance and the choice class, Within (W) or Between (B), were recorded.
This program was run for all four quadrants. Each output contained 180 Within + 3825 Between = 4005 choice tables.
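The dichotomy step and the top-n search described above can be sketched in Python. This is a minimal sketch, not the BAS code: it assumes the dichotomy transform is the absolute feature-by-feature difference between two samples, and that each quadrant's test half holds 18 subjects with 5 samples each, which yields exactly 180 Within and 3825 Between difference vectors. The names `dichotomize` and `top_n_choices` are hypothetical.

```python
import itertools
import math


def dichotomize(samples_by_subject):
    """Turn per-subject feature vectors into Within and Between difference vectors.

    Assumption: a difference vector is the absolute feature-wise difference
    between two samples (the usual dichotomy transform).
    """
    def diff(a, b):
        return [abs(x - y) for x, y in zip(a, b)]

    within, between = [], []
    subjects = list(samples_by_subject)
    # Within (W): every pair of samples typed by the same subject.
    for s in subjects:
        for a, b in itertools.combinations(samples_by_subject[s], 2):
            within.append(diff(a, b))
    # Between (B): every pair of samples typed by two different subjects.
    for s, t in itertools.combinations(subjects, 2):
        for a in samples_by_subject[s]:
            for b in samples_by_subject[t]:
                between.append(diff(a, b))
    return within, between


def top_n_choices(test_vec, train_within, train_between, n=10):
    """Return the n nearest training difference vectors as (distance, 'W' or 'B')."""
    labelled = [(v, "W") for v in train_within] + [(v, "B") for v in train_between]
    scored = [(math.dist(test_vec, v), label) for v, label in labelled]
    scored.sort(key=lambda pair: pair[0])  # nearest (shortest distance) first
    return scored[:n]


if __name__ == "__main__":
    # 18 hypothetical subjects x 5 samples, 3 features each (the real system uses 230).
    data = {s: [[s + k * 0.1, s * 2.0, k * 1.0] for k in range(5)] for s in range(18)}
    w, b = dichotomize(data)
    print(len(w), len(b), len(w) + len(b))  # 180 3825 4005
    print(top_n_choices(w[0], w[1:], b, n=3))
```

The pair counts follow from combinatorics: 18 subjects × C(5,2) = 180 Within pairs and C(18,2) × 5 × 5 = 3825 Between pairs.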
Cross Section of Output File
Overall Accuracy for the n=10 Output File Using 1st, 3rd and 5th Nearest Neighbors
Implemented a program to check overall accuracy on the outputs created in Deliverable 3.
Calculated FRR, FAR and performance for all the experiments in the 4 quadrants.
Precisely reproduced the Deliverable 1 results using 1-nearest neighbor, confirming that the experiments were carried out correctly.
Using 3 and 5 nearest neighbors gave a slight improvement in performance, as expected.
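Deciding with the 1st, 3rd or 5th nearest neighbor can be read as a majority vote over the first k entries of each sample's top-10 W/B list. The sketch below assumes exactly that rule (a sample is labelled Within when most of its k nearest choices are W); `knn_decision` is a hypothetical name, not part of the team's program.

```python
from collections import Counter


def knn_decision(choices, k):
    """Classify a test difference vector from its ranked W/B choices.

    choices: top-n labels ordered nearest first, e.g. ["W", "B", "W", ...].
    k: number of nearest neighbors that vote (1, 3 or 5 here); odd k avoids ties.
    """
    if k > len(choices):
        raise ValueError("k cannot exceed the number of recorded choices")
    votes = Counter(choices[:k])
    return "W" if votes["W"] > votes["B"] else "B"


ranked = ["B", "W", "W", "B", "B", "W", "B", "B", "W", "B"]
for k in (1, 3, 5):
    print(k, knn_decision(ranked, k))  # prints: 1 B, then 3 W, then 5 B
```

Because the top-10 file already stores the ranked labels, switching from 1-NN to 3-NN or 5-NN only changes k; no distances need to be recomputed.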
1st and 3rd Nearest Neighbor Choice

1st Nearest Neighbor
Condition     Within                 Between                FRR     FAR    Performance
              Correct  Wrong  Total  Correct  Wrong  Total
Desktop Copy  175      5      180    3744     81     3825   2.78%   2.12%  97.85%
Laptop Copy   174      6      180    3671     154    3825   3.33%   4.03%  96.00%
Desktop Free  139      37     176    3699     41     3740   21.02%  1.10%  98.01%
Laptop Free   162      18     180    3699     126    3825   10.00%  3.29%  96.40%

3rd Nearest Neighbor
Condition     Within                 Between                FRR     FAR    Performance
              Correct  Wrong  Total  Correct  Wrong  Total
Desktop Copy  173      7      180    3776     49     3825   3.89%   1.28%  98.60%
Laptop Copy   172      8      180    3720     105    3825   4.44%   2.75%  97.18%
Desktop Free  127      49     176    3731     9      3740   27.84%  0.24%  98.52%
Laptop Free   156      24     180    3735     90     3825   13.33%  2.35%  97.15%
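The FRR, FAR and Performance columns can be recomputed from the Within and Between counts: FRR is Within Wrong over Within Total, FAR is Between Wrong over Between Total, and Performance is all correct decisions over all decisions. These definitions are inferred from the counts (they match the tabulated rates) rather than quoted from the team's code, and `rates` is a hypothetical helper name.

```python
def rates(within_correct, within_wrong, between_correct, between_wrong):
    """Return (FRR, FAR, Performance) as percentages from W/B decision counts."""
    within_total = within_correct + within_wrong
    between_total = between_correct + between_wrong
    frr = 100.0 * within_wrong / within_total      # valid users rejected
    far = 100.0 * between_wrong / between_total    # impostors accepted
    performance = 100.0 * (within_correct + between_correct) / (within_total + between_total)
    return frr, far, performance


# Desktop Copy row: 175 correct / 5 wrong Within, 3744 correct / 81 wrong Between.
frr, far, perf = rates(175, 5, 3744, 81)
print(f"FRR {frr:.2f}%  FAR {far:.2f}%  Performance {perf:.2f}%")
# FRR 2.78%  FAR 2.12%  Performance 97.85%
```

Because Between pairs outnumber Within pairs by roughly 20 to 1, Performance is dominated by FAR; this is why Desktop Free can show the highest Performance despite the worst FRR.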
[Figure: 1st, 3rd and 5th Nearest Neighbor. Within and Between choices derived using Euclidean distance. Percent accuracy (95 to 100) by condition (DeskCopy, LapCopy, DeskFree, LapFree) for 1-NN, 3-NN and 5-NN.]
Receiver Operating Characteristic (ROC) Curve
A graphical representation of FAR against FRR
FAR (False Acceptance Rate): the rate of authenticating an impostor
FRR (False Rejection Rate): the rate of rejecting a valid user
Top n Nearest Neighbor Responses
Unweighted: each output choice is counted equally
Weighted: earlier output choices are scored higher, with the first (most valuable) choice scored highest
Receiver Operating Characteristic (ROC) Curve Implementation: Weighted
Taking the n=10 W/B choice output file as input, a user was authenticated if one or more of the 10 choices was Within (W).
Each Within match was scored using the formula score += (10 - j + 1), where j = 1 to 10 is the rank of the choice.
The maximum score is 55 and the minimum score is 0. FRR and FAR were calculated for each threshold i = 0 to 55, and the ROC curve was plotted.
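As a worked check of the stated maximum, the weighted score can be written as a sum over the ranked choices. The symbol c_j (the class of the j-th choice) and the bracket [c_j = W], which equals 1 for a Within choice and 0 otherwise, are notation introduced here, not taken from the team's program.

```latex
\mathrm{score} = \sum_{j=1}^{10} [c_j = W]\,(10 - j + 1),
\qquad
\mathrm{score}_{\max} = \sum_{j=1}^{10} (10 - j + 1) = 10 + 9 + \cdots + 1 = \frac{10 \cdot 11}{2} = 55
```

Replacing the weight (10 - j + 1) with 1 gives the unweighted score, whose maximum is 10.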
Receiver Operating Characteristic (ROC) Curve Implementation: Unweighted
Taking the n=10 W/B choice output file as input, a user was authenticated if one or more of the 10 choices was Within (W).
Each Within match was scored using the formula score += 1, where j = 1 to 10 is the rank of the choice.
The maximum score is 10 and the minimum score is 0. FRR and FAR were calculated for each threshold i = 0 to 10, and the ROC curve was plotted.
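Both scoring schemes and the threshold sweep can be sketched together. This is a hedged reconstruction, not the team's program: it assumes a test sample is accepted (classified Within) when its score is strictly greater than the threshold i, so that i = 0 reproduces the rule "accept if at least one of the 10 choices is W". The names `score` and `roc_points` are hypothetical.

```python
def score(choices, weighted=True):
    """Score a ranked W/B list; rank j = 1 is the nearest choice."""
    n = len(choices)
    total = 0
    for j, label in enumerate(choices, start=1):
        if label == "W":
            # Weighted: score += (n - j + 1), i.e. 10 - j + 1 for n = 10.
            # Unweighted: score += 1.
            total += (n - j + 1) if weighted else 1
    return total


def roc_points(within_lists, between_lists, weighted=True):
    """Return (threshold, FRR %, FAR %) for every threshold i = 0 .. max score.

    within_lists: choice lists of genuine (Within) test samples.
    between_lists: choice lists of impostor (Between) test samples.
    Assumptions: all lists have the same length n; accept when score > i.
    """
    n = len((within_lists + between_lists)[0])
    max_score = n * (n + 1) // 2 if weighted else n
    w_scores = [score(c, weighted) for c in within_lists]
    b_scores = [score(c, weighted) for c in between_lists]
    points = []
    for i in range(max_score + 1):
        frr = 100.0 * sum(s <= i for s in w_scores) / len(w_scores)  # genuine rejected
        far = 100.0 * sum(s > i for s in b_scores) / len(b_scores)   # impostors accepted
        points.append((i, frr, far))
    return points


genuine = [["W"] * 3 + ["B"] * 7, ["B", "W"] + ["B"] * 8]
impostor = [["B"] * 9 + ["W"], ["B"] * 10]
for i, frr, far in roc_points(genuine, impostor, weighted=True)[:3]:
    print(i, frr, far)
```

Plotting the (FRR, FAR) pairs for the weighted and unweighted runs gives the two curves per quadrant. The weighted score has 56 possible thresholds instead of 11, so its ROC curve is more finely sampled.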
[Figure: Laptop Copy ROC Curve. FAR (%) plotted against FRR (%), 0 to 20 on both axes, with unweighted and weighted curves.]
[Figure: Desktop Copy ROC Curve. FAR (%) plotted against FRR (%), 0 to 20 on both axes, with unweighted and weighted curves.]
[Figure: Laptop Free ROC Curve. FAR (%) plotted against FRR (%), 0 to 20 on both axes, with unweighted and weighted curves.]
[Figure: Desktop Free ROC Curve. FAR (%) plotted against FRR (%), 0 to 20 on both axes, with unweighted and weighted curves.]
Two Big-Training, Strong-Enrollment Authentication Experiments
Train on 36 subjects and test on 18 subjects

Test       Test Size  Train      Train Size  FRR    FAR    Performance
Lap Free   180-3825   Lap Copy   180-3825    5.6%   15.0%  85.4%
Lap Free   180-3825   Lap Copy   180-3825    3.9%   30.4%  70.8%
Desk Free  176-3825   Desk Copy  180-3825    13.1%  1.4%   98.1%
Desk Free  165-3576   Desk Copy  180-3825    5.5%   3.0%   96.9%
Average                                      7.0%   9.4%   87.8%
Increased Sample Size Experiments

Test       Test Size  Train                Train Size  FRR    FAR    Performance
Lap Free   180-2000   Lap Copy/Desk Free   1571-2000   1.1%   33.0%  69.6%
Desk Free  180-2000   Desk Copy/Lap Free   1620-2000   10.0%  5.1%   94.5%
Lap Free   180-2000   All 4 Data Sets      2000-2000   3.9%   78.4%  27.8%
Desk Free  180-2000   All 4 Data Sets      2000-2000   9.4%   45.6%  57.4%
Average                                                6.1%   40.5%  62.3%
Increased Sample Size Experiments

Test       Test Size  Train                Train Size  FRR    FAR    Performance
Lap Free   180-3825   Lap Copy/Desk Free   1571-4000   0.0%   31.0%  70.4%
Desk Free  180-3825   Desk Copy/Lap Free   1620-4000   13.9%  2.8%   96.7%
Lap Free   180-3825   All 4 Data Sets      3330-4000   10.6%  77.3%  25.9%
Desk Free  180-3825   All 4 Data Sets      3330-4000   10.6%  50.0%  51.8%
Average                                                7.4%   40.3%  61.2%
Increased Sample Size Experiments

Test       Test Size  Train                Train Size  FRR    FAR    Performance
Lap Free   180-3825   Lap Copy/Desk Free   1571-6000   0.0%   24.5%  76.6%
Desk Free  180-3825   Desk Copy/Lap Free   1620-6000   12.8%  2.2%   97.3%
Lap Free   180-3825   All 4 Data Sets      3330-6000   6.1%   63.7%  38.9%
Desk Free  180-3825   All 4 Data Sets      3330-6000   15.6%  38.6%  62.4%
Average                                                8.6%   32.3%  68.8%
Increased Sample Size Experiments

Test       Test Size  Train                Train Size  FRR    FAR    Performance
Lap Free   180-3825   Lap Copy/Desk Free   1571-8000   1.7%   26.2%  74.9%
Desk Free  180-3825   Desk Copy/Lap Free   1620-8000   15.6%  1.7%   97.7%
Lap Free   180-3825   All 4 Data Sets      3330-8000   7.2%   52.8%  49.2%
Desk Free  180-3825   All 4 Data Sets      3330-8000   14.4%  31.8%  98.9%
Average                                                9.7%   28.2%  80.2%
Linguistic Features: Identifying Discrepancies
Feature measurements
Duration: the average response time (how long a key is held down) and its standard deviation
Transition: divided into two types
• Type I (short transition): the time between the release of one key and the press of the next
• Type II (long transition): the time between the press of one key and the press of the next
Percentage: the total number of occurrences expressed as a ratio of the total number of keystrokes
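The duration, transition and percentage measurements above can be illustrated on a short list of key events. This is a minimal sketch, assuming each event is a (key, press_ms, release_ms) tuple in typing order; that is not the BioFeature++ raw-data format, and `key_features` is a hypothetical name. It keys durations and percentages by single keys and transitions by two-key pairs, and uses the population standard deviation, which is also an assumption.

```python
import statistics
from collections import defaultdict


def key_features(events):
    """Compute illustrative keystroke features from (key, press_ms, release_ms) events.

    Per key: (mean, population std) of duration, and percentage of all keystrokes.
    Per key pair: mean Type I and mean Type II transition time.
    """
    durations = defaultdict(list)
    type1 = defaultdict(list)  # release of key a -> press of next key b
    type2 = defaultdict(list)  # press of key a -> press of next key b
    for key, press, release in events:
        durations[key].append(release - press)
    for (a, pa, ra), (b, pb, _) in zip(events, events[1:]):
        type1[a + b].append(pb - ra)
        type2[a + b].append(pb - pa)

    total = len(events)
    return {
        "duration": {k: (statistics.mean(v), statistics.pstdev(v)) for k, v in durations.items()},
        "percentage": {k: 100.0 * len(v) / total for k, v in durations.items()},
        "type1": {d: statistics.mean(v) for d, v in type1.items()},
        "type2": {d: statistics.mean(v) for d, v in type2.items()},
    }


events = [("t", 0, 90), ("h", 120, 200), ("e", 260, 330), ("t", 400, 480), ("h", 500, 590)]
feats = key_features(events)
print(feats["duration"]["t"], feats["type1"]["th"], feats["type2"]["th"])
```

Note that a Type I transition can be negative when the next key is pressed before the previous one is released, which is common in fast typing; a Type II transition is always positive for events in typing order.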
Discrepancy between the 239 and 230 feature measurements
• Addition of a feature group for 6 other least-frequent consonants (q, v, j, x, z, k)
• Removal of a feature group of 15 long transitions (Type II): th, st, nd, an, in, er, es, on, at, en, or, he, re, ti, ea
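Assuming each listed consonant and each listed long-transition pair contributes exactly one measurement, and that the changes are applied to the 239-feature set to obtain the 230-feature set (the slide states neither), the two counts reconcile as sketched below.

```python
# Assumption: one measurement per listed item, applied to the 239-feature set.
added_consonants = ["q", "v", "j", "x", "z", "k"]
removed_type2 = ["th", "st", "nd", "an", "in", "er", "es", "on",
                 "at", "en", "or", "he", "re", "ti", "ea"]

# 239 features + 6 added consonant features - 15 removed Type II transitions.
reconciled = 239 + len(added_consonants) - len(removed_type2)
print(len(added_consonants), len(removed_type2), reconciled)  # 6 15 230
```

If either group actually held more than one measurement per item (for example a mean and a standard deviation), the counts would no longer reconcile, which would itself be worth checking against the feature list.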
Small-Training, Strong-Enrollment Authentication Experiments

Biometric  Test       Test Size  Train      Train Size  FRR     FAR     Performance
Keystroke  Lap Free   360-15750  Lap Copy   360-15750   5.28%   14.56%  85.65%
Keystroke  Desk Free  341-15059  Desk Copy  360-15750   17.30%  1.28%   98.36%
Average                                                 11.3%   8.0%    92.1%
Big-Training, Strong-Enrollment Authentication Experiments (5000, 10000 and 20000 inter-class samples)

Biometric  Test       Test Size  Train      Train Size  FRR     FAR     Performance
Keystroke  Lap Free   180-3825   Lap Copy   180-3825    10.00%  3.29%   96.40%
Keystroke  Desk Free  176-3740   Desk Copy  165-3576    21.02%  1.10%   98.01%
Keystroke  Lap Free   180-3825   Lap Copy   180-3825    10.00%  3.29%   96.40%
Keystroke  Desk Free  176-3740   Desk Copy  165-3576    21.02%  1.10%   98.01%
Documentation Creation
User Manual
Technical Manual
http://utopia.csis.pace.edu/cs691/2008-2009/team4/
Future Work
Experiment to determine why laptop input is less consistent than desktop input. Possible reasons:
• Different keyboard layouts
• Different body positioning during typing
• Desktops are more fixed and consistent
Data should be stored in a database as opposed to individual files
Older code should be refactored to run more efficiently
Combine last semester's and this semester's work into one project
Conclusion
Thank you for your time and attention