• Thesis title: “Studies in Pattern Classification – Biological Modeling, Uncertainty Reasoning, and Statistical Learning”
• Three parts:
  (1) Handwritten Digit Recognition with a Vision-Based Model (part in CVPR-2000)
  (2) An Uncertainty Framework for Classification (UAI-2000)
  (3) Selection of Support Vector Kernel Parameters (ICML-2000)
Handwritten Digit Recognition with a Vision-Based Model
Loo-Nin Teow & Kia-Fock Loe
School of Computing
National University of Singapore
OBJECTIVE
• To develop a vision-based system that extracts features for handwritten digit recognition based on the following principles:
  – Biological Basis;
  – Linear Separability;
  – Clear Semantics.
Developing the model
2 main modules:
• Feature extractor – generates a feature vector from the raw pixel map.
• Trainable classifier – outputs the class based on the feature vector.
General System Structure
[Figure: Handwritten Digit Recognizer pipeline – Raw Pixel Map → Feature Extractor → Feature Vector → Classifier → Digit Class]
The Biological Visual System
[Figure: visual pathway – Eye → optic nerve → optic chiasm → optic tract → lateral geniculate nucleus → optic radiation → primary visual cortex of the brain]
Biological Vision
• Local spatial features;
• Edge and corner orientations;
• Dual-channel (bright/dark; on/off);
• Non-hierarchical feature extraction.
The Feature Extraction Process
[Figure: I (2 channels of 36x36) → Selective Convolution → Q (32 maps of 32x32) → Feature Aggregation → F (32 maps of 9x9)]
Dual Channel
• On-channel – intensity-normalized image:
  $I(X,Y) = \mathrm{Image}(X,Y) \,/\, \mathrm{MaxGrayValue}$
• Off-channel – complement of the on-channel:
  $\bar{I}(X,Y) = 1 - I(X,Y)$
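The two-channel split above can be sketched in a few lines of NumPy. This is a minimal illustration under the stated definitions; the function name and the 8-bit gray range are assumptions, not the thesis code.

```python
import numpy as np

def dual_channels(image, max_gray=255.0):
    """Split a grayscale image into on/off channels.

    On-channel:  I(X, Y)  = Image(X, Y) / MaxGrayValue   (intensity-normalized)
    Off-channel: I'(X, Y) = 1 - I(X, Y)                  (complement)
    """
    on = image.astype(float) / max_gray   # bright-on-dark responses
    off = 1.0 - on                        # dark-on-bright responses
    return on, off

img = np.array([[0, 128], [255, 64]])
on, off = dual_channels(img)
```

The two channels always sum to 1 at every pixel, so bright and dark structures are represented symmetrically.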
Selective Convolution
• Local receptive fields – same spatial features at different locations.
• Truncated linear halfwave rectification – strength of feature’s presence.
• “Soft” selection based on central pixel – reduce false edges and corners.
Selective Convolution (formulae)
$Q_j(X,Y) = I(X,Y)\, G_j(X,Y)$
where
$G_j(X,Y) = \left\lfloor \sum_{M=-r}^{r} \sum_{N=-r}^{r} H_j(M,N)\, I(X+M,\, Y+N) \right\rfloor$
and the halfwave rectification is
$\lfloor z \rfloor = z$ if $z > 0$, and $0$ otherwise.
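The selective convolution above can be sketched as a direct (unoptimized) loop. This is an assumed reading of the formulae: the mask is correlated with the channel, the response is halfwave-rectified and truncated at 1, and the result is gated by the central pixel of the channel itself.

```python
import numpy as np

def selective_convolution(I, H):
    """Selectively convolve channel I with one mask template H.

    G(X, Y) = rect( sum_{M,N} H(M, N) * I(X+M, Y+N) )  -- rectified response
    Q(X, Y) = I(X, Y) * G(X, Y)                        -- "soft" central-pixel selection

    rect(z) = z if z > 0 else 0, truncated at 1 (truncated linear halfwave
    rectification; the truncation point is an assumption).
    """
    r = H.shape[0] // 2                    # receptive-field radius
    h, w = I.shape
    G = np.zeros((h - 2 * r, w - 2 * r))   # valid convolution: 36x36 -> 32x32 for 5x5 masks
    for x in range(G.shape[0]):
        for y in range(G.shape[1]):
            z = np.sum(H * I[x:x + 2 * r + 1, y:y + 2 * r + 1])
            G[x, y] = min(max(z, 0.0), 1.0)
    # gate each response by the central pixel, suppressing false edges/corners
    Q = I[r:h - r, r:w - r] * G
    return Q

I = np.ones((5, 5))
H = np.ones((3, 3))
Q = selective_convolution(I, H)
```

With a 5x5 mask ($r = 2$) a 36x36 channel yields a 32x32 feature map, matching the dimensions on the process slide.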
Convolution Mask Templates
• Simplified models of the simple and hypercomplex receptive fields.
• Detect edges and end-stops of various orientations.
• Corners – more robust than edges:
  – On-channel end-stops: convex corners;
  – Off-channel end-stops: concave corners.
Some representatives of the 16 mask templates used in the feature extraction
[Figure: example convolution mask templates with weights such as −1, 1, 2, and −8, detecting edges and end-stops of various orientations]
Feature Aggregation
• Similar to subsampling:
  – reduces number of features;
  – reduces dependency on features’ positions;
  – local invariance to distortions and translations.
• Different from subsampling:
  – magnitude-weighted averaging;
  – detects presence of feature in window;
  – large window overlap.
Feature Aggregation (formulae)
Magnitude-Weighted Average:
$F_j(X,Y) = \sum_{M=0}^{v-1} \sum_{N=0}^{v-1} W_{jXY}(M,N)\, Q_j(uX+M,\, uY+N)$
where
$W_{jXY}(M,N) = \dfrac{Q_j(uX+M,\, uY+N)}{\sum_{S=0}^{v-1} \sum_{T=0}^{v-1} Q_j(uX+S,\, uY+T)}$
(with $u$ the window step and $v$ the window size)
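Substituting the weights into the sum, each output is $\sum Q^2 / \sum Q$ over the window, so strong responses dominate: the pool reports the presence of a feature rather than its average strength. A minimal sketch, assuming that reading of the formulae (the function name and parameters are illustrative):

```python
import numpy as np

def aggregate(Q, stride, win):
    """Magnitude-weighted average pooling over overlapping windows.

    F(X, Y) = sum_{M,N} W(M, N) * Q(u*X+M, u*Y+N),
    W(M, N) = Q(u*X+M, u*Y+N) / sum_{S,T} Q(u*X+S, u*Y+T),
    which simplifies to sum(window**2) / sum(window).
    """
    out_h = (Q.shape[0] - win) // stride + 1
    out_w = (Q.shape[1] - win) // stride + 1
    F = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            w = Q[x * stride:x * stride + win, y * stride:y * stride + win]
            s = w.sum()
            F[x, y] = (w ** 2).sum() / s if s > 0 else 0.0
    return F

Q = np.array([[1.0, 2.0], [3.0, 4.0]])
F = aggregate(Q, stride=1, win=2)
```

With a window larger than the stride (e.g. an 8x8 window stepped by 3 over a 32x32 map) the windows overlap heavily and the output is 9x9, matching the slide dimensions.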
Classification
• Linear discrimination systems:
  – Single-layer Perceptron Network – minimize cross-entropy cost function;
  – Linear Support Vector Machines – maximize interclass margin width.
• k-nearest neighbor:
  – Euclidean distance;
  – Cosine similarity.
[Figure: scatter plots of two classes (“x” and “o”) illustrating linear discrimination]
Multiclass Classification Schemes for linear discrimination systems
• One-per-class (1 vs 9)
• Pairwise (1 vs 1)
• Triowise (1 vs 2)
Experiments
• MNIST database of handwritten digits.
• 60000 training, 10000 testing.
• 36x36 input image.
• 32 9x9 feature maps.
Preliminary Experiments (train: 60000 samples; test: 10000 samples)

Feature Classifier   Scheme                       Voting Option   Train Error (%)   Test Error (%)
Perceptron Network   1-per-class                  -               0.00              2.14
Perceptron Network   Pairwise                     Hard            0.00              0.88
Perceptron Network   Pairwise                     Soft            0.00              0.87
Perceptron Network   Triowise                     Hard            0.00              0.72
Perceptron Network   Triowise                     Soft            0.00              0.72
Linear SVMs          Pairwise                     Hard            0.00              0.98
Linear SVMs          Pairwise                     Soft            0.00              0.82
Linear SVMs          Triowise                     Hard            0.00              0.74
Linear SVMs          Triowise                     Soft            0.00              0.72
k-Nearest Neighbor   Euclidean distance (k = 3)   -               0.00              1.39
k-Nearest Neighbor   Cosine similarity (k = 3)    -               0.00              1.09
Experiments on Deslanted Images (train: 60000 samples; test: 10000 samples)

Feature Classifier   Scheme     Voting Option   Train Error (%)   Test Error (%)
Perceptron Network   Pairwise   Hard            0.00              0.81
Perceptron Network   Pairwise   Soft            0.00              0.73
Perceptron Network   Triowise   Hard            0.00              0.63
Perceptron Network   Triowise   Soft            0.00              0.62
Linear SVMs          Pairwise   Hard            0.00              0.69
Linear SVMs          Pairwise   Soft            0.00              0.68
Linear SVMs          Triowise   Hard            0.00              0.65
Linear SVMs          Triowise   Soft            0.00              * 0.59 *
Comparison with Other Models
Classifier Model Test Error (%)
LeNet-4 1.10
LeNet-4, boosted [distort] 0.70
LeNet-5 0.95
LeNet-5 [distort] 0.80
Tangent distance 1.10
Virtual SVM 0.80
< Our model > [deslant] * 0.59 *
Conclusion
• Our model extracts features that are:
  – biologically plausible;
  – linearly separable;
  – semantically clear.
• Needs only a linear classifier:
  – relatively simple structure;
  – trains fast;
  – gives excellent classification performance.
Hierarchy of Features?
• Idea originated from Hubel & Wiesel:
  – LGN → simple → complex → hypercomplex;
  – later studies show these to be parallel.
• Hierarchy - too many feature combinations.
• Simpler to have only one convolution layer.
Linear Discrimination
Output: $p = g(f(\mathbf{x}))$
where $f$ defines a hyperplane:
$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$
and $g$ is the activation function:
$g(z) = \dfrac{1}{1 + \exp(-z)}$
or
$g(z) = \begin{cases} -1 & \text{if } z < -1 \\ z & \text{if } -1 \le z \le 1 \\ 1 & \text{if } z > 1 \end{cases}$
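The discriminant and the two activation functions can be sketched in plain Python. A minimal illustration of the definitions above, not the thesis implementation; the function names are ours.

```python
import math

def f(w, x, b):
    """Linear discriminant: f(x) = w . x + b (a hyperplane)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z)) (perceptron network)."""
    return 1.0 / (1.0 + math.exp(-z))

def clipped_linear(z):
    """Piecewise-linear activation: -1 below -1, z in between, 1 above 1."""
    return max(-1.0, min(1.0, z))

# a point exactly on the hyperplane gives f = 0, hence p = 0.5
p = sigmoid(f([0.5, -0.25], [1.0, 2.0], 0.0))
```

Because both classifiers reduce to a single hyperplane per discriminant, the burden of making the classes separable falls entirely on the feature extractor.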
One-per-class Classification
• The unit with the largest output value indicates the class of the character:
$A^* = \arg\max_A\, p_A$
Pairwise Classification
Soft Voting:
$A^* = \arg\max_A \sum_{B \neq A} \left( p_{AB} - p_{BA} \right)$
Hard Voting:
$A^* = \arg\max_A \sum_{B \neq A} \left( \sigma(p_{AB}) - \sigma(p_{BA}) \right)$
where
$\sigma(z) = \begin{cases} 1 & \text{if } z \ge \tfrac{1}{2} \\ -1 & \text{otherwise} \end{cases}$
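Pairwise voting can be sketched as follows. The dictionary layout (an output p[(A, B)] for class A from the classifier trained on the pair {A, B}) and the 1/2 threshold for a hard vote are assumptions for illustration, not the thesis's exact implementation.

```python
def pairwise_vote(p, classes, hard=False):
    """Pairwise (1-vs-1) voting over all class pairs.

    Soft: A* = argmax_A sum_{B != A} (p[A,B] - p[B,A])
    Hard: the same sum, but each output is first thresholded to +/-1.
    """
    def sigma(z):
        # hard vote: +1 if the classifier favors the first class (threshold 1/2 assumed)
        return 1.0 if z >= 0.5 else -1.0

    scores = {}
    for A in classes:
        s = 0.0
        for B in classes:
            if B == A:
                continue
            pab, pba = p[(A, B)], p[(B, A)]
            s += (sigma(pab) - sigma(pba)) if hard else (pab - pba)
        scores[A] = s
    return max(scores, key=scores.get)

# toy example with 3 classes; class 0 wins both pairwise contests
p = {(0, 1): 0.9, (1, 0): 0.1, (0, 2): 0.8, (2, 0): 0.2,
     (1, 2): 0.6, (2, 1): 0.4}
winner = pairwise_vote(p, [0, 1, 2])
```

Soft voting keeps the graded confidences, which is why it edges out hard voting in most rows of the experiment tables.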
Triowise Classification
Soft Voting:
$A^* = \arg\max_A \sum_{B \neq A,\, C \neq A,\, B < C} \left( p_{ABC} - p_{BAC} - p_{CAB} \right)$
Hard Voting:
$A^* = \arg\max_A \sum_{B \neq A,\, C \neq A,\, B < C} \left( \sigma(p_{ABC}) - \sigma(p_{BAC}) - \sigma(p_{CAB}) \right)$
k-Nearest Neighbor
Euclidean Distance:
$\lVert \mathbf{x}_{test} - \mathbf{x}_{train} \rVert$
Cosine Similarity:
$\dfrac{\mathbf{x}_{test} \cdot \mathbf{x}_{train}}{\lVert \mathbf{x}_{test} \rVert \, \lVert \mathbf{x}_{train} \rVert}$
where
$\lVert \mathbf{z} \rVert = \sqrt{\mathbf{z} \cdot \mathbf{z}}$
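The two k-NN variants reduce to a nearest-by-distance or nearest-by-similarity ranking followed by a majority vote. A minimal sketch, assuming plain Python vectors and majority voting over the k neighbors (function names are illustrative):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance ||a - b||."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity (a . b) / (||a|| ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn_classify(x, train, k=3, use_cosine=False):
    """Majority label among the k nearest training vectors.

    train is a list of (vector, label) pairs; with use_cosine=True the
    neighbors are ranked by decreasing similarity instead of distance.
    """
    if use_cosine:
        ranked = sorted(train, key=lambda t: -cosine_similarity(x, t[0]))
    else:
        ranked = sorted(train, key=lambda t: euclidean(x, t[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [([0, 0], 'a'), ([0, 1], 'a'),
         ([5, 5], 'b'), ([5, 6], 'b'), ([6, 5], 'b')]
label = knn_classify([0, 0.5], train, k=3)
```

Cosine similarity ignores vector magnitude, which helps when feature maps differ mainly in overall stroke strength; the experiments show it beating Euclidean distance (1.09% vs 1.39% test error).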
Confusion Matrix (triowise SVMs / soft voting / deslanted)
Class 0 1 2 3 4 5 6 7 8 9 # errors
0 977 0 0 0 0 0 2 1 0 0 3
1 0 1134 1 0 0 0 0 0 0 0 1
2 1 0 1023 1 1 0 0 5 1 0 9
3 0 0 1 1005 0 4 0 0 0 0 5
4 0 0 0 0 975 0 1 1 1 4 7
5 1 0 0 3 0 887 1 0 0 0 5
6 5 2 1 0 1 1 948 0 0 0 10
7 0 1 2 1 0 0 0 1022 0 2 6
8 0 0 1 0 0 1 0 0 972 0 2
9 0 0 0 0 7 2 0 1 1 998 11