Introduction to Pattern Recognition
TRANSCRIPT
INTRODUCTION TO PATTERN RECOGNITION
Week 2-3
Outline: Pattern Recognition Systems, The Design Cycle, Learning and Adaptation, Classifier Based on Bayes Decision Theory
Pattern Recognition Systems
Pattern Recognition Systems: Sensing
Use of a transducer (camera or microphone).
The performance of a PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer.
Segmentation and grouping: patterns should be well separated and should not overlap.
Pattern Recognition Systems: Feature extraction
Discriminative features; features invariant with respect to translation, rotation, and scale.
Classification: use the feature vector provided by the feature extractor to assign the object to a category.
Post-processing: exploit context, input-dependent information other than the target pattern itself, to improve performance.
The Design Cycle
Data Collection, Feature Choice, Model Choice, Training, Evaluation, Computational Complexity
The Design Cycle
Data collection: how do we know when we have collected an adequately large and representative set of examples for training and testing the system?
Feature choice: depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.
The Design Cycle
Model choice: if we are unsatisfied with the performance of our fish classifier, we may want to jump to another class of model.
Training: use data to determine the classifier. There are many different procedures for training classifiers and choosing models.
The Design Cycle
Evaluation: measure the error rate (or performance) and switch from one set of features to another.
Computational complexity: what is the trade-off between computational ease and performance? (How does an algorithm scale as a function of the number of features, patterns, or categories?)
Learning and Adaptation
Supervised learning: a teacher provides a category label or cost for each pattern in the training set (classification).
Unsupervised learning: the system forms clusters or "natural groupings" of the input patterns (clustering).
Classifier Based on Bayes Decision Theory
Classifier Based on Bayes Decision Theory
Bayes decision theory
The Gaussian probability density function
Minimum distance classifiers: Euclidean, Mahalanobis
Bayes Decision Theory
Bayes Decision Theory
Recall our example of classifying two fish as salmon or sea bass, and our agreement that any given fish is either a salmon or a sea bass; call this the state of nature of the fish.
Let us define a (probabilistic) variable ω that describes the state of nature:
ω = ω1 for sea bass
ω = ω2 for salmon
Let us assume this two-class case.
Bayes Decision Theory
The a priori or prior probability reflects our knowledge of how likely we expect a certain state of nature to be before we actually observe it.
In the fish example, it is the probability that we will see either a salmon or a sea bass next on the conveyor belt.
Bayes Decision Theory
Note: the prior may vary depending on the situation. If we get equal numbers of salmon and sea bass in a catch, then the priors are equal, or uniform.
Depending on the season, we may get more salmon than sea bass, for example.
Bayes Decision Theory
We write P(ω = ω1), or just P(ω1), for the prior that the next fish is a sea bass.
The priors must exhibit exclusivity and exhaustivity. For c states of nature, or classes:
P(ω1) + P(ω2) + … + P(ωc) = 1
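In practice the priors are often estimated from label counts in the training set. A minimal sketch, with hypothetical catch counts (the numbers below are illustrative assumptions, not values from the slides):

```python
# Estimate priors from label counts in a hypothetical training catch.
counts = {"sea bass": 620, "salmon": 380}
total = sum(counts.values())

# P(class) = count(class) / total observations
priors = {label: n / total for label, n in counts.items()}
print(priors)  # {'sea bass': 0.62, 'salmon': 0.38}

# Exclusivity and exhaustivity: the priors must sum to 1.
assert abs(sum(priors.values()) - 1.0) < 1e-12
```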
Bayes Decision Theory
A feature is an observable variable. A feature space is a set from which we can sample or observe values.
Examples of features: length, width, lightness, location of dorsal fin.
For simplicity, let us assume that our features are all continuous values.
Denote a scalar feature as x and a vector feature as x. For an l-dimensional feature space, x ∈ R^l.
Bayes Decision Theory
In a classification task, we are given a pattern and the task is to classify it into one out of c classes.
The number of classes, c, is assumed to be known a priori.
Each pattern is represented by a set of feature values x(i), i = 1, …, l, which make up the l-dimensional feature vector x = [x(1), x(2), …, x(l)]^T.
We assume that each pattern is represented uniquely by a single feature vector and that it can belong to only one class.
Bayes Decision Theory
Also, we let the number of possible classes be equal to c, that is, ω1, ω2, …, ωc.
According to Bayes decision theory, x is assigned to the class ωi if
P(ωi | x) > P(ωj | x) for all j ≠ i
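The decision rule above can be sketched numerically. The likelihood and prior values below are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Hypothetical class-conditional likelihoods p(x | wi) and priors P(wi)
# for a single observed feature value x (illustrative numbers only).
likelihoods = np.array([0.6, 0.2])   # p(x | w1), p(x | w2)
priors = np.array([0.3, 0.7])        # P(w1), P(w2)

# Bayes rule: P(wi | x) = p(x | wi) P(wi) / p(x)
evidence = np.sum(likelihoods * priors)       # p(x), the normalizing term
posteriors = likelihoods * priors / evidence  # P(wi | x)

# Assign x to the class with the largest posterior.
predicted_class = int(np.argmax(posteriors)) + 1  # classes numbered 1..c
print(posteriors, predicted_class)
```

Note that the evidence p(x) is the same for every class, so comparing p(x | ωi) P(ωi) directly gives the same decision.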
The Gaussian Probability Density Function
The Gaussian Probability Density Function
The Gaussian pdf is extensively used in pattern recognition because of its mathematical tractability as well as because of the central limit theorem.
The Gaussian Probability Density Function
The multidimensional Gaussian pdf has the form
p(x) = (1 / ((2π)^(l/2) |S|^(1/2))) exp(−(1/2) (x − m)^T S^(−1) (x − m))
where m is the mean vector, S is the covariance matrix, |S| is the determinant of S, S^(−1) is the inverse of S, and l is the number of dimensions.
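The pdf formula above translates directly into code. A minimal NumPy sketch (the function name `gaussian_pdf` is ours):

```python
import numpy as np

def gaussian_pdf(x, m, S):
    """Multidimensional Gaussian pdf at x, with mean vector m and
    covariance matrix S; l is the number of dimensions."""
    l = m.shape[0]
    diff = x - m
    norm = 1.0 / ((2 * np.pi) ** (l / 2) * np.linalg.det(S) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(S) @ diff)

# Sanity check against the 1-D standard normal at x = 0: 1/sqrt(2*pi)
x = np.array([0.0]); m = np.array([0.0]); S = np.array([[1.0]])
print(gaussian_pdf(x, m, S))  # ≈ 0.3989
```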
The Gaussian Probability Density Function
Example 1: Compute the value of a Gaussian pdf, p(x), at a given point x, with given mean vector m and covariance matrix S.
Answers
The Gaussian Probability Density Function
Example 2: Consider a 2-class classification task in the 2-dimensional space, where the data in the two classes, ω1 and ω2, are distributed according to the Gaussian distributions N(m1, S1) and N(m2, S2), respectively.
Given m1, m2, S1, and S2, and assuming equal priors, classify x into ω1 or ω2.
Answers
x is assigned to the class whose pdf value at x is larger.
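Classifying by comparing the two Gaussian pdf values can be sketched as follows. Since the slide's numeric values are not reproduced here, the means, covariance, and test point below are assumed for illustration:

```python
import numpy as np

def gauss_pdf(x, m, S):
    """Multidimensional Gaussian pdf at x with mean m, covariance S."""
    l = len(m)
    d = x - m
    return np.exp(-0.5 * d @ np.linalg.inv(S) @ d) / np.sqrt(
        (2 * np.pi) ** l * np.linalg.det(S))

# Assumed 2-class, 2-D setup with a shared covariance and equal priors.
m1, m2 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
S = 0.8 * np.eye(2)

x = np.array([1.0, 1.0])
p1, p2 = gauss_pdf(x, m1, S), gauss_pdf(x, m2, S)

# With equal priors, the larger class-conditional pdf wins.
print("assign x to class", 1 if p1 > p2 else 2)
```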
Mean Vector and Covariance Matrix
Mean Vector and Covariance Matrix
The first step in analyzing multivariate data is computing the mean vector and the variance-covariance matrix.
Consider the following sample data matrix.
Mean Vector and Covariance Matrix
Each row vector is one observation of the three variables (or components).
The mean vector consists of the means of each variable; the variance-covariance matrix consists of the variances of the variables along the main diagonal and the covariances between each pair of variables in the other matrix positions.
Mean Vector and Covariance Matrix
The formula for computing the covariance of the variables X and Y is
cov(X, Y) = (1 / (n − 1)) Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ)
with X̄ and Ȳ denoting the means of X and Y, respectively, and n denoting the number of observations for this example.
Mean Vector and Covariance Matrix
The results are the mean vector and the variance-covariance matrix.
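Computing the mean vector and variance-covariance matrix can be sketched with NumPy; the sample data matrix below is hypothetical, since the slide's values are not reproduced here:

```python
import numpy as np

# Hypothetical sample data matrix: each row is one observation
# of three variables.
X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])

# Mean of each column (variable).
mean_vector = X.mean(axis=0)

# rowvar=False: columns are variables; np.cov uses the n-1
# denominator by default, matching the covariance formula above.
cov_matrix = np.cov(X, rowvar=False)
print(mean_vector)
print(cov_matrix)
```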
Mean Vector and Covariance Matrix
Example: given a mean vector m and a covariance matrix S, generate random numbers following the Gaussian distribution N(m, S).
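Generating Gaussian random numbers with a given mean and covariance might look like this; the values of m and S are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.array([0.0, 0.0])         # assumed mean vector
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])       # assumed covariance matrix

# Draw 5000 samples from N(m, S).
samples = rng.multivariate_normal(m, S, size=5000)

# The sample statistics should approach m and S as the sample grows.
print(samples.mean(axis=0))
print(np.cov(samples, rowvar=False))
```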
Mean Vector and Covariance Matrix
[Scatter plots of Gaussian samples for various covariance matrices:
S = [1 0; 0 1], S = [0.2 0; 0 0.2], S = [2 0; 0 2],
S = [0.2 0; 0 2], S = [2 0; 0 0.2], S = [1 0.5; 0.5 1],
S = [0.3 0.5; 0.5 2], S = [0.3 −0.5; −0.5 2]]
Minimum Distance Classifiers
• Euclidean
• Mahalanobis
Minimum Distance Classifiers
Template matching can be expressed mathematically through a notion of distance.
Let x be the feature vector for the unknown input, and let m1, m2, …, mc be the means of the c classes.
The error in matching x against mk is given by the distance between x and mk.
Choose the class for which the error is a minimum.
This technique is called minimum distance classification.
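The minimum distance classification steps above can be sketched as follows (the class means are hypothetical):

```python
import numpy as np

def min_distance_classify(x, means):
    """Assign x to the class whose mean is nearest to it
    (Euclidean template matching)."""
    dists = [np.linalg.norm(x - m) for m in means]
    return int(np.argmin(dists)) + 1  # classes numbered 1..c

# Hypothetical class means m1, m2, m3.
means = [np.array([0.0, 0.0]),
         np.array([5.0, 0.0]),
         np.array([0.0, 5.0])]

print(min_distance_classify(np.array([4.0, 1.0]), means))  # nearest mean is m2
```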
Minimum Distance Classifiers
[Block diagram: the input x is fed to distance computations against each class mean m1, m2, …, mc; a minimum selector outputs the class.]
The Euclidean Distance Classifier
The minimum Euclidean distance classifier: given an unknown x, assign it to class ωi if
||x − mi|| < ||x − mj|| for all j ≠ i
where mi is the mean of class ωi and mj is the mean of class ωj.
The Euclidean Distance Classifier
It must be stated that the Euclidean classifier is often used because of its simplicity.
It assigns a pattern to the class whose mean is closest to it with respect to the Euclidean norm.
The Euclidean Distance Classifier
Example: Consider a 2-class classification task in the 3-dimensional space, where the two classes, ω1 and ω2, are modeled by Gaussian distributions with means m1 and m2, respectively. Given a point x, classify it according to the Euclidean distance classifier.
Answers
The point is assigned to the class with the nearer mean.
The Mahalanobis Distance Classifier
The minimum Mahalanobis distance classifier: given an unknown x, it is assigned to class ωi if
((x − mi)^T S^(−1) (x − mi))^(1/2) < ((x − mj)^T S^(−1) (x − mj))^(1/2) for all j ≠ i
where S is the common covariance matrix and S^(−1) is its inverse. The presence of the covariance matrix accounts for the shape of the Gaussians.
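A sketch of the Mahalanobis rule, with an assumed covariance matrix deliberately chosen so that the Mahalanobis and Euclidean classifiers disagree; all values are illustrative:

```python
import numpy as np

def mahalanobis(x, m, S_inv):
    """Mahalanobis distance between x and mean m, given inverse covariance."""
    d = x - m
    return np.sqrt(d @ S_inv @ d)

# Assumed 2-D setup: the shared covariance is very tight in the second
# direction, so offsets along it count heavily in the distance.
S = np.array([[1.0, 0.0],
              [0.0, 0.01]])
S_inv = np.linalg.inv(S)
m1, m2 = np.array([0.0, 0.0]), np.array([1.2, 0.3])
x = np.array([1.0, 0.0])

d1, d2 = mahalanobis(x, m1, S_inv), mahalanobis(x, m2, S_inv)
print(1 if d1 < d2 else 2)  # Mahalanobis assigns class 1
# The Euclidean rule, ignoring S, picks the geometrically closer mean m2.
print(1 if np.linalg.norm(x - m1) < np.linalg.norm(x - m2) else 2)
```

The disagreement shows what "accounts for the shape of the Gaussians" means: m2 is geometrically closer to x, but its offset lies along a low-variance direction, so it is far in the Mahalanobis sense.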
The Mahalanobis Distance Classifier
Example: Consider a 2-class classification task in the 3-dimensional space, where the two classes, ω1 and ω2, are modeled by Gaussian distributions with means m1 and m2, respectively, and a common covariance matrix S. Given a point x, classify it according to the Mahalanobis distance classifier.
Answers
The point is assigned to the class with the smaller Mahalanobis distance.