Chapter 8
Discriminant Analysis
8.1 Introduction
Classification is an important task in multivariate analysis and data mining.
Classification constructs a model from the training set, using the values (class labels) of a classifying attribute, and then uses the model to classify new data, i.e., to predict unknown or missing class labels.
Classification—A Two-Step Process
1. Model construction: describing a set of predetermined classes
   - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
   - The set of tuples used for model construction is the training set
   - The model is represented as classification rules, decision trees, or mathematical formulae
2. Prediction: for classifying future or unknown objects
   - Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model
   - The accuracy rate is the percentage of test-set samples that are correctly classified by the model
   - The test set is independent of the training set; otherwise over-fitting will occur
   - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
Classification Process: Model Construction

Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

A classification algorithm produces the classifier (model):

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Classification Process: Use the Model in Prediction

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured?
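The two-step process above can be sketched directly in code. The rule and the data come from the slides; the helper function name and the accuracy computation are ours:

```python
# Apply the rule learned from the training set (the classifier/model)
# to the testing data, estimate accuracy, then classify an unseen tuple.

def predict_tenured(rank, years):
    """The model from the slide: IF rank = 'professor' OR years > 6 THEN 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

testing_data = [  # (name, rank, years, known label)
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]

correct = sum(predict_tenured(r, y) == label for _, r, y, label in testing_data)
accuracy = correct / len(testing_data)  # Merlisa (7 years, not tenured) is missed
print(accuracy)                         # 0.75

# The unseen tuple (Jeff, Professor, 4) is then classified:
print(predict_tenured("Professor", 4))  # yes
```

Since the accuracy (75%) is estimated on a test set independent of the training set, it is an honest estimate rather than an optimistic resubstitution rate.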
Supervised vs. Unsupervised Learning
Supervised learning (classification)
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
- New data are classified based on the training set
Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Discrimination— Introduction
Discrimination is a technique concerned with allocating new observations to previously defined groups.
There are k samples from k distinct populations:
One wants to find the so-called discriminant function and related rule to identify the new observations.
$$
G_1:\;
\begin{pmatrix}
x_{11}^{(1)} & \cdots & x_{1p}^{(1)}\\
\vdots & & \vdots\\
x_{n_1 1}^{(1)} & \cdots & x_{n_1 p}^{(1)}
\end{pmatrix},
\quad \cdots, \quad
G_k:\;
\begin{pmatrix}
x_{11}^{(k)} & \cdots & x_{1p}^{(k)}\\
\vdots & & \vdots\\
x_{n_k 1}^{(k)} & \cdots & x_{n_k p}^{(k)}
\end{pmatrix}
$$
Example 11.3 Bivariate case
Discriminant function and rule

Discriminant function: $w(\mathbf{x}) = \mathbf{l}'\mathbf{x}$

Rule: $\mathbf{x} \in G_1$ if $w(\mathbf{x}) \ge a$; $\mathbf{x} \in G_2$ if $w(\mathbf{x}) < a$.
Example 11.1: Riding mowers

Consider two groups in a city: riding-mower owners and those without riding mowers. In order to identify the best sales prospects for an intensive sales campaign, a riding-mower manufacturer is interested in classifying families as prospective owners or non-owners on the basis of income and lot size.
Example 11.1: Riding mowers

π1: Riding-mower owners          π2: Nonowners
x1 (Income,   x2 (Lot size,      x1 (Income,   x2 (Lot size,
$1000s)       1000 ft²)          $1000s)       1000 ft²)
60            18.4               75            19.6
85.5          16.8               52.8          20.8
64.8          21.6               64.8          17.2
61.5          20.8               43.2          20.4
87            23.6               84            17.6
110.1         19.2               49.2          17.6
108           17.6               59.4          16
82.8          22.4               66            18.4
69            20                 47.4          16.4
93            20.8               33            18.8
51            22                 51            14
81            20                 63            14.8
Example 11.1: Riding mowers

              Classified as
              G1    G2
True  G1      10     2
      G2       2    10
8.2 Discriminant by Distance
Assume k = 2 for simplicity:
$$G_1: N_p(\mu_1, \Sigma_1), \qquad G_2: N_p(\mu_2, \Sigma_2).$$

Consider the Mahalanobis distance
$$d^2(\mathbf{x}, G_j) = (\mathbf{x} - \mu_j)'\Sigma_j^{-1}(\mathbf{x} - \mu_j), \quad j = 1, 2.$$

Discriminant function: $w(\mathbf{x}) = d^2(\mathbf{x}, G_2) - d^2(\mathbf{x}, G_1)$

Rule: $\mathbf{x} \in G_1$ if $w(\mathbf{x}) \ge 0$; $\mathbf{x} \in G_2$ if $w(\mathbf{x}) < 0$.

When $\Sigma_1 = \Sigma_2 = \Sigma$,
$$
w(\mathbf{x})
= (\mathbf{x} - \mu_2)'\Sigma^{-1}(\mathbf{x} - \mu_2) - (\mathbf{x} - \mu_1)'\Sigma^{-1}(\mathbf{x} - \mu_1)
= 2\left(\mathbf{x} - \frac{\mu_1 + \mu_2}{2}\right)'\Sigma^{-1}(\mu_1 - \mu_2).
$$
Let
$$\bar\mu = \frac{\mu_1 + \mu_2}{2}, \qquad \mathbf{c} = \Sigma^{-1}(\mu_1 - \mu_2).$$
The discriminant function can then be written
$$w(\mathbf{x}) = 2(\mathbf{x} - \bar\mu)'\Sigma^{-1}(\mu_1 - \mu_2) = 2\,\mathbf{c}'(\mathbf{x} - \bar\mu).$$
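The distance rule above can be sketched numerically. The means and common covariance below are illustrative assumptions, not values from the slides:

```python
# Mahalanobis-distance discriminant for two known normal populations
# with common covariance Sigma (all numbers are illustrative).
import numpy as np

mu1 = np.array([2.0, 0.0])
mu2 = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

mubar = (mu1 + mu2) / 2
c_vec = np.linalg.solve(Sigma, mu1 - mu2)   # c = Sigma^{-1}(mu1 - mu2)

def w(x):
    """w(x) = 2 (x - mubar)' c = d^2(x, G2) - d^2(x, G1)."""
    return 2 * (x - mubar) @ c_vec

def classify(x):
    return 1 if w(x) >= 0 else 2

print(classify(mu1))   # a point at mu1 belongs to G1 -> 1
print(classify(mu2))   # a point at mu2 belongs to G2 -> 2
```

On the midpoint $\bar\mu$ the function is exactly zero, so the rule's boundary is the hyperplane through $\bar\mu$ orthogonal to $\mathbf{c}$.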
When $\mu_1, \mu_2, \Sigma$ are known, they are used directly; when unknown, their estimators are
$$\hat\mu_j = \bar{\mathbf{x}}_j = \frac{1}{n_j}\sum_{i=1}^{n_j}\mathbf{x}_i^{(j)}, \quad j = 1, 2, \qquad
\tilde\Sigma = \frac{A_1 + A_2}{n_1 + n_2 - 2},$$
where
$$A_j = \sum_{i=1}^{n_j}\left(\mathbf{x}_i^{(j)} - \bar{\mathbf{x}}_j\right)\left(\mathbf{x}_i^{(j)} - \bar{\mathbf{x}}_j\right)', \quad j = 1, 2.$$
Example: Univariate case with equal variance

$G_1: N(\mu_1, \sigma^2)$, $G_2: N(\mu_2, \sigma^2)$ (say $\mu_1 > \mu_2$).

Rule: $x \in G_1$ if $x \ge a$; $x \in G_2$ if $x < a$, where
$$a = \frac{\mu_1 + \mu_2}{2}.$$

With unequal variances, $G_1: N(\mu_1, \sigma_1^2)$ and $G_2: N(\mu_2, \sigma_2^2)$, the cutting point becomes
$$a^* = \frac{\sigma_2\mu_1 + \sigma_1\mu_2}{\sigma_1 + \sigma_2},$$
the point at which the standardized distances to the two means are equal.
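The two cutting points can be checked with a small numeric example (the parameter values are ours):

```python
# Univariate distance discriminant: midpoint cutoff for equal variances,
# variance-weighted cutoff a* for unequal variances (numbers illustrative).
mu1, mu2 = 10.0, 4.0          # assume mu1 > mu2
sigma1, sigma2 = 3.0, 1.0

a = (mu1 + mu2) / 2                                          # equal-variance cutoff
a_star = (sigma2 * mu1 + sigma1 * mu2) / (sigma1 + sigma2)   # unequal-variance cutoff

print(a)       # 7.0
print(a_star)  # (1*10 + 3*4) / 4 = 5.5

# a* equalizes the standardized distances |a* - mu_j| / sigma_j:
print(abs(a_star - mu1) / sigma1, abs(a_star - mu2) / sigma2)  # 1.5 1.5
```

Note how $a^*$ is pulled toward the mean of the less-variable population ($\mu_2$ here), since points must be closer to it in raw units to be equally close in standardized units.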
8.3 Fisher’s Discriminant Function
Idea: projection, ANOVA
Training samples:
$$G_1: N_p(\mu_1, \Sigma),\; \mathbf{x}_1^{(1)}, \ldots, \mathbf{x}_{n_1}^{(1)}; \quad \cdots; \quad
G_k: N_p(\mu_k, \Sigma),\; \mathbf{x}_1^{(k)}, \ldots, \mathbf{x}_{n_k}^{(k)}.$$
Projecting the data on a direction $\mathbf{l} \in R^p$, the F-statistic is
$$F(\mathbf{l}) = \frac{n-k}{k-1} \cdot \frac{\mathbf{l}'B\mathbf{l}}{\mathbf{l}'E\mathbf{l}},$$
where
$$B = \sum_{a=1}^{k} n_a\left(\bar{\mathbf{x}}_a - \bar{\mathbf{x}}\right)\left(\bar{\mathbf{x}}_a - \bar{\mathbf{x}}\right)', \qquad
E = \sum_{a=1}^{k}\sum_{j=1}^{n_a}\left(\mathbf{x}_j^{(a)} - \bar{\mathbf{x}}_a\right)\left(\mathbf{x}_j^{(a)} - \bar{\mathbf{x}}_a\right)'$$
are the between-group and within-group sums of squares and cross-products.
We want to find $\mathbf{l}^*$ such that
$$F(\mathbf{l}^*) = \max_{\mathbf{l} \in R^p} F(\mathbf{l}).$$
The solution $\mathbf{l}^*$ is the eigenvector associated with the largest eigenvalue of $|B - \lambda E| = 0$.

Discriminant function: $u(\mathbf{x}) = \mathbf{l}'\mathbf{x}$, where $\mathbf{l} = \mathbf{l}^*$.
(B) Two Populations

$$B = n_1\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}\right)\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}\right)' + n_2\left(\bar{\mathbf{x}}_2 - \bar{\mathbf{x}}\right)\left(\bar{\mathbf{x}}_2 - \bar{\mathbf{x}}\right)'$$

Note
$$\bar{\mathbf{x}} = \frac{n_1\bar{\mathbf{x}}_1 + n_2\bar{\mathbf{x}}_2}{n_1 + n_2}.$$

We have $E = A_1 + A_2$ and
$$B = \frac{n_1 n_2}{n_1 + n_2}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right)\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right)'.$$

Since $\operatorname{rank}(B) = 1$, there is only one non-zero eigenvalue of $|B - \lambda E| = 0$. The associated eigenvector is $\mathbf{l} = E^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$.

Discriminant function: $u(\mathbf{x}) = \mathbf{l}'\mathbf{x} = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'E^{-1}\mathbf{x}$

Rule (when $\Sigma_1 = \Sigma_2$): $\mathbf{x} \in G_1$ if $u(\mathbf{x}) \ge c$; $\mathbf{x} \in G_2$ if $u(\mathbf{x}) < c$, where
$$c = \tfrac{1}{2}\,\mathbf{l}'\left(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2\right).$$
When $\Sigma_1 \neq \Sigma_2$, $E^{-1}$ is replaced by $A_1^{-1}$ and $A_2^{-1}$ respectively, giving group-specific coefficient vectors
$$\hat{\mathbf{c}}_1 = A_1^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right), \qquad
\hat{\mathbf{c}}_2 = A_2^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right),$$
and the corresponding discriminant scores $\hat{\mathbf{c}}_1'\mathbf{x}$ and $\hat{\mathbf{c}}_2'\mathbf{x}$.
Example: Insect Classification
Table 2.1 Data of two species of insects

Species 1:
No.  x1    x2    n.g.  c.g.  y
1    6.36  5.24  1     1     2.4713
2    5.92  5.12  1     2     2.3335
3    5.92  5.36  1     1     2.3663
4    6.44  5.64  1     1     2.5481
5    6.40  5.16  1     1     2.4714
6    6.56  5.56  1     1     2.5702
7    6.64  5.36  1     1     2.5650
8    6.68  4.96  1     1     2.5213
9    6.72  5.48  1     1     2.6034
10   6.76  5.60  1     1     2.6309
11   6.72  5.08  1     1     2.5488

Species 2:
No.  x1    x2    n.g.  c.g.  y
1    6.00  4.88  2     2     2.3227
2    5.60  4.64  2     2     2.1796
3    5.65  4.96  2     2     2.2343
4    5.76  4.80  2     2     2.2456
5    5.96  5.08  2     2     2.3391
6    5.72  5.04  2     2     2.2674
7    5.64  4.96  2     2     2.2343
8    5.44  4.88  2     2     2.1682
9    5.04  4.44  2     2     1.9977
10   4.56  4.04  2     2     2.0863 — see note below on ordering
11   5.48  4.20  2     2     2.0863
12   5.76  4.80  2     2     2.2456

Note: x1 and x2 are characteristics of the insects (Hoel, 1947); n.g. is the natural group (species), c.g. the classified group, and y the value of the discriminant function. Row 10 of species 2 has y = 1.8106 in the source; the rows above preserve the source values: (4.56, 4.04) pairs with y = 1.8106.
$$\bar{\mathbf{x}}_1 = \begin{pmatrix}6.4654\\5.3236\end{pmatrix}, \quad
\bar{\mathbf{x}}_2 = \begin{pmatrix}5.5500\\4.7267\end{pmatrix}, \quad
\bar{\mathbf{x}} = \begin{pmatrix}5.9878\\5.0122\end{pmatrix},$$
$$E = \begin{pmatrix}2.6765 & 1.2942\\1.2942 & 1.7545\end{pmatrix}, \qquad
B = \begin{pmatrix}4.8097 & 3.1364\\3.1364 & 2.0453\end{pmatrix}.$$

The eigenvalue of $|B - \lambda E| = 0$ is 1.9187, and the associated eigenvector is
$$\mathbf{l} = E^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right) = \begin{pmatrix}0.2759\\0.1367\end{pmatrix}.$$
Example: Insect Classification

The discriminant function is
$$u(\mathbf{x}) = 0.2759\,x_1 + 0.1367\,x_2,$$
and the associated value of each observation is given in the table. The cutting point is $c = 2.3447$.

Classification results:

              Classified as
              G1    G2
True  G1      10     1
      G2       0    12

If we use $\hat{\mathbf{c}}_1$ and $\hat{\mathbf{c}}_2$ (reported values: 2.3831; 0.0939, 0.1497), we obtain the same classification.
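The reported direction and cutting point can be verified from the summary statistics above; this sketch recomputes $\mathbf{l} = E^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$ and $c = \tfrac{1}{2}\mathbf{l}'(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2)$ from the slide's $E$ and group means:

```python
# Verify the insect example: the discriminant direction and cutting point
# follow from the reported within-group SSCP matrix E and the group means.
import numpy as np

xbar1 = np.array([6.4654, 5.3236])
xbar2 = np.array([5.5500, 4.7267])
E = np.array([[2.6765, 1.2942],
              [1.2942, 1.7545]])

l = np.linalg.solve(E, xbar1 - xbar2)   # E^{-1}(xbar1 - xbar2)
c = 0.5 * l @ (xbar1 + xbar2)           # cutting point
print(np.round(l, 4))                   # [0.2759 0.1367]
print(round(c, 4))                      # approx 2.3447 (matches the slide)
```

Both the coefficients (0.2759, 0.1367) and the cutting point 2.3447 agree with the slide to rounding.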
8.4 Bayes’ Discriminant Analysis
A. Idea
There are k populations $G_1, \ldots, G_k$ in $R^p$, where $G_i$ has density $p_i(\mathbf{x})$.

A partition of $R^p$ into $R_1, \ldots, R_k$ is determined based on a training sample.

Rule: $\mathbf{x} \in G_i$ if $\mathbf{x}$ falls into $R_i$.

Loss: $c(j|i)$ is the cost when $\mathbf{x}$ is from $G_i$ but falls into $R_j$.

The probability of this misclassification is
$$P(j|i) = \int_{R_j} p_i(\mathbf{x})\,d\mathbf{x}.$$

The expected cost of misclassification is
$$\mathrm{ECM}(R_1, \ldots, R_k) = \sum_{i=1}^{k} q_i \sum_{j \neq i} c(j|i)\,P(j|i),$$
where $q_1, \ldots, q_k$ are prior probabilities.

We want to minimize $\mathrm{ECM}(R_1, \ldots, R_k)$ with respect to $R_1, \ldots, R_k$.
Theorem 6.4.1. Let
$$h_t(\mathbf{x}) = \sum_{i \neq t} q_i\,p_i(\mathbf{x})\,c(t|i), \quad t = 1, \ldots, k.$$
Then the optimal regions are
$$R_t = \{\mathbf{x}: h_t(\mathbf{x}) \le h_j(\mathbf{x}),\ j \neq t,\ j = 1, \ldots, k\}, \quad t = 1, \ldots, k.$$
B. Method

Take $c(j|i) = 1$ if $j \neq i$ and $c(j|i) = 0$ if $j = i$. Then
$$R_t = \{\mathbf{x}: q_t\,p_t(\mathbf{x}) \ge q_j\,p_j(\mathbf{x}),\ j \neq t,\ j = 1, \ldots, k\}.$$

Proof:
$$h_t(\mathbf{x}) = \sum_{i \neq t} q_i\,p_i(\mathbf{x}) = \sum_{i=1}^{k} q_i\,p_i(\mathbf{x}) - q_t\,p_t(\mathbf{x}),$$
so minimizing $h_t(\mathbf{x})$ over $t$ is the same as maximizing $q_t\,p_t(\mathbf{x})$.
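With 0-1 costs the rule reduces to assigning $\mathbf{x}$ to the population with the largest $q_t\,p_t(\mathbf{x})$. A minimal sketch with three univariate normal populations (all numbers are illustrative):

```python
# Bayes rule with 0-1 costs: assign x to the population maximizing
# q_t * p_t(x), here with toy one-dimensional normal densities.
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

priors = [0.5, 0.3, 0.2]                        # q_1, q_2, q_3
params = [(0.0, 1.0), (3.0, 1.0), (6.0, 1.0)]   # (mu, sigma) for G_1, G_2, G_3

def classify(x):
    scores = [q * normal_pdf(x, mu, s) for q, (mu, s) in zip(priors, params)]
    return 1 + scores.index(max(scores))        # population index t

print(classify(-1.0), classify(3.0), classify(7.0))  # 1 2 3
```

The priors shift the region boundaries: a larger $q_t$ enlarges $R_t$ even though the densities are symmetric.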
Corollary 1. In the case of k = 2,
$$h_1(\mathbf{x}) = q_2\,p_2(\mathbf{x})\,c(1|2), \qquad h_2(\mathbf{x}) = q_1\,p_1(\mathbf{x})\,c(2|1),$$
and we have
$$R_1 = \{\mathbf{x}: q_1\,p_1(\mathbf{x})\,c(2|1) \ge q_2\,p_2(\mathbf{x})\,c(1|2)\},$$
$$R_2 = \{\mathbf{x}: q_2\,p_2(\mathbf{x})\,c(1|2) > q_1\,p_1(\mathbf{x})\,c(2|1)\}.$$
Corollary 2.

Discriminant function: $u(\mathbf{x}) = \dfrac{p_1(\mathbf{x})}{p_2(\mathbf{x})}$

Rule: $\mathbf{x} \in G_1$ if $u(\mathbf{x}) \ge d$; $\mathbf{x} \in G_2$ if $u(\mathbf{x}) < d$, where
$$d = \frac{q_2\,c(1|2)}{q_1\,c(2|1)}.$$
Corollary 3. In the case of k = 2 and
$$\mathbf{x} \sim \begin{cases} N_p(\mu_1, \Sigma) & \text{if } \mathbf{x} \in G_1, \\ N_p(\mu_2, \Sigma) & \text{if } \mathbf{x} \in G_2, \end{cases}$$
then
$$u(\mathbf{x}) = \frac{p_1(\mathbf{x})}{p_2(\mathbf{x})} = \exp\{w(\mathbf{x})\},
\qquad \text{where} \quad
w(\mathbf{x}) = \left(\mathbf{x} - \frac{\mu_1 + \mu_2}{2}\right)'\Sigma^{-1}(\mu_1 - \mu_2).$$

Rule: $\mathbf{x} \in G_1$ if $w(\mathbf{x}) \ge \ln d$; $\mathbf{x} \in G_2$ if $w(\mathbf{x}) < \ln d$.
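Corollary 3 says the Bayes rule for two normals with common $\Sigma$ is the distance discriminant of Section 8.2 with the threshold shifted from 0 to $\ln d$. A sketch with illustrative parameters (all numbers are ours):

```python
# Normal-theory Bayes rule: compare w(x) = (x - mubar)' Sigma^{-1} (mu1 - mu2)
# with ln d, where d = (q2 c(1|2)) / (q1 c(2|1)).  Numbers are illustrative.
import numpy as np

mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

q1, q2 = 0.6, 0.4        # prior probabilities
c12, c21 = 1.0, 1.0      # misclassification costs c(1|2), c(2|1)
log_d = np.log((q2 * c12) / (q1 * c21))   # negative: threshold favors G1

coef = np.linalg.solve(Sigma, mu1 - mu2)  # Sigma^{-1}(mu1 - mu2)
mubar = (mu1 + mu2) / 2

def classify(x):
    w = (x - mubar) @ coef
    return 1 if w >= log_d else 2

print(classify(mu1))    # 1
print(classify(mu2))    # 2
print(classify(mubar))  # 1: at the midpoint, the larger prior q1 wins
```

With equal priors and equal costs, $\ln d = 0$ and the rule coincides exactly with the distance discriminant.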
C. Example 11.3: Detection of hemophilia A carriers

To construct a procedure for detecting potential hemophilia A carriers, blood samples were assayed for two groups of women, with measurements on two variables. The first group of 30 women were selected from a population of women who did not carry the hemophilia gene; this group was called the normal group. The second group of 22 women was selected from known hemophilia A carriers; this group was called the obligatory carriers.

Variables: log10(AHF activity), log10(AHF-like antigen)

Populations: women who did not carry the hemophilia gene (n1 = 30); women who are known hemophilia A carriers (n2 = 45)
C. Example 11.3: Detection of hemophilia A carriers

Data set

Normal group, log10(AHF activity):
-0.0056 -0.1698 -0.3469 -0.0894 -0.1679 -0.0836 -0.1979 -0.0762 -0.1913 -0.1092
-0.5268 -0.0842 -0.0225  0.0084 -0.1827  0.1237 -0.4702 -0.1519  0.0006 -0.2015
-0.1932  0.1507 -0.1259 -0.1551 -0.1952  0.0291 -0.228  -0.0997 -0.1972 -0.0867

Normal group, log10(AHF-like antigen):
-0.1657 -0.1585 -0.1879  0.0064  0.0713  0.0106 -0.0005  0.0392 -0.2123 -0.119
-0.4773  0.0248 -0.058   0.0782 -0.1138  0.214  -0.3099 -0.0686 -0.1153 -0.0498
-0.2293  0.0933 -0.0669 -0.1232 -0.1007  0.0442 -0.171  -0.0733 -0.0607 -0.056

Obligatory carriers, log10(AHF activity):
-0.3478 -0.3618 -0.4986 -0.5015 -0.1326 -0.6911 -0.3608 -0.4535 -0.3479 -0.3539
-0.4719 -0.361  -0.3226 -0.4319 -0.2734 -0.5573 -0.3755 -0.495  -0.5107 -0.1652
-0.2447 -0.4232 -0.2375 -0.2205 -0.2154 -0.3447 -0.254  -0.3778 -0.4046 -0.0639
-0.3351 -0.0149 -0.0312 -0.174  -0.1416 -0.1508 -0.0964 -0.2642 -0.0234 -0.3352
-0.1878 -0.1744 -0.4055 -0.2444 -0.4784

Obligatory carriers, log10(AHF-like antigen):
 0.1151 -0.2008 -0.086  -0.2984  0.0097 -0.339   0.1237 -0.1682 -0.1721  0.0722
-0.1079 -0.0399  0.167  -0.0687 -0.002   0.0548 -0.1865 -0.0153 -0.2483  0.2132
-0.0407 -0.0998  0.2876  0.0046 -0.0219  0.0097 -0.0573 -0.2682 -0.1162  0.1569
-0.1368  0.1539  0.14   -0.0776  0.1642  0.1137  0.0531  0.0867  0.0804  0.0875
 0.251   0.1892 -0.2418  0.1614  0.0282
C. Example 11.3: Detection of hemophilia A carriers

SAS output
[SAS discriminant-analysis output and accompanying figures omitted]