
Page 1

Classification and Prediction

by Yen-Hsien Lee

Department of Information Management
College of Management

National Sun Yat-Sen University

March 4, 2003

Page 2

Outline

• Introduction to Classification
• Decision Tree – ID3
• Neural Network – Backpropagation
• Bayesian Network

Page 3

Classification

• Purpose: Classification is the process that establishes classes with attributes from a set of instances in a database. The class of an object must be one of a finite set of possible, pre-determined class values, while the attributes of the object are descriptions of the object that potentially affect its class.

• Techniques: ID3 and its descendants, backpropagation neural networks, Bayesian networks, CN2, the AQ family, etc.

Page 4

ID3 Approach

• ID3 uses an iterative method to build up decision trees, preferring simple trees over complex ones, on the theory that simple trees are more accurate classifiers of future inputs.

• ID3 builds a small tree using an information-theoretic approach: at each node it determines the amount of information gained by testing each candidate attribute and selects the attribute with the largest information gain to split on.

Page 5

Sample Training Set

    No.  Outlook   Temperature  Humidity  Windy  Class
     1   Sunny     Hot          High      False  N
     2   Sunny     Hot          High      True   N
     3   Overcast  Hot          High      False  P
     4   Rain      Mild         High      False  P
     5   Rain      Cool         Normal    False  P
     6   Rain      Cool         Normal    True   N
     7   Overcast  Cool         Normal    True   P
     8   Sunny     Mild         High      False  N
     9   Sunny     Cool         Normal    False  P
    10   Rain      Mild         Normal    False  P
    11   Sunny     Mild         Normal    True   P
    12   Overcast  Mild         High      True   P
    13   Overcast  Hot          Normal    False  P
    14   Rain      Mild         High      True   N

Page 6

Example: Complex Decision Tree

(Figure: a deeper decision tree for the same training set, testing Temperature (Cool / Mild / Hot) at the root and Outlook, Humidity, and Windy at lower levels; it reaches the same P and N classifications but needs many more nodes, including one null leaf, than the simple tree on the next page.)

Page 7

Example: Simple Decision Tree

    Outlook
      Sunny    -> Humidity
        High   -> N
        Normal -> P
      Overcast -> P
      Rain     -> Windy
        True   -> N
        False  -> P

Page 8

Entropy Function

• Entropy of a set C of objects (examples):

E(C) = - Σj Pj log2(Pj)

where j is an output class, Pj = (# of objects in class j) / (total objects in the set C), and a term with Pj = 0 is taken to be 0.

Example: set C (total objects n = n1+n2+n3+n4) contains Class 1 (n1), Class 2 (n2), Class 3 (n3), and Class 4 (n4):

E(C) = - (n1/n)*log2(n1/n) - (n2/n)*log2(n2/n) - (n3/n)*log2(n3/n) - (n4/n)*log2(n4/n)
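To make the formula concrete, here is a small Python sketch (my own illustration, not part of the original slides) that computes E(C) from a list of class counts, using the convention that a class with zero objects contributes nothing:

    from math import log2

    def entropy(class_counts):
        # E(C) = -sum_j Pj * log2(Pj), Pj = (# objects in class j) / (total objects);
        # classes with zero objects contribute 0 by convention
        total = sum(class_counts)
        return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

    # the Page 5 training set has 9 objects of class P and 5 of class N
    print(entropy([9, 5]))  # ~0.940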

Page 9

Entropy Function (Cont’d)

• Entropy of a partial tree of C if a particular attribute is chosen for partitioning C:

E A (nk / n)E Ci kk

whereCk disjoint set k which is apartition of C by the

theattributeA i according to A i 's values.E(Ck entropy of theset Ckn total number of objects in theset Cnk total number of objects in thesubset Ck

( ) =

=

( )

)

Page 10

Entropy Function (Cont’d)

Set C (total objects n = n1+n2+n3+n4) contains Class 1 (n1), Class 2 (n2), Class 3 (n3), and Class 4 (n4):

E(C) = - (n1/n)*log2(n1/n) - (n2/n)*log2(n2/n) - (n3/n)*log2(n3/n) - (n4/n)*log2(n4/n)

Set C is partitioned into subsets C1, C2, ... by attribute Ai.

Subset C1 (m = m1+m2+m3+m4) contains Class 1 (m1), Class 2 (m2), Class 3 (m3), and Class 4 (m4):

E(C1) = - (m1/m)*log2(m1/m) - (m2/m)*log2(m2/m) - (m3/m)*log2(m3/m) - (m4/m)*log2(m4/m)

Subset C2 (p = p1+p2+p3+p4) contains Class 1 (p1), Class 2 (p2), Class 3 (p3), and Class 4 (p4):

E(C2) = - (p1/p)*log2(p1/p) - (p2/p)*log2(p2/p) - (p3/p)*log2(p3/p) - (p4/p)*log2(p4/p)

E(Ai) = (m/n)*E(C1) + (p/n)*E(C2) + ...
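Continuing the illustrative Python sketch from the entropy slide (again my own addition, not from the slides), E(Ai) can be computed by partitioning the objects on an attribute's values and weighting each subset's entropy by its relative size; here `objects` is assumed to be a list of dictionaries, one per row of the training set:

    from collections import Counter

    def partition_entropy(objects, attribute, class_key="Class"):
        # E(Ai): split `objects` into disjoint subsets Ck by the values of `attribute`,
        # then return sum_k (nk/n) * E(Ck)
        n = len(objects)
        subsets = {}
        for obj in objects:
            subsets.setdefault(obj[attribute], []).append(obj)
        return sum((len(subset) / n) *
                   entropy(list(Counter(o[class_key] for o in subset).values()))
                   for subset in subsets.values())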

Page 11

Information Gain Due to Attribute Partition

Gi = E(C) - E(Ai)

Set C (total objects n = n1+n2+n3+n4) has entropy E(C). Set C is partitioned into subsets C1 (m = m1+m2+m3+m4), C2 (p = p1+p2+p3+p4), ... by attribute Ai; the entropy of the resulting partial tree of C (based on attribute Ai) is E(Ai).

Thus, the information gain due to the partition by the attribute Ai is Gi = E(C) - E(Ai).

Page 12

ID3 Algorithm

1. Start from the root node and assign the root node as the current node C.
2. If all objects in the current node C belong to the same class, then stop (the termination condition for the current node C); otherwise go to step 3.
3. Calculate the entropy E(C) for the node C.
4. Calculate the entropy E(Ai) of the partial tree partitioned by an attribute Ai which has not been used as a classifying attribute of the node C.
5. Compute the information gain Gi for the partial tree (i.e., Gi = E(C) - E(Ai)).

Page 13

ID3 Algorithm (Cont’d)

6. Repeat steps 4 and 5 for each attribute which has not been used as a classifying attribute of the node C.
7. Select the attribute with the maximum information gain (max Gi) as the classifying attribute for the node C.
8. Create child nodes C1, C2, ..., Cn (assuming the selected attribute has n values) for the node C, and assign objects in the node C to the appropriate child nodes according to the values of the classifying attribute.
9. Mark the selected attribute as a classifying attribute of each node Ci. For each node Ci, assign it as the current node and go to step 2.
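The nine steps can be collapsed into a short recursive routine. The following sketch is my own condensed rendering (the name `build_id3` and the nested-dictionary tree format are assumptions, and the degenerate cases of an empty subset or of running out of attributes are omitted); it reuses `entropy`, `partition_entropy`, and `Counter` from the earlier sketches:

    def build_id3(objects, attributes, class_key="Class"):
        # returns a nested-dict decision tree; leaves are class labels
        classes = {o[class_key] for o in objects}
        if len(classes) == 1:                        # step 2: termination condition
            return classes.pop()
        base = entropy(list(Counter(o[class_key] for o in objects).values()))
        gains = {a: base - partition_entropy(objects, a, class_key)   # steps 3-6
                 for a in attributes}
        best = max(gains, key=gains.get)             # step 7: maximum information gain
        tree = {best: {}}
        remaining = [a for a in attributes if a != best]
        for value in {o[best] for o in objects}:     # steps 8-9: split and recurse
            subset = [o for o in objects if o[best] == value]
            tree[best][value] = build_id3(subset, remaining, class_key)
        return tree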

Page 14

Example (See Slide 5)

• Current node C = root node of the tree.
• Entropy of the node C = E(C) = -(9/14)log2(9/14) - (5/14)log2(5/14) = 0.940

Class P: objects 3, 4, 5, 7, 9, 10, 11, 12, 13
Class N: objects 1, 2, 6, 8, 14

Page 15

Example (Cont’d)

• Entropy of the partial tree based on the Outlook attribute:

E(Outlook=Sunny) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.971
E(Outlook=Overcast) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0
E(Outlook=Rain) = -(2/5)log2(2/5) - (3/5)log2(3/5) = 0.971

E(Outlook) = (5/14)*E(Outlook=Sunny) + (4/14)*E(Outlook=Overcast) + (5/14)*E(Outlook=Rain) = 0.694

Page 16

Example (Cont’d)

• Information gain due to the partition by the Outlook attribute:
G(Outlook) = E(C) - E(Outlook) = 0.246

• Similarly, the information gains due to the partition by the Temperature, Humidity, and Windy attributes, respectively, are:
G(Temperature) = 0.029
G(Humidity) = 0.151
G(Windy) = 0.048

• Thus, the Outlook attribute is selected as the classifying attribute for the current node C, since its information gain is the largest among all of the attributes.
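These figures can be checked with the helper functions sketched earlier. The snippet below (my own verification, not part of the slides) encodes the Page 5 training set and prints the information gain of each attribute; note that it prints 0.247 and 0.152 for Outlook and Humidity where the slides round to 0.246 and 0.151:

    rows = [  # Page 5 training set: (Outlook, Temperature, Humidity, Windy, Class)
        ("Sunny", "Hot", "High", False, "N"),      ("Sunny", "Hot", "High", True, "N"),
        ("Overcast", "Hot", "High", False, "P"),   ("Rain", "Mild", "High", False, "P"),
        ("Rain", "Cool", "Normal", False, "P"),    ("Rain", "Cool", "Normal", True, "N"),
        ("Overcast", "Cool", "Normal", True, "P"), ("Sunny", "Mild", "High", False, "N"),
        ("Sunny", "Cool", "Normal", False, "P"),   ("Rain", "Mild", "Normal", False, "P"),
        ("Sunny", "Mild", "Normal", True, "P"),    ("Overcast", "Mild", "High", True, "P"),
        ("Overcast", "Hot", "Normal", False, "P"), ("Rain", "Mild", "High", True, "N"),
    ]
    keys = ("Outlook", "Temperature", "Humidity", "Windy", "Class")
    objects = [dict(zip(keys, row)) for row in rows]

    base = entropy([9, 5])                       # E(C) = 0.940
    for a in ("Outlook", "Temperature", "Humidity", "Windy"):
        print(a, round(base - partition_entropy(objects, a), 3))
    # prints roughly: Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048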

Page 17

Example (Cont’d)

• The resulting partial decision tree is:

    Outlook
      Sunny    -> objects 1, 2, 8, 9, 11
      Overcast -> P (objects 3, 7, 12, 13)
      Rain     -> objects 4, 5, 6, 10, 14

• The analysis continues for the remaining nodes C1 and C2 (the Sunny and Rain branches) until all of the leaf nodes are associated with objects of the same class.

Page 18

Example (Cont’d)

• The resulting final decision tree is:

    Outlook
      Sunny    -> Humidity
        High   -> N  (objects 1, 2, 8)
        Normal -> P  (objects 9, 11)
      Overcast -> P  (objects 3, 7, 12, 13)
      Rain     -> Windy
        True   -> N  (objects 6, 14)
        False  -> P  (objects 4, 5, 10)

Page 19

Issues of Decision Tree

• How to deal with continuous attributes.
• Pruning the tree so that it is not overly sensitive to the training cases.
• A better metric than information gain to evaluate tree expansion: information gain tends to prefer attributes with more attribute values (one such metric is sketched below).
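One widely used alternative metric is the gain ratio of C4.5 (ID3's successor), which divides the information gain by the entropy of the attribute-value partition itself and thereby penalizes attributes with many values. A minimal sketch, assuming the `entropy` and `partition_entropy` helpers from the earlier slides:

    def gain_ratio(objects, attribute, class_key="Class"):
        # C4.5-style gain ratio = information gain / split information,
        # where split information is the entropy of the attribute-value partition itself
        base = entropy(list(Counter(o[class_key] for o in objects).values()))
        gain = base - partition_entropy(objects, attribute, class_key)
        split_info = entropy(list(Counter(o[attribute] for o in objects).values()))
        return gain / split_info if split_info > 0 else 0.0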

Page 20

Characteristics of Neural Network (“Connectionist”)

Architecture

• A neural network consists of many simple interconnected processing elements.
• The processing elements are often grouped together into linear arrays called “layers”.
• A neural network always has an input layer and an output layer, and may or may not have “hidden” layers.
• Each processing element has a number of inputs xi, each carrying a weight wji. The processing element sums the weighted inputs wji*xi and computes a single output signal yj that is a function f of that weighted sum.

Page 21

Characteristics of Neural Network (“Connectionist”)

Architecture (Cont’d)

• The function f, called the transfer function, is fixed for the life of the processing element. A typical transfer function is the sigmoid function.

• The function f is a design decision and cannot be changed dynamically. The weights wji, on the other hand, are variables and can be adjusted dynamically to produce a given output. This dynamic modification of weights is what allows a neural network to memorize information, to adapt, and to learn.

Page 22

Neural Network Processing Element

(Figure: a single processing element with inputs x1, x2, ..., xi entering through weights wj1, wj2, ..., wji, and a transfer function f producing the output yj.)

Page 23

Sigmoid Function

y = f(x) = 1 / (1 + e^(-x))

(Plot: the sigmoid curve, rising from 0 toward 1 as x goes from -15 to 15.)
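Putting the processing-element definition and the sigmoid together, the output of a single unit can be computed directly; this is a small illustrative sketch of yj = f(Σi wji*xi), not taken from the slides:

    from math import exp

    def sigmoid(x):
        # transfer function f(x) = 1 / (1 + e^(-x))
        return 1.0 / (1.0 + exp(-x))

    def unit_output(weights, inputs):
        # y_j = f( sum_i w_ji * x_i ) for a single processing element
        return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

    print(unit_output([0.5, -0.3, 0.8], [1.0, 2.0, 0.5]))  # ~0.574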

Page 24

Architecture of Three-Layer Neural Network

(Figure: an input layer, a hidden layer, and an output layer, each drawn as a row of units, with every unit in one layer connected to every unit in the next.)

Page 25

Backpropagation Network

• A fully connected, layered, feedforward neural network trained by propagating errors backward.
• Each unit (processing element) in one layer is connected in the forward direction to every unit in the next layer.
• A backpropagation network typically starts out with a random set of weights.
• The network adjusts its weights each time it sees an input-output pair. Each pair requires two stages: a forward pass and a backward pass.
• The forward pass involves presenting a sample input to the network and letting activations flow until they reach the output layer.

Page 26

Backpropagation Network (Cont’d)

• During the backward pass, the network’s actual output (from the forward pass) is compared with the target output and error estimates are computed for the output units. The weights connected to the output units can be adjusted in order to reduce those errors.

• We can then use the error estimates of the output units to derive error estimates for the units in the hidden layers. Finally, errors are propagated back to the connections stemming from the input units.
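As an illustration of the two passes (my own simplified sketch, not the slides' implementation), the routine below updates a one-hidden-layer network on a single input-output pair using the `sigmoid` unit defined earlier, squared error, and a fixed learning rate; bias terms are omitted for brevity:

    import random

    def train_pair(x, target, W_hidden, W_output, lr=0.5):
        # one forward pass and one backward pass for a single input-output pair;
        # W_hidden[j] holds the weights of hidden unit j, W_output[k] those of output unit k
        # forward pass: activations flow from the input layer to the output layer
        hidden = [sigmoid(sum(w * xi for w, xi in zip(W_hidden[j], x)))
                  for j in range(len(W_hidden))]
        output = [sigmoid(sum(w * h for w, h in zip(W_output[k], hidden)))
                  for k in range(len(W_output))]
        # backward pass: error estimates for the output units (sigmoid derivative is o*(1-o))
        delta_out = [(target[k] - output[k]) * output[k] * (1.0 - output[k])
                     for k in range(len(output))]
        # derive error estimates for the hidden units from those of the output units
        delta_hidden = [hidden[j] * (1.0 - hidden[j]) *
                        sum(delta_out[k] * W_output[k][j] for k in range(len(output)))
                        for j in range(len(hidden))]
        # adjust the weights to reduce the errors
        for k in range(len(W_output)):
            for j in range(len(hidden)):
                W_output[k][j] += lr * delta_out[k] * hidden[j]
        for j in range(len(W_hidden)):
            for i in range(len(x)):
                W_hidden[j][i] += lr * delta_hidden[j] * x[i]

    # the network typically starts out with a random set of weights (here 2 inputs, 3 hidden, 1 output)
    W_h = [[random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(3)]
    W_o = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(1)]
    for _ in range(1000):
        train_pair([0.0, 1.0], [1.0], W_h, W_o)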

Page 27

Issues of Backpropagation Network

• How to present the data.
• How to decide the number of layers.
• The learning strategy.

Page 28

Bayesian Classification

• Bayesian classification is based on Bayes theorem.

• Bayesian classifiers predict class membership probabilities, such as the probability that a given sample belongs to a particular class.

• Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes.

• Bayesian belief networks are graphical models, which unlike naïve Bayesian classifiers, allow the representation of dependencies among subsets of attributes.

Page 29

Bayes Theorem

• Let H be a hypothesis and X be a data sample.
• P(H|X) is the posterior probability of H given X.
• P(X|H) is the posterior probability of X given H.
• P(H) is the prior probability of H.
• P(X), P(H), and P(X|H) may be estimated from the given data.

P(H|X) = P(X|H) P(H) / P(X)

Page 30

Naïve Bayesian Classification

• Assume a data sample X = (x1, x2, …, xn) with n attributes and an unknown class. The naïve Bayesian classifier predicts which class (among C1, C2, …, Cm) X belongs to as follows:

1. Compute the posterior probability, conditioned on X, for each class.
2. Assign X to the class that has the highest posterior probability, i.e., P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.

Page 31

Naïve Bayesian Classification (Cont’d)

• Since P(Ci|X) = P(X|Ci) P(Ci) / P(X), and P(X) is constant for all classes, only P(X|Ci)P(Ci) needs to be maximized.

• In addition, the naïve Bayesian classifier assumes that there are no dependence relationships among the attributes. Thus,

P(X|Ci) = Π(k=1..n) P(Xk|Ci)

Page 32

Example

• To classify the data sample X = (Outlook = Sunny, Temperature = Hot, Humidity = Normal, Windy = False), we need to maximize P(X|Ci)P(Ci).

– Compute P(Ci):
P(Class = P) = 9/14 = 0.643
P(Class = N) = 5/14 = 0.357

– Compute P(Xk|Ci):
P(Outlook = Sunny | Class = P) = 2/9 = 0.222
P(Outlook = Sunny | Class = N) = 3/5 = 0.600
P(Temperature = Hot | Class = P) = 2/9 = 0.222
P(Temperature = Hot | Class = N) = 2/5 = 0.400
P(Humidity = Normal | Class = P) = 6/9 = 0.667
P(Humidity = Normal | Class = N) = 1/5 = 0.200
P(Windy = False | Class = P) = 6/9 = 0.667
P(Windy = False | Class = N) = 2/5 = 0.400

Page 33

Example (Cont’d)

– Compute P(X|Ci):
P(X | Class = P) = 0.222 x 0.222 x 0.667 x 0.667 = 0.022
P(X | Class = N) = 0.600 x 0.400 x 0.200 x 0.400 = 0.019

– Compute P(X|Ci)P(Ci):
P(X | Class = P)P(Class = P) = 0.022 x 0.643 = 0.014
P(X | Class = N)P(Class = N) = 0.019 x 0.357 = 0.007

• Conclusion: X belongs to Class P.
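The same calculation can be reproduced in a few lines of Python; the sketch below (my own illustration) reuses the `objects` list built from the Page 5 training set earlier, estimates P(Ci) and each P(Xk|Ci) by counting, and selects the class that maximizes P(X|Ci)P(Ci):

    def naive_bayes_classify(objects, sample, class_key="Class"):
        # return the class Ci maximizing P(X|Ci) * P(Ci),
        # with each P(Xk|Ci) estimated by simple counting
        n = len(objects)
        best_class, best_score = None, -1.0
        for ci, count in Counter(o[class_key] for o in objects).items():
            score = count / n                          # P(Ci)
            members = [o for o in objects if o[class_key] == ci]
            for attr, value in sample.items():         # P(X|Ci) = prod_k P(Xk|Ci)
                score *= sum(1 for o in members if o[attr] == value) / count
            if score > best_score:
                best_class, best_score = ci, score
        return best_class, best_score

    X = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "Normal", "Windy": False}
    print(naive_bayes_classify(objects, X))   # ('P', ~0.014): X belongs to Class P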