
Page 1

ACCTG 6910
Building Enterprise & Business Intelligence Systems (e.bis)

Classification

Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business

Page 2

Outline

• Introduction
• Classic Methods
  – Decision Tree
  – Neural Network

Page 3

Introduction

• Classification
  – Classifies objects into a set of pre-specified object classes based on the values of relevant object attributes and the objects' class labels

[Figure: a classifier assigns objects O1-O6 to the pre-determined classes X, Y and Z; each object Oi carries its relevant attribute values and class label.]

Page 4

Introduction

• When to use it?
  – Discovery (descriptive, explanatory)
  – Prediction (prescriptive, decision support)
  – When the relevant object data can be decided and is available
• Real-World Applications
  – Profiling/predicting customer purchases
  – Loan/credit approval
  – Fraud/intrusion detection
  – Diagnosis decision support

Page 5

Example

Age   Income   Churn?
70    20,000   Yes
60    18,000   Yes
75    36,000   Yes
67    33,000   Yes
60    36,000   Yes
60    50,000   No
50    12,000   Yes
40    12,000   Yes
30    12,000   No
50    30,000   No
40    16,000   Yes
35    20,000   Yes
48    36,000   No
30    37,000   No
22    50,000   No
21    51,000   No

[Figure: the samples plotted on Income vs. Age, with churners and non-churners marked by different symbols.]

Page 6

Notations

The same table and scatter plot as above, with the notation attached:
– Age and Income: classification attributes
– Churn?: the class label attribute
– Yes / No: the class labels
– The Age-Income plane: the problem space
– The labeled records: classification samples
– A new, unlabeled record to be classified: a prediction object
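To make the notation concrete, here is a minimal pandas sketch (not from the slides; the variable names are illustrative) separating the classification attributes from the class label attribute:

```python
import pandas as pd

# A few of the classification samples from the table above.
samples = pd.DataFrame({
    "Age":    [70, 60, 75, 60, 50, 30, 22],
    "Income": [20000, 18000, 36000, 50000, 12000, 12000, 50000],
    "Churn":  ["Yes", "Yes", "Yes", "No", "Yes", "No", "No"],
})

X = samples[["Age", "Income"]]   # classification (input) attributes
y = samples["Churn"]             # class label attribute; "Yes"/"No" are the class labels

# A prediction object supplies values for the classification attributes only;
# its class label is what the trained classifier must supply.
prediction_object = pd.DataFrame({"Age": [45], "Income": [25000]})
```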

Page 7

Object Data Required

• Class Label Attribute:
  – Dependent variable, output attribute, prediction variable, …
  – The variable whose values label objects' classes
• Classification Attributes:
  – Independent variables, input attributes, or predictor variables
  – Object variables whose values affect objects' class labels
• Three Types:
  – numerical (age, income)
  – categorical (hair color, sex)
  – ordinal (severity of an injury)

Page 8

Classification vs. Prediction

– View 1
  • Classification: discovery
  • Prediction: predictive, utilizing classification results (rules)
– View 2
  • Either discovery or predictive
  • Classification: categorical or ordinal class labels
  • Prediction: numerical (continuous) class labels
– Class lectures, assignments and exams: View 1
– Text: View 2

Page 9

Classification & Prediction

• Main Function
  – Mappings from input attribute values to output attribute values
  – Methods affect how the mappings are derived and represented
• Process
  – Training (supervised): derives the mappings
  – Testing: evaluates the accuracy of the mappings

Page 10

Classification & Prediction

• Classification samples: divided into training and testing sets
  – Often processed in batch mode
  – Include class labels
• Prediction objects
  – Often processed in online mode
  – No class labels

Page 11

Classification Methods

• Comparative Criteria
  – Accuracy
  – Speed
  – Robustness
  – Scalability
  – Interpretability
  – Data types
• Classic methods
  – Decision Tree
  – Neural Network
  – Bayesian Network

Page 12

Decision Tree

• Mapping Principle: Recursively partition the data set so that the subsets contain “pure” data

[Figure: the Age-Income scatter plot, recursively partitioned into regions that contain mostly one class.]

Page 13

Decision Tree

• Algorithm:

  Start from the whole data set;
  do {
      Split the data set into two or more subsets using every candidate attribute test;
      Choose the split that produces the "purest" subsets;
  } while (some subset is not pure);
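As a concrete rendering of one round of this loop (a minimal sketch, not the course's implementation), the function below tries every threshold on a single numeric attribute and keeps the split whose subsets are purest, scoring purity as the size-weighted majority-class fraction:

```python
def purity(labels):
    """Fraction of the majority class; 1.0 means the subset is pure."""
    return max(labels.count(c) for c in set(labels)) / len(labels)

def best_split(values, labels):
    """Try every threshold between distinct attribute values and return the
    one whose two subsets have the highest size-weighted purity."""
    best_t, best_score = None, -1.0
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * purity(left) + len(right) * purity(right)) / len(labels)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Age column and churn labels from the example table.
ages = [70, 60, 75, 67, 60, 60, 50, 40, 30, 50, 40, 35, 48, 30, 22, 21]
churn = ["Yes"]*5 + ["No"] + ["Yes"]*2 + ["No"]*2 + ["Yes"]*2 + ["No"]*4
print(best_split(ages, churn))   # the purest single cut on Age, and its score
```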

Page 14

Decision Tree

• Key Question: How is purity (diversity) measured?
  – Gini index of diversity: the ecologists' contribution
  – Example: 8 cats, 2 tigers
    • The probability of choosing a cat (p1) = 8/10 = 0.8
    • The probability of choosing a cat AGAIN = 0.8 * 0.8 = 0.64
    • The probability of choosing a tiger (p2) = 2/10 = 0.2
    • The probability of choosing a tiger AGAIN = 0.2 * 0.2 = 0.04
    • What is the probability of choosing two different animals?

Page 15

Decision Tree

• P = 1 - p1*p1 - p2*p2 = 1 - 0.64 - 0.04 = 0.32
• When is P biggest? When the number of cats equals the number of tigers (p1 = p2 = 0.5): P = 0.5
• When is P smallest? When there is only one kind of animal (p1 or p2 = 1): P = 0
• The Gini index can thus represent the diversity of a data set

Page 16

Decision Tree

• Gini Index: Suppose there are n different output classes and class i appears with probability p_i. The Gini index is:

  Gini = 1 - \sum_{i=1}^{n} p_i^2

• When there are only two classes:

  1 - p1*p1 - p2*p2 = 1 - p1*p1 - (1 - p1)*(1 - p1) = 2 * p1 * (1 - p1)
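A direct transcription of the formula (a minimal sketch), with the cats-and-tigers numbers as a check:

```python
def gini(probabilities):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1 - sum(p * p for p in probabilities)

print(gini([0.8, 0.2]))   # cats and tigers: 1 - 0.64 - 0.04 = 0.32
print(gini([0.5, 0.5]))   # most diverse two-class case: 0.5
print(gini([1.0]))        # a pure set: 0.0
```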

Page 17

Decision Tree

• Another index: Entropy

  E = - \sum_{i=1}^{n} p_i \log_2 p_i

• When there are only two categories:

  E = - (p1 * log2 p1 + p2 * log2 p2)

• The bigger E is, the more diverse the data set is
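The same kind of sketch for entropy (log base 2; terms with zero probability contribute nothing):

```python
import math

def entropy(probabilities):
    """Entropy: -sum(p * log2 p); terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # most diverse two-class case: 1.0
print(entropy([1.0]))        # a pure set: 0.0
```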

Page 18

Decision Tree

• Practice
  – Question 1: if the chance of churn is 0.5 and the chance of not churn is 0.5, what is the entropy?
    • Answer: -2 * (0.5 * log2 0.5) = 1
  – Question 2: if the chance of churn is 0.25 and the chance of not churn is 0.75, what is the entropy?
    • Answer: -(0.25 * log2 0.25 + 0.75 * log2 0.75) ≈ 0.811

Page 19

Decision Tree

• What is the entropy of Set 1?
  – -[5/6 * log2 (5/6) + 1/6 * log2 (1/6)]
• Set 2?
• The whole set?
• The reduction?

[Figure: the Age-Income scatter plot partitioned by a split into Set 1 and Set 2.]

Page 20

Decision Tree

• Calculation of the reduction in entropy
  – Original E: easy to get
  – E of each subset: easy to get
  – How should the subsets' E values be combined?
    • Simply adding them together is not good, since the subsets' sizes differ
  – Use a weighted sum:
    • w1 = # of records in subset 1 / total # of records
    • w2 = # of records in subset 2 / total # of records
    • E' = w1 * E1 + w2 * E2

Page 21

Decision Tree

• The algorithm (divide and conquer):
  – Select an attribute and partition the data set D into D1, D2, …, Dn
  – Calculate the entropy E_i of each subset
  – Get E' = \sum_{i=1}^{n} w_i E_i
  – Get E of the data set before the partition
  – Get the reduction in entropy = E - E'
  – Divide the data set using the attribute with the largest entropy reduction; go to the next round
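Putting the pieces together, here is a minimal sketch of one round (illustrative names such as entropy_of and best_reduction, not from the slides): for every candidate threshold on an attribute, compute E' as the weighted sum of subset entropies and keep the split with the largest reduction E - E':

```python
import math

def entropy_of(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def best_reduction(rows, labels, attr):
    """Largest entropy reduction over all thresholds on one attribute."""
    E = entropy_of(labels)          # E of the data set before the partition
    best_t, best_gain = None, 0.0
    for t in sorted({r[attr] for r in rows})[:-1]:
        left = [l for r, l in zip(rows, labels) if r[attr] <= t]
        right = [l for r, l in zip(rows, labels) if r[attr] > t]
        e_prime = (len(left) * entropy_of(left)
                   + len(right) * entropy_of(right)) / len(labels)
        if E - e_prime > best_gain:
            best_t, best_gain = t, E - e_prime
    return best_t, best_gain

# (Age, Income) rows and churn labels from the example table.
rows = [(70, 20000), (60, 18000), (75, 36000), (67, 33000), (60, 36000),
        (60, 50000), (50, 12000), (40, 12000), (30, 12000), (50, 30000),
        (40, 16000), (35, 20000), (48, 36000), (30, 37000), (22, 50000),
        (21, 51000)]
churn = ["Yes"]*5 + ["No"] + ["Yes"]*2 + ["No"]*2 + ["Yes"]*2 + ["No"]*4
print("Age:", best_reduction(rows, churn, 0))
print("Income:", best_reduction(rows, churn, 1))
```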

Page 22

Decision Tree

[Figure: the Age-Income scatter plot with two candidate splits drawn in: Income = 23K and Age = 55.]

Page 23

Decision Tree

• Partition by Age = 55:
  – E = -[9/16 * log2 (9/16) + 7/16 * log2 (7/16)]
  – E1 = -[5/6 * log2 (5/6) + 1/6 * log2 (1/6)]
  – E2 = -[0.4 * log2 (0.4) + 0.6 * log2 (0.6)]
  – E' = 6/16 * E1 + 10/16 * E2
  – Reduction = E - E'
• Partition by Income = 23K: similar
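Plugging the numbers in (a quick check; values rounded to three decimals):

```python
import math

def H(*ps):
    return -sum(p * math.log2(p) for p in ps)

E  = H(9/16, 7/16)                 # whole set: ~0.989
E1 = H(5/6, 1/6)                   # Age > 55 subset: ~0.650
E2 = H(0.4, 0.6)                   # Age <= 55 subset: ~0.971
E_prime = 6/16 * E1 + 10/16 * E2   # weighted sum: ~0.851
print(E - E_prime)                 # reduction: ~0.138
```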

Page 24

Decision Tree

Age?
├─ > 55: Churn
└─ <= 55: Income?
    ├─ <= 23K: Churn
    └─ > 23K: Not Churn

Page 25

Extract rules from the model

• Each path from the root to a leaf node forms an IF-THEN rule.
• In such a rule, the tests at the root and internal nodes are conjoined to form the IF part.
• The leaf node supplies the THEN part of the rule.
• For the tree above: IF Age > 55 THEN Churn; IF Age <= 55 AND Income <= 23K THEN Churn; IF Age <= 55 AND Income > 23K THEN Not Churn.
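A small sketch of this extraction for the churn tree; the nested-dict tree encoding is an assumption made for illustration:

```python
def extract_rules(node, conditions=()):
    """Walk the tree; each root-to-leaf path becomes one IF-THEN rule."""
    if not isinstance(node, dict):   # leaf: a class label, the THEN part
        print("IF " + " AND ".join(conditions) + " THEN " + node)
        return
    for test, child in node["branches"].items():
        extract_rules(child, conditions + (node["attribute"] + " " + test,))

churn_tree = {
    "attribute": "Age",
    "branches": {
        "> 55": "Churn",
        "<= 55": {
            "attribute": "Income",
            "branches": {"<= 23K": "Churn", "> 23K": "Not Churn"},
        },
    },
}
extract_rules(churn_tree)
```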

Page 26

Pruning

• Noise: inconsistent class labels for the same attribute values
• Outliers: samples whose combination of class label and input attribute values occurs only a small number of times
• Overfitting: tree branches are created to classify noise and outliers
• Problem: unreliable tree branches

Page 27

Pruning

• Function: remove unreliable branches
• Pre-pruning
  – Halt the creation of unreliable branches by statistically determining the goodness of further tree splits
  – Less time-consuming but less effective
• Post-pruning
  – Remove unreliable branches from a full tree
  – Minimize error rates or the required encoding bits
  – More time-consuming but more effective

Page 28

Decision Tree

• Pros of Decision Trees:
  – Clear rules
  – Fast algorithm
  – Scalable
• Cons:
  – Accuracy may suffer on complex problems, e.g., ones with a large number of class labels

Page 29

Decision Tree

• Many trees out there!
  – ID3
  – C4.5: handles continuous predictor values
  – CART
  – Forest
  – MDTI
  – …

Page 30

Neural Networks

• What is it?
  – Another classification technique that maps from input attribute(s) to output attribute(s)
  – Most widely known but least understood
• Human Brains: the root of neural networks

[Figure: a neuron receiving excitatory (+) and inhibitory (-) inputs and producing an output.]

Page 31

Neural Networks

[Figure: a neural network with input nodes i1 and i2, hidden nodes H1 and H2, and output nodes O1, O2 and O3.]

Page 32

Neural Network

• Let's start with a simple example:

  z = 3x + 2y + 1

• Input attributes: x, y; output attribute: z
• How do we represent the mapping?
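For this linear example the "network" is just two weights, a sum, and a bias; a minimal sketch:

```python
def linear_neuron(x, y, weights=(3, 2), bias=1):
    """Combination function: weighted sum; transfer: identity plus the bias."""
    return weights[0] * x + weights[1] * y + bias

print(linear_neuron(2, 5))   # 3*2 + 2*5 + 1 = 17
```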

Page 33

Neural Network

[Figure: the network for z = 3x + 2y + 1. Input nodes x and y in the input layer connect to a single output node in the output layer with weights 3 and 2; the combination function is SUM, and the transfer function adds the bias +1.]

Page 34

Two-layer Neural Network

• Three major components:
  – Input layer
  – Output layer
  – A weight matrix in between
• Functions:
  – Combination function: usually a sum
  – Transfer or activation function: "squashes" (normalizes) the sum to a certain range
• Can represent ANY linear function between the input space and the output space

Page 35

Neural Network

[Figure: the same two-input network; the combination function is SUM and the transfer function is a sigmoid.]

Page 36

Neural Networks

• How about non-linear relationships? Throw in another layer: the hidden layer
• Theoretically, a neural net with the above structure can represent ANY function between the input space and the output space

[Figure: a network with input nodes i1 and i2, hidden nodes H1 and H2, and output nodes O1, O2 and O3; each connection carries a weight such as w111, w112, w121, w122.]

Page 37

Neural Networks

• Data Flow:

[Figure: Age and Income enter at input nodes i1 and i2; each hidden node computes a weighted sum and applies the transfer function, giving S(H1) and S(H2); the output nodes do the same, giving S(O1) and S(O2), which are read as Churn and Not Churn.]

Page 38

Neural Network

• Feed-forward:
  – The above process, in which the input values are transformed through the network to produce the output values, is called FEED-FORWARD
• When we get new records, we feed them forward to obtain the prediction values.
• But how do we produce a network that can predict?
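A minimal numpy sketch of feed-forward through one hidden layer, matching the structure above (the random weights are placeholders, not trained values):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def feed_forward(inputs, w_input_hidden, w_hidden_output):
    """Each layer: combination function = weighted sum, transfer = sigmoid."""
    hidden = sigmoid(w_input_hidden @ inputs)   # S(H1), S(H2)
    return sigmoid(w_hidden_output @ hidden)    # S(O1), S(O2)

rng = np.random.default_rng(0)
w_ih = rng.uniform(-0.5, 0.5, size=(2, 2))   # input -> hidden weights
w_ho = rng.uniform(-0.5, 0.5, size=(2, 2))   # hidden -> output weights

x = np.array([0.45, 0.20])           # e.g., normalized Age and Income
print(feed_forward(x, w_ih, w_ho))   # read as scores for Churn / Not Churn
```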

Page 39

Neural Nets

[Flowchart: the data set is split into a training set and a testing set. Training turns the initial neural net into a trained neural net; testing then yields a trained net with a performance measurement.]

Page 40

Neural Net

• Split the data set:
  – Holdout (build the classifier, estimate its error):
    • 2/3 for training, 1/3 for testing
  – Ten-fold cross-validation:
    • 9/10 for training, 1/10 for testing; repeat ten times
    • Use this when the sample size is small
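Both splitting schemes are available off the shelf in scikit-learn; a minimal sketch on a few of the churn samples:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.array([[70, 20000], [60, 18000], [75, 36000], [60, 50000],
              [50, 12000], [30, 12000], [22, 50000], [21, 51000]])
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])   # 1 = Churn, 0 = Not Churn

# Holdout: 2/3 for training, 1/3 for testing.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

# K-fold cross-validation (10 folds on a real data set; 4 here to fit 8 rows).
for train_idx, test_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    pass  # train on X[train_idx], y[train_idx]; test on X[test_idx], y[test_idx]
```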

Page 41

Training Neural Net

• Step 1. Set up an initial neural net:
  – Choose the number of input, output, and hidden-layer nodes; node values start at 0
  – The weight matrices are often set to small random values in (-0.5, 0.5)
• Step 2. Feed-forward:
  – Use historical data: run the predictor values through and get the output

[Cycle: initialize -> feed-forward (guess) -> back-propagation (learn) -> feed-forward ...]

Page 42

Neural Net

• Step 3: Back-propagation
  – Critical step: learning happens here
  – Compare the machine's result with the historical result:
    • error_i = o_i(real) - o_i(machine)
    • Based on this error, go BACK to the hidden-to-output weight matrix and change the weights so that the error becomes smaller
    • Requires calculus (derivatives of the error)
    • Just interpret it as looking for the minimum error on an error surface
    • Repeat the process until the error falls within an acceptable range
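For a sigmoid output node and a squared-error criterion, the weight update has a simple closed form; a minimal sketch of one back-propagation step (all values are illustrative):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

hidden = np.array([0.6, 0.3])   # S(H1), S(H2) from the feed-forward pass
w = np.array([0.2, -0.4])       # hidden -> output weights for one output node
target = 1.0                    # the historical (real) output
rate = 0.1                      # learning rate

out = sigmoid(w @ hidden)       # feed-forward: the machine's answer
error = target - out            # error = o_real - o_machine
delta = error * out * (1 - out) # derivative of the squared error (chain rule)
w += rate * delta * hidden      # adjust weights to shrink the error
print(out, error, w)
```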

Page 43

Neural Net

• Tuning for the training phase
  – Topology: number of input, output, and hidden nodes
    • hidden = 1/2 * (output + input) is a common rule of thumb
    • Number of hidden layers: 1 is enough
  – Learning rate (0-1):
    • The rate at which weights can be modified from the previous weights
    • Very important for learning convergence and performance
  – Momentum:
    • An adjustment included when calculating weight modifications
    • Typically very small or zero; less important

Page 44

Neural Net

• Pros: very powerful (ANY function!)
• Cons:
  – Time-consuming
  – Black box
• When and where to use it:
  – Complicated prediction problems
  – Visualization or understanding of the rules is not needed
  – Accuracy is very important

Page 45

Summary

• Basics
  – Classification versus prediction
    • Mappings from input attributes to class labels
    • Data types of input attributes and class labels: numerical, categorical and ordinal
    • Data-type-based view and discovery-vs-predictive view
• Decision-tree induction method
  – Recursively partitions the data set to increase the purity (or information gain) of class labels in individual partitions
  – Entropy function: a measure of diversity
  – Tree nodes correspond to partitions; links correspond to partitioning conditions
  – Pre-pruning or post-pruning removes unreliable tree branches caused by noise or outliers

Page 46

Summary

• Neural Net
  – A neural net has the following components:
    • Input layer, output layer, hidden layer
    • Weight matrices
  – The input layer represents the input attributes
  – The output layer represents the output classes
  – The hidden layer and the matrices help capture the mapping function

Page 47

Summary

• Neural Net
  – To use a neural net, go through three steps:
    • Training: feed-forward, back-propagation
    • Testing: feed-forward only; used to measure the accuracy of the model built
    • Prediction: feed-forward without measuring performance
  – Most of the tuning occurs in the training phase:
    • Number of hidden-layer nodes
    • Learning rate
    • Momentum
• Readings: T2, Ch. 7.1 – 7.3.3 and Ch. 7.2