TRANSCRIPT
Business Intelligence and Decision Support Systems
(9th Ed., Prentice Hall)
Chapter 6: Artificial Neural Networks for Data Mining
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall 6-2
Learning Objectives
n Understand the concept and definitions of artificial neural networks (ANN)
n Know the similarities and differences between biological and artificial neural networks
n Learn the different types of neural network architectures
n Learn the advantages and limitations of ANN
n Understand how backpropagation learning works in feedforward neural networks
n Understand the step-by-step process of how to use neural networks
n Appreciate the wide variety of applications of neural networks for solving problem types of
n Classification
n Regression
n Clustering
n Association
n Optimization
Opening Vignette:
“Predicting Gambling Referenda with Neural Networks”
n Decision situation
n Proposed solution
n Results
n Answer and discuss the case questions
[Figure: a neural network with an input layer of socio-demographic, religious, financial, and other variables, a hidden layer, and an output layer predicting the "yes" or "no" vote on legalizing gaming, with predicted vs. actual outcomes compared]
Neural Network Concepts
n Neural networks (NN): a brain metaphor for information processing
n Neural computing
n Artificial neural network (ANN)
n Many uses of ANN
n pattern recognition, forecasting, prediction, and classification
n Many application areas
n finance, marketing, manufacturing, operations, information systems, and so on
Biological Neural Networks
n Two interconnected brain cells (neurons)
[Figure: two biological neurons connected at a synapse, each with a soma, an axon, and dendrites]
Processing Information in ANN
[Figure: a single neuron (processing element, PE): inputs X1…Xn with weights W1…Wn feed a summation function S = Σ XiWi (i = 1..n) and a transfer function f(S), producing outputs Y1…Yn]
n A single neuron (processing element – PE) with inputs and outputs
Biology Analogy
[Figure: analogy between biological and artificial neurons: soma ↔ node, dendrites ↔ inputs, axon ↔ output, synapses ↔ weights]
Elements of ANN
n Processing element (PE)
n Network architecture
n Hidden layers
n Parallel processing
n Network information processing
n Inputs
n Outputs
n Connection weights
n Summation function
[Figure: a neural network with one hidden layer: inputs x1, x2, x3 enter the input layer; each hidden-layer PE computes a weighted sum (S) and passes it through a transfer function (f); the output layer produces Y1]
Neural Network with One Hidden Layer
(a) Single neuron: inputs x1 and x2 with weights w1 and w2 feed one PE, which computes
    Y = X1W1 + X2W2
(b) Multiple neurons: inputs x1 and x2 with weights w11, w21, w12, w22, and w23 feed three PEs, which compute
    Y1 = X1W11 + X2W21
    Y2 = X1W12 + X2W22
    Y3 = X2W23
PE: Processing Element (or neuron)
Summation Function for a Single Neuron (a) and Several Neurons (b)
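The summation functions above can be sketched in code. In this illustrative sketch (not from the book), the weights form a matrix with one row per input and one column per neuron, so a single column reproduces case (a) and several columns reproduce case (b):

```python
def weighted_sums(x, W):
    """Summation function only (no transfer): Y_j = sum_i X_i * W_ij.
    W is a list of rows, one row per input; columns correspond to neurons."""
    return [sum(xi * wij for xi, wij in zip(x, column)) for column in zip(*W)]

# Case (a): one neuron -> Y = X1*W1 + X2*W2
print(weighted_sums([3, 2], [[0.5], [0.25]]))   # [2.0]
```

Case (b) is the same call with a weight matrix of three columns, one per output neuron.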
n Transformation (Transfer) Function
n Linear function
n Sigmoid (logistic activation) function [0 1]
n Hyperbolic tangent function [-1 1]
[Figure: example PE with inputs X1 = 3, X2 = 1, X3 = 2 and weights W1 = 0.2, W2 = 0.4, W3 = 0.1]
Summation function: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2
Transfer function: YT = 1/(1 + e^-1.2) = 0.77
n Threshold value?
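The slide's arithmetic can be verified directly; this minimal sketch assumes the sigmoid transfer function shown above:

```python
import math

X = [3, 1, 2]          # inputs from the slide
W = [0.2, 0.4, 0.1]    # weights from the slide

S = sum(x * w for x, w in zip(X, W))   # summation: 3(0.2) + 1(0.4) + 2(0.1)
YT = 1.0 / (1.0 + math.exp(-S))        # sigmoid transfer function

print(round(S, 2), round(YT, 2))       # 1.2 0.77
```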
Neural Network Architectures
n Several ANN architectures exist
n Feedforward
n Recurrent
n Associative memory
n Probabilistic
n Self-organizing feature maps
n Hopfield networks
n … many more …
Neural Network Architectures: Recurrent Neural Networks
[Figure: a recurrent neural network, with feedback connections from later layers back to earlier ones]
Neural Network Architectures
n Architecture of a neural network is driven by the task it is intended to address
n Classification, regression, clustering, general optimization, association, …
n Most popular architecture: feedforward, multi-layered perceptron with backpropagation learning algorithm
n Used for both classification and regression type problems
Learning in ANN
n A process by which a neural network learns the underlying relationship between inputs and outputs, or just among the inputs
n Supervised learning
n For prediction type problems
n E.g., backpropagation
n Unsupervised learning
n For clustering type problems
n Self-organizing
n E.g., adaptive resonance theory
A Taxonomy of ANN Learning Algorithms
Learning Algorithms
n Discrete/binary input
n Supervised: Simple Hopfield, Outerproduct AM, Hamming Net
n Unsupervised: ART-1, Carpenter/Grossberg
n Continuous input
n Supervised: Delta rule, Gradient descent, Competitive learning, Neocognitron, Perceptron
n Unsupervised: ART-3, SOFM (or SOM), Other clustering algorithms
Architectures
n Supervised
n Recurrent: Hopfield
n Feedforward: Nonlinear vs. linear, Backpropagation, ML perceptron, Boltzmann
n Unsupervised
n Estimator: SOFM (or SOM)
n Extractor: ART-1, ART-2
A Supervised Learning Process
[Flowchart: the ANN model computes an output; if the desired output is achieved, learning stops; otherwise the weights are adjusted and the cycle repeats]
Three-step process:
1. Compute temporary outputs
2. Compare outputs with desired targets
3. Adjust the weights and repeat the process
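The three-step loop can be sketched generically. `compute_output` and `adjust_weights` below are hypothetical stand-ins for whatever model and learning rule are chosen; this is an illustrative sketch, not the book's code:

```python
def supervised_learning(weights, samples, compute_output, adjust_weights,
                        max_epochs=1000, tolerance=1e-3):
    """Generic supervised loop: compute output, compare with target, adjust, repeat."""
    for _ in range(max_epochs):
        total_error = 0.0
        for x, desired in samples:
            y = compute_output(weights, x)               # 1. temporary output
            error = desired - y                          # 2. compare with target
            total_error += error ** 2
            weights = adjust_weights(weights, x, error)  # 3. adjust the weights
        if total_error < tolerance:                      # desired output achieved?
            break                                        # stop learning
    return weights
```

With a linear unit and a delta-rule style update, for example, the loop drives the weights toward the target mapping.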
How a Network Learns
n Example: single neuron that learns the inclusive OR operation
* See your book for step-by-step progression of the learning process
Learning parameters:
n Learning rate
n Momentum
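A compact sketch of a single neuron learning inclusive OR is shown below; it uses the classic perceptron rule with an illustrative learning rate (and no momentum term), so the exact values differ from the book's step-by-step example:

```python
def train_or(lr=0.5, epochs=10):
    """Single neuron (step transfer) learning inclusive OR via the perceptron rule.
    Learning rate and zero initial weights are illustrative choices."""
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            y = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0  # step transfer
            err = target - y                                   # 0 when correct
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]   # adjust weights
            b += lr * err                                      # adjust bias
    return w, b

w, b = train_or()
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0)  # (0, 0) -> 0, rest -> 1
```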
Backpropagation Learning
n Backpropagation of Error for a Single Neuron
[Figure: a single neuron (PE): inputs x1…xn with weights w1…wn feed the summation S = Σ XiWi (i = 1..n) and transfer function Y = f(S); the error a(Zi – Yi), the gap between desired output Zi and actual output Yi scaled by learning parameter a, is fed back to adjust the weights]
n The learning algorithm procedure:
1. Initialize weights with random values and set other network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward through the layers)
4. Compute the error (difference between the actual and desired output)
5. Change the weights by working backward through the hidden layers
6. Repeat steps 2-5 until weights stabilize
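The six steps can be sketched for a small network with one hidden layer and a sigmoid transfer function. Network size, learning rate, epoch count, and the bias handling below are illustrative choices, not the book's specification:

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_backprop(data, n_hidden=3, lr=0.5, epochs=2000, seed=42):
    """Backpropagation for a network with one hidden layer and one output."""
    rnd = random.Random(seed)
    n_in = len(data[0][0])
    # 1. Initialize weights randomly (last weight in each row is a bias weight)
    W1 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, desired in data:                  # 2. read inputs and desired output
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
            y = sigmoid(sum(w * v for w, v in zip(W2, h + [1.0])))  # 3. forward pass
            d_out = (desired - y) * y * (1 - y)  # 4. error term at the output
            # 5. work backward: hidden-layer error terms, then weight updates
            d_hid = [hi * (1 - hi) * d_out * W2[i] for i, hi in enumerate(h)]
            W2 = [w + lr * d_out * v for w, v in zip(W2, h + [1.0])]
            W1 = [[w + lr * d_hid[i] * v for w, v in zip(W1[i], xb)]
                  for i in range(n_hidden)]
    return W1, W2                                # 6. repeated until weights settle

def predict(W1, W2, x):
    """Forward pass through the trained network."""
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
    return sigmoid(sum(w * v for w, v in zip(W2, h + [1.0])))
```

A real implementation would also check a stopping criterion (step 6) instead of running a fixed number of epochs, and typically adds a momentum term to the weight updates.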
Development Process of an ANN
[Figure: the step-by-step ANN development process, from data collection and preparation through network design, training, and testing to deployment]
An MLP ANN Structure for the Box-Office Prediction Problem
[Figure: MLP with an input layer (27 PEs), two hidden layers (18 and 16 PEs), and an output layer (9 PEs)]
Inputs (27 PEs):
n MPAA Rating (5) (G, PG, PG13, R, NR)
n Competition (3) (High, Medium, Low)
n Star Value (3) (High, Medium, Low)
n Genre (10) (Sci-Fi, Action, …)
n Technical Effects (3) (High, Medium, Low)
n Sequel (2) (Yes, No)
n Number of Screens (positive integer)
Outputs (9 PEs):
n Class 1 - FLOP (BO < 1M)
n Class 2 (1M < BO < 10M)
n Class 3 (10M < BO < 20M)
n Class 4 (20M < BO < 40M)
n Class 5 (40M < BO < 65M)
n Class 6 (65M < BO < 100M)
n Class 7 (100M < BO < 150M)
n Class 8 (150M < BO < 200M)
n Class 9 - BLOCKBUSTER (BO > 200M)
Testing a Trained ANN Model
n Data is split into three parts
n Training (~60%)
n Validation (~20%)
n Testing (~20%)
n k-fold cross validation
n Less bias
n Time consuming
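A simple holdout split with the slide's ~60/20/20 proportions might look like this (the shuffle seed is arbitrary):

```python
import random

def split_data(rows, train=0.6, validation=0.2, seed=7):
    """Shuffle, then split into training / validation / testing partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * (train + validation))
    return rows[:n_train], rows[n_train:n_val], rows[n_val:]
```

k-fold cross validation instead rotates which fold is held out, trading extra computing time for a less biased performance estimate.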
Sensitivity Analysis on ANN Models
n A common criticism of ANN: the lack of explainability
n The black-box syndrome!
n Answer: sensitivity analysis
n Conducted on a trained ANN
n The inputs are perturbed while the relative change in the output is measured/recorded
n Results illustrate the relative importance of input variables
[Figure: systematically perturbed inputs fed into the trained ANN ("the black box"), with the change in outputs observed]
n For a good example, see Application Case 6.5
n Sensitivity analysis reveals the most important injury severity factors in traffic accidents
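The perturb-and-observe procedure can be sketched for any trained model exposed as a scoring function; `predict` below is a stand-in for a trained ANN, not a specific library call:

```python
def sensitivity(predict, baseline, delta=0.1):
    """Perturb each input in turn and record the change in the model's output."""
    base = predict(baseline)
    changes = []
    for i in range(len(baseline)):
        perturbed = list(baseline)
        perturbed[i] += delta        # perturb one input, hold the others fixed
        changes.append(abs(predict(perturbed) - base))
    return changes                   # larger change = more important input
```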
A Sample Neural Network Project: Bankruptcy Prediction
n A comparative analysis of ANN versus logistic regression (a statistical method)
n Inputs
n X1: Working capital / total assets
n X2: Retained earnings / total assets
n X3: Earnings before interest and taxes / total assets
n X4: Market value of equity / total debt
n X5: Sales / total assets
n Data was obtained from Moody's Industrial Manuals
n Time period: 1975 to 1982
n 129 firms (65 of which went bankrupt during the period and 64 nonbankrupt)
n Different training and testing proportions are used/compared
n 90/10 versus 80/20 versus 50/50
n Resampling is used to create 60 data sets
[Figure: a feedforward network with inputs x1…x5, a hidden layer, and two outputs, BR (bankrupt) and NBR (nonbankrupt)]
Network Specifics
§ Feedforward MLP
§ Backpropagation
§ Varying learning and momentum values
§ 5 input neurons (1 for each financial ratio)
§ 10 hidden neurons
§ 2 output neurons (1 indicating a bankrupt firm and the other indicating a nonbankrupt firm)
A Sample Neural Network Project: Bankruptcy Prediction - Results
[Table: prediction accuracy of the ANN versus logistic regression across the tested training/testing proportions]
Other Popular ANN Paradigms: Self-Organizing Maps (SOM)
[Figure: three inputs (Input 1, Input 2, Input 3) fully connected to a two-dimensional lattice of SOM nodes]
§ First introduced by the Finnish Professor Teuvo Kohonen
§ Applies to clustering type problems
n SOM Algorithm
1. Initialize each node's weights
2. Present a randomly selected input vector to the lattice
3. Determine the most resembling (winning) node
4. Determine the neighboring nodes
5. Adjust the winning and neighboring nodes (make them more like the input vector)
6. Repeat steps 2-5 until a stopping criterion is reached
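The six steps can be sketched on a tiny lattice. Grid size, neighborhood shape, learning-rate decay, and the fixed-epoch stopping rule below are illustrative choices:

```python
import random

def train_som(data, rows=2, cols=2, epochs=100, lr=0.5, seed=1):
    """Self-organizing map on a small rows x cols lattice."""
    rnd = random.Random(seed)
    dim = len(data[0])
    # 1. Initialize each node's weights randomly
    nodes = {(r, c): [rnd.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        x = rnd.choice(data)                      # 2. present a random input vector
        # 3. Find the most resembling (winning) node by squared distance
        win = min(nodes, key=lambda k: sum((w - v) ** 2
                                           for w, v in zip(nodes[k], x)))
        radius = 1 if t < epochs // 2 else 0      # 4. shrinking neighborhood
        rate = lr * (1 - t / epochs)              # decaying learning rate
        for k, w in nodes.items():
            if abs(k[0] - win[0]) + abs(k[1] - win[1]) <= radius:
                # 5. pull the winner and its neighbors toward the input vector
                nodes[k] = [wi + rate * (vi - wi) for wi, vi in zip(w, x)]
    return nodes                                  # 6. stop after a fixed epoch count
```

After training, nearby lattice nodes end up representing similar inputs, which is what makes SOM useful for clustering.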
n Applications of SOM
n Customer segmentation
n Bibliographic classification
n Image-browsing systems
n Medical diagnosis
n Interpretation of seismic activity
n Speech recognition
n Data compression
n Environmental modeling, many more …
Other Popular ANN Paradigms: Hopfield Networks
§ First introduced by John Hopfield
§ Highly interconnected neurons
§ Applies to solving complex computational problems (e.g., optimization problems)
[Figure: a Hopfield network: a single layer of highly interconnected neurons, with inputs feeding in and the output fed back to the neurons]
Application Types of ANN
n Classification
n Feedforward networks (MLP), radial basis function, and probabilistic NN
n Regression
n Feedforward networks (MLP), radial basis function
n Clustering
n Adaptive Resonance Theory (ART) and SOM
n Association
n Hopfield networks
n Provide examples for each type?
Advantages of ANN
n Able to deal with (identify/model) highly nonlinear relationships
n Not prone to restricting normality and/or independence assumptions
n Can handle a variety of problem types
n Usually provides better results (prediction and/or clustering) compared to its statistical counterparts
n Handles both numerical and categorical variables (transformation needed!)
Disadvantages of ANN
n They are deemed to be black-box solutions, lacking explainability
n It is hard to find optimal values for a large number of network parameters
n Optimal design is still an art: requires expertise and extensive experimentation
n It is hard to handle a large number of variables (especially the rich nominal attributes)
n Training may take a long time for large datasets, which may require case sampling
ANN Software
n Standalone ANN software tools
n NeuroSolutions
n BrainMaker
n NeuralWare
n NeuroShell, … (for more, see pcai.com) …
n Part of a data mining software suite
n PASW (formerly SPSS Clementine)
n SAS Enterprise Miner
n Statistica Data Miner, … many more …
End of the Chapter
n Questions / comments…
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.