TRANSCRIPT
Business Intelligence and Decision Support Systems
(9th Ed., Prentice Hall)
Chapter 6: Artificial Neural Networks for Data Mining
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall 6-2
Learning Objectives
n Understand the concept and definitions of artificial neural networks (ANN)
n Know the similarities and differences between biological and artificial neural networks
n Learn the different types of neural network architectures
n Learn the advantages and limitations of ANN
n Understand how backpropagation learning works in feedforward neural networks
n Understand the step-by-step process of how to use neural networks
n Appreciate the wide variety of applications of neural networks for solving problem types of
n Classification
n Regression
n Clustering
n Association
n Optimization
Opening Vignette:
“Predicting Gambling Referenda with Neural Networks”
n Decision situation
n Proposed solution
n Results
n Answer and discuss the case questions
[Figure: a neural network with an input layer of socio-demographic, religious, financial, and other variables, a hidden layer, and an output layer predicting the "yes" or "no" vote on legalizing gaming, with predicted vs. actual outcomes compared]
Neural Network Concepts
n Neural networks (NN): a brain metaphor for information processing
n Neural computing
n Artificial neural network (ANN)
n Many uses of ANN
n pattern recognition, forecasting, prediction, and classification
n Many application areas
n finance, marketing, manufacturing, operations, information systems, and so on
Biological Neural Networks
n Two interconnected brain cells (neurons)
[Figure: two biological neurons connected at a synapse, each with a soma, an axon, and dendrites]
Processing Information in ANN
[Figure: a single neuron (processing element, PE): inputs X1…Xn with weights W1…Wn feed a summation function S = Σ XiWi (i = 1..n) and a transfer function f(S), producing outputs Y1…Yn]
n A single neuron (processing element – PE) with inputs and outputs
Biology Analogy
[Figure: analogy between biological and artificial neurons: soma ↔ node, dendrites ↔ inputs, axon ↔ output, synapses ↔ weights]
Elements of ANN
n Processing element (PE)
n Network architecture
n Hidden layers
n Parallel processing
n Network information processing
n Inputs
n Outputs
n Connection weights
n Summation function
[Figure: a neural network with one hidden layer: inputs x1, x2, x3 enter the input layer; each hidden-layer PE computes a weighted sum (S) and passes it through a transfer function (f); the output layer produces Y1]
Neural Network with One Hidden Layer
(a) Single neuron: inputs x1 and x2 with weights w1 and w2 feed one PE, which computes
    Y = X1W1 + X2W2
(b) Multiple neurons: inputs x1 and x2 with weights w11, w21, w12, w22, and w23 feed three PEs, which compute
    Y1 = X1W11 + X2W21
    Y2 = X1W12 + X2W22
    Y3 = X2W23
PE: Processing Element (or neuron)
Summation Function for a Single Neuron (a) and Several Neurons (b)
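The summation functions above can be sketched in code. In this illustrative sketch (not from the book), the weights form a matrix with one row per input and one column per neuron, so a single column reproduces case (a) and several columns reproduce case (b):

```python
def weighted_sums(x, W):
    """Summation function only (no transfer): Y_j = sum_i X_i * W_ij.
    W is a list of rows, one row per input; columns correspond to neurons."""
    return [sum(xi * wij for xi, wij in zip(x, column)) for column in zip(*W)]

# Case (a): one neuron -> Y = X1*W1 + X2*W2
print(weighted_sums([3, 2], [[0.5], [0.25]]))   # [2.0]
```

Case (b) is the same call with a weight matrix of three columns, one per output neuron.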
n Transformation (Transfer) Function
n Linear function
n Sigmoid (logistic activation) function [0 1]
n Hyperbolic tangent function [-1 1]
[Figure: example PE with inputs X1 = 3, X2 = 1, X3 = 2 and weights W1 = 0.2, W2 = 0.4, W3 = 0.1]
Summation function: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2
Transfer function: YT = 1/(1 + e^-1.2) = 0.77
n Threshold value?
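The slide's arithmetic can be verified directly; this minimal sketch assumes the sigmoid transfer function shown above:

```python
import math

X = [3, 1, 2]          # inputs from the slide
W = [0.2, 0.4, 0.1]    # weights from the slide

S = sum(x * w for x, w in zip(X, W))   # summation: 3(0.2) + 1(0.4) + 2(0.1)
YT = 1.0 / (1.0 + math.exp(-S))        # sigmoid transfer function

print(round(S, 2), round(YT, 2))       # 1.2 0.77
```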
Neural Network Architectures
n Several ANN architectures exist
n Feedforward
n Recurrent
n Associative memory
n Probabilistic
n Self-organizing feature maps
n Hopfield networks
n … many more …
Neural Network Architectures: Recurrent Neural Networks
[Figure: a recurrent neural network, with feedback connections from later layers back to earlier ones]
Neural Network Architectures
n Architecture of a neural network is driven by the task it is intended to address
n Classification, regression, clustering, general optimization, association, …
n Most popular architecture: feedforward, multi-layered perceptron with backpropagation learning algorithm
n Used for both classification and regression type problems
Learning in ANN
n A process by which a neural network learns the underlying relationship between inputs and outputs, or just among the inputs
n Supervised learning
n For prediction type problems
n E.g., backpropagation
n Unsupervised learning
n For clustering type problems
n Self-organizing
n E.g., adaptive resonance theory
A Taxonomy of ANN Learning Algorithms
Learning Algorithms
n Discrete/binary input
n Supervised: Simple Hopfield, Outerproduct AM, Hamming Net
n Unsupervised: ART-1, Carpenter/Grossberg
n Continuous input
n Supervised: Delta rule, Gradient descent, Competitive learning, Neocognitron, Perceptron
n Unsupervised: ART-3, SOFM (or SOM), Other clustering algorithms
Architectures
n Supervised
n Recurrent: Hopfield
n Feedforward: Nonlinear vs. linear, Backpropagation, ML perceptron, Boltzmann
n Unsupervised
n Estimator: SOFM (or SOM)
n Extractor: ART-1, ART-2
A Supervised Learning Process
[Flowchart: the ANN model computes an output; if the desired output is achieved, learning stops; otherwise the weights are adjusted and the cycle repeats]
Three-step process:
1. Compute temporary outputs
2. Compare outputs with desired targets
3. Adjust the weights and repeat the process
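The three-step loop can be sketched generically. `compute_output` and `adjust_weights` below are hypothetical stand-ins for whatever model and learning rule are chosen; this is an illustrative sketch, not the book's code:

```python
def supervised_learning(weights, samples, compute_output, adjust_weights,
                        max_epochs=1000, tolerance=1e-3):
    """Generic supervised loop: compute output, compare with target, adjust, repeat."""
    for _ in range(max_epochs):
        total_error = 0.0
        for x, desired in samples:
            y = compute_output(weights, x)               # 1. temporary output
            error = desired - y                          # 2. compare with target
            total_error += error ** 2
            weights = adjust_weights(weights, x, error)  # 3. adjust the weights
        if total_error < tolerance:                      # desired output achieved?
            break                                        # stop learning
    return weights
```

With a linear unit and a delta-rule style update, for example, the loop drives the weights toward the target mapping.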
How a Network Learns
n Example: single neuron that learns the inclusive OR operation
* See your book for step-by-step progression of the learning process
Learning parameters:
n Learning rate
n Momentum
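A compact sketch of a single neuron learning inclusive OR is shown below; it uses the classic perceptron rule with an illustrative learning rate (and no momentum term), so the exact values differ from the book's step-by-step example:

```python
def train_or(lr=0.5, epochs=10):
    """Single neuron (step transfer) learning inclusive OR via the perceptron rule.
    Learning rate and zero initial weights are illustrative choices."""
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            y = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0  # step transfer
            err = target - y                                   # 0 when correct
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]   # adjust weights
            b += lr * err                                      # adjust bias
    return w, b

w, b = train_or()
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0)  # (0, 0) -> 0, rest -> 1
```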
Backpropagation Learning
n Backpropagation of Error for a Single Neuron
[Figure: a single neuron (PE): inputs x1…xn with weights w1…wn feed the summation S = Σ XiWi (i = 1..n) and transfer function Y = f(S); the error a(Zi – Yi), the gap between desired output Zi and actual output Yi scaled by learning parameter a, is fed back to adjust the weights]
n The learning algorithm procedure:
1. Initialize weights with random values and set other network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward through the layers)
4. Compute the error (difference between the actual and desired output)
5. Change the weights by working backward through the hidden layers
6. Repeat steps 2-5 until weights stabilize
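The six steps can be sketched for a small network with one hidden layer and a sigmoid transfer function. Network size, learning rate, epoch count, and the bias handling below are illustrative choices, not the book's specification:

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_backprop(data, n_hidden=3, lr=0.5, epochs=2000, seed=42):
    """Backpropagation for a network with one hidden layer and one output."""
    rnd = random.Random(seed)
    n_in = len(data[0][0])
    # 1. Initialize weights randomly (last weight in each row is a bias weight)
    W1 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, desired in data:                  # 2. read inputs and desired output
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
            y = sigmoid(sum(w * v for w, v in zip(W2, h + [1.0])))  # 3. forward pass
            d_out = (desired - y) * y * (1 - y)  # 4. error term at the output
            # 5. work backward: hidden-layer error terms, then weight updates
            d_hid = [hi * (1 - hi) * d_out * W2[i] for i, hi in enumerate(h)]
            W2 = [w + lr * d_out * v for w, v in zip(W2, h + [1.0])]
            W1 = [[w + lr * d_hid[i] * v for w, v in zip(W1[i], xb)]
                  for i in range(n_hidden)]
    return W1, W2                                # 6. repeated until weights settle

def predict(W1, W2, x):
    """Forward pass through the trained network."""
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
    return sigmoid(sum(w * v for w, v in zip(W2, h + [1.0])))
```

A real implementation would also check a stopping criterion (step 6) instead of running a fixed number of epochs, and typically adds a momentum term to the weight updates.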
Development Process of an ANN
[Figure: the step-by-step ANN development process, from data collection and preparation through network design, training, and testing to deployment]
An MLP ANN Structure for the Box-Office Prediction Problem
[Figure: MLP with an input layer (27 PEs), two hidden layers (18 and 16 PEs), and an output layer (9 PEs)]
Inputs (27 PEs):
n MPAA Rating (5) (G, PG, PG13, R, NR)
n Competition (3) (High, Medium, Low)
n Star Value (3) (High, Medium, Low)
n Genre (10) (Sci-Fi, Action, …)
n Technical Effects (3) (High, Medium, Low)
n Sequel (2) (Yes, No)
n Number of Screens (positive integer)
Outputs (9 PEs):
n Class 1 - FLOP (BO < 1M)
n Class 2 (1M < BO < 10M)
n Class 3 (10M < BO < 20M)
n Class 4 (20M < BO < 40M)
n Class 5 (40M < BO < 65M)
n Class 6 (65M < BO < 100M)
n Class 7 (100M < BO < 150M)
n Class 8 (150M < BO < 200M)
n Class 9 - BLOCKBUSTER (BO > 200M)
Testing a Trained ANN Model
n Data is split into three parts
n Training (~60%)
n Validation (~20%)
n Testing (~20%)
n k-fold cross validation
n Less bias
n Time consuming
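A simple holdout split with the slide's ~60/20/20 proportions might look like this (the shuffle seed is arbitrary):

```python
import random

def split_data(rows, train=0.6, validation=0.2, seed=7):
    """Shuffle, then split into training / validation / testing partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * (train + validation))
    return rows[:n_train], rows[n_train:n_val], rows[n_val:]
```

k-fold cross validation instead rotates which fold is held out, trading extra computing time for a less biased performance estimate.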
Sensitivity Analysis on ANN Models
n A common criticism of ANN: the lack of explainability
n The black-box syndrome!
n Answer: sensitivity analysis
n Conducted on a trained ANN
n The inputs are perturbed while the relative change in the output is measured/recorded
n Results illustrate the relative importance of input variables
[Figure: systematically perturbed inputs fed into the trained ANN ("the black box"), with the change in outputs observed]
n For a good example, see Application Case 6.5
n Sensitivity analysis reveals the most important injury severity factors in traffic accidents
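The perturb-and-observe procedure can be sketched for any trained model exposed as a scoring function; `predict` below is a stand-in for a trained ANN, not a specific library call:

```python
def sensitivity(predict, baseline, delta=0.1):
    """Perturb each input in turn and record the change in the model's output."""
    base = predict(baseline)
    changes = []
    for i in range(len(baseline)):
        perturbed = list(baseline)
        perturbed[i] += delta        # perturb one input, hold the others fixed
        changes.append(abs(predict(perturbed) - base))
    return changes                   # larger change = more important input
```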
A Sample Neural Network Project: Bankruptcy Prediction
n A comparative analysis of ANN versus logistic regression (a statistical method)
n Inputs
n X1: Working capital / total assets
n X2: Retained earnings / total assets
n X3: Earnings before interest and taxes / total assets
n X4: Market value of equity / total debt
n X5: Sales / total assets
n Data was obtained from Moody's Industrial Manuals
n Time period: 1975 to 1982
n 129 firms (65 of which went bankrupt during the period and 64 nonbankrupt)
n Different training and testing proportions are used/compared
n 90/10 versus 80/20 versus 50/50
n Resampling is used to create 60 data sets
[Figure: a feedforward network with inputs x1…x5, a hidden layer, and two outputs, BR (bankrupt) and NBR (nonbankrupt)]
Network Specifics
§ Feedforward MLP
§ Backpropagation
§ Varying learning and momentum values
§ 5 input neurons (1 for each financial ratio)
§ 10 hidden neurons
§ 2 output neurons (1 indicating a bankrupt firm and the other indicating a nonbankrupt firm)
A Sample Neural Network Project: Bankruptcy Prediction - Results
[Table: prediction accuracy of the ANN versus logistic regression across the tested training/testing proportions]
Other Popular ANN Paradigms: Self-Organizing Maps (SOM)
[Figure: three inputs (Input 1, Input 2, Input 3) fully connected to a two-dimensional lattice of SOM nodes]
§ First introduced by the Finnish Professor Teuvo Kohonen
§ Applies to clustering type problems
n SOM Algorithm
1. Initialize each node's weights
2. Present a randomly selected input vector to the lattice
3. Determine the most resembling (winning) node
4. Determine the neighboring nodes
5. Adjust the winning and neighboring nodes (make them more like the input vector)
6. Repeat steps 2-5 until a stopping criterion is reached
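The six steps can be sketched on a tiny lattice. Grid size, neighborhood shape, learning-rate decay, and the fixed-epoch stopping rule below are illustrative choices:

```python
import random

def train_som(data, rows=2, cols=2, epochs=100, lr=0.5, seed=1):
    """Self-organizing map on a small rows x cols lattice."""
    rnd = random.Random(seed)
    dim = len(data[0])
    # 1. Initialize each node's weights randomly
    nodes = {(r, c): [rnd.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        x = rnd.choice(data)                      # 2. present a random input vector
        # 3. Find the most resembling (winning) node by squared distance
        win = min(nodes, key=lambda k: sum((w - v) ** 2
                                           for w, v in zip(nodes[k], x)))
        radius = 1 if t < epochs // 2 else 0      # 4. shrinking neighborhood
        rate = lr * (1 - t / epochs)              # decaying learning rate
        for k, w in nodes.items():
            if abs(k[0] - win[0]) + abs(k[1] - win[1]) <= radius:
                # 5. pull the winner and its neighbors toward the input vector
                nodes[k] = [wi + rate * (vi - wi) for wi, vi in zip(w, x)]
    return nodes                                  # 6. stop after a fixed epoch count
```

After training, nearby lattice nodes end up representing similar inputs, which is what makes SOM useful for clustering.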
n Applications of SOM
n Customer segmentation
n Bibliographic classification
n Image-browsing systems
n Medical diagnosis
n Interpretation of seismic activity
n Speech recognition
n Data compression
n Environmental modeling, many more …
Other Popular ANN Paradigms: Hopfield Networks
§ First introduced by John Hopfield
§ Highly interconnected neurons
§ Applies to solving complex computational problems (e.g., optimization problems)
[Figure: a Hopfield network: a single layer of highly interconnected neurons, with inputs feeding in and the output fed back to the neurons]
Application Types of ANN
n Classification
n Feedforward networks (MLP), radial basis function, and probabilistic NN
n Regression
n Feedforward networks (MLP), radial basis function
n Clustering
n Adaptive Resonance Theory (ART) and SOM
n Association
n Hopfield networks
n Provide examples for each type?
Advantages of ANN
n Able to deal with (identify/model) highly nonlinear relationships
n Not prone to restricting normality and/or independence assumptions
n Can handle a variety of problem types
n Usually provides better results (prediction and/or clustering) compared to its statistical counterparts
n Handles both numerical and categorical variables (transformation needed!)
Disadvantages of ANN
n They are deemed to be black-box solutions, lacking explainability
n It is hard to find optimal values for a large number of network parameters
n Optimal design is still an art: requires expertise and extensive experimentation
n It is hard to handle a large number of variables (especially the rich nominal attributes)
n Training may take a long time for large datasets, which may require case sampling
ANN Software
n Standalone ANN software tools
n NeuroSolutions
n BrainMaker
n NeuralWare
n NeuroShell, … (for more, see pcai.com) …
n Part of a data mining software suite
n PASW (formerly SPSS Clementine)
n SAS Enterprise Miner
n Statistica Data Miner, … many more …
End of the Chapter
n Questions / comments…
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.