Introduction to Neural Networks (Undergraduate Course), Lecture 9 of 9


Neural Networks

Dr. Randa Elanwar

Lecture 9

Lecture Content

• Mapping networks:

– Back-propagation neural network

– Self-organizing map

– Counter propagation network

• Spatiotemporal Network

• Stochastic Networks

– Boltzmann machine

• Neurocognition network


Mapping networks

• When the problem is nonlinear and no straight line can ever separate the samples in the feature space, we need multilayer perceptrons (having hidden layer(s)) to achieve nonlinearity.

• The idea is that we map/transform/translate our data to another feature space in which it becomes linearly separable. Thus we call them mapping networks.

• We will discuss three types of mapping networks: the back-propagation neural network, the self-organizing map, and the counter-propagation network.


Mapping networks

• Networks without hidden units are very limited in the input-output mappings they can model.

– More layers of linear units do not help. It is still linear.

– Fixed output non-linearities are not enough.

• We need multiple layers of adaptive non-linear hidden units.

• But how can we train such nets?

– We need an efficient way of adapting all the weights, not just the last layer, i.e., learning the weights going into the hidden units. This is hard.

– Why?

– Because: Nobody is telling us directly what hidden units should do.

– Solution: This can be achieved using ‘Backpropagation’ learning
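As a quick illustration of why hidden units help, here is a minimal sketch of a two-layer network of threshold units computing XOR, a mapping no single-layer perceptron can represent. The weights and thresholds are chosen by hand for illustration; they are not taken from the lecture notes.

```python
def step(z):
    """Hard-threshold activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer: two threshold units (hand-chosen weights for illustration).
    h1 = step(x1 + x2 - 0.5)   # behaves like OR
    h2 = step(x1 + x2 - 1.5)   # behaves like AND
    # In the hidden feature space (h1, h2) the two XOR classes become
    # linearly separable, so a single output unit can finish the job.
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))   # prints the XOR truth table
```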


Learning with hidden layers

• Mathematically, the learning process is an optimization problem. We initiate the NN system with some parameters (weights) and use known examples to find out the optimal values of such weights.

• Generally, the solution of an optimization problem is to find the parameter value that leads to minimum value of an optimization function.


(Figure: a generic optimization function G(t) plotted against its parameter t, showing a minimum, and the error E plotted against the weights W.)

In our case, the optimization function that we need to minimize to get the final weights is the error function:

E = y_des - y_act

E = y_des - f(W·X)

To get the minimum value mathematically, we differentiate the error function with respect to the parameter we need to find, which we call W.

Learning with hidden layers

• We define the “gradient”: Δw = η · δ · X, where η is the learning rate, δ is the error term (delta), and X is the input.

• If Δw is +ve, this means that the current values of W make the differentiation result +ve, which is wrong. We want the differentiation result to be 0 (the minimum point), so we must move in the opposite direction of the gradient (subtract). The opposite is also true.

• If Δw = 0, this means that the current values of W make the differentiation result 0, which is right. These weights are the optimal values (the solution) and we should stop the algorithm. The network is now trained and ready for use.
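A minimal sketch of this weight-update loop for a single linear unit is shown below. The delta rule Δw = η · δ · x is applied per pattern; the function name, the stopping criterion (a fixed number of epochs), and the use of NumPy are illustrative assumptions rather than details from the slides.

```python
import numpy as np

def train_single_unit(X, y_des, eta=0.1, epochs=50, seed=0):
    """Delta-rule training of one linear unit: w += eta * delta * x.

    X     : (n_samples, n_features) input patterns
    y_des : (n_samples,) desired outputs
    eta   : learning rate
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.5, 0.5, X.shape[1])   # small random initial weights
    for _ in range(epochs):
        for x, d in zip(X, y_des):
            y_act = w @ x                    # actual output of the unit
            delta = d - y_act                # error term, y_des - y_act
            w += eta * delta * x             # step against the error gradient
    return w
```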


The back propagation algorithm

• The backpropagation learning algorithm can be divided into two phases: propagation and weight update.


Phase 1: Propagation

1. Forward propagation of a training pattern's input through the neural network in order to generate the network's output activations (y_act).

2. Backward propagation of the output activations through the neural network, using the training pattern's target (y_des), in order to generate the deltas (δ) of all output and hidden neurons.

Phase 2: Weight update

For each weight, follow these steps:

1. Multiply its output delta (δ), its input activation (x), and the learning rate (η) to get the gradient of the weight (Δw).

2. Move the weight in the opposite direction of the gradient by subtracting the gradient from the weight.

- The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction.

- Repeat phases 1 and 2 until the performance of the network is satisfactory.
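A compact sketch of the two phases for a network with one hidden layer is given below. The sigmoid activation, the squared-error loss implied by the delta formulas, and the array shapes are assumptions made for illustration; the slides do not fix these choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y_des, W1, W2, eta=0.5):
    """One propagation plus weight update for a 1-hidden-layer network.

    x  : (n_in,) input pattern        y_des : (n_out,) target
    W1 : (n_hidden, n_in) weights     W2    : (n_out, n_hidden) weights
    """
    # Phase 1a: forward propagation of the input
    h = sigmoid(W1 @ x)                                  # hidden activations
    y_act = sigmoid(W2 @ h)                              # output activations (y_act)

    # Phase 1b: backward propagation of the deltas
    delta_out = (y_des - y_act) * y_act * (1 - y_act)    # output-layer deltas
    delta_hid = (W2.T @ delta_out) * h * (1 - h)         # hidden-layer deltas

    # Phase 2: weight update, delta * input activation * learning rate
    W2 += eta * np.outer(delta_out, h)
    W1 += eta * np.outer(delta_hid, x)
    return y_act
```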

Backpropagation Networks

• They are the nonlinear (mapping) neural networks using the backpropagation supervised learning technique.

• Modes of learning of nonlinear nets:

• There are three modes of learning to choose from: on-line (pattern), batch and stochastic.

• In on-line and stochastic learning, each propagation is followed immediately by a weight update.

• In batch learning, many propagations occur before updating the weights.

• Batch learning requires more memory capacity, but on-line and stochastic learning require more updates.


Backpropagation Networks

• On-line learning is used for dynamic environments that provide a continuous stream of new patterns.

• Stochastic learning and batch learning both make use of a training set of static patterns. Stochastic learning goes through the data set in a random order to reduce its chances of getting stuck in local minima.

• Stochastic learning is also much faster than batch learning since weights are updated immediately after each propagation. Yet batch learning will yield a much more stable descent to a local minimum, since each update is performed based on all patterns.
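The essential difference between the modes is when the update is applied. The sketch below uses a single linear unit as an illustrative stand-in for a full backpropagation net, simply to keep the loop structure visible; the learning rate and epoch count are arbitrary assumptions.

```python
import numpy as np

def train_linear_unit(X, Y, eta=0.05, epochs=100, mode="online", seed=0):
    """Train one linear unit under the three learning modes.

    online     : update after every pattern, in the given order
    stochastic : update after every pattern, in a random order each epoch
    batch      : accumulate the gradient over all patterns, then update once
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.5, 0.5, X.shape[1])
    for _ in range(epochs):
        order = np.arange(len(X))
        if mode == "stochastic":
            rng.shuffle(order)               # random order reduces the chance
                                             # of getting stuck in local minima
        if mode in ("online", "stochastic"):
            for i in order:                  # weight update after each propagation
                delta = Y[i] - w @ X[i]
                w += eta * delta * X[i]
        else:                                # batch: one update based on all patterns
            grad = sum((Y[i] - w @ X[i]) * X[i] for i in range(len(X)))
            w += eta * grad / len(X)
    return w
```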


Backpropagation Networks

• Applications of supervised learning (Backpropagation NN) include

• Pattern recognition

• Credit approval

• Target marketing

• Medical diagnosis

• Defective parts identification in manufacturing

• Crime zoning

• Treatment effectiveness analysis

• Etc.


Self-organizing map

• We can also train networks where there is no teacher. This is called unsupervised learning. The network learns a prototype based on the distribution of patterns in the training data. Such networks allow us to:

– Discover the underlying structure of the data

– Encode or compress the data

– Transform the data

• Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen

– Also called Kohonen Networks, Competitive Learning, or Winner-Take-All Learning

– Generally reduces the dimensions of data through the use of self-organizing neural networks

– Useful for data visualization; humans cannot visualize high dimensional data so this is often a useful technique to make sense of large data sets


Self-organizing map

• SOM structure:

1. The weights of each neuron must represent a class of patterns. We have one neuron for each class.

2. The input pattern is presented to all neurons and each produces an output. The output is a measure of the match between the input pattern and the pattern stored by the neuron.

3. A competitive learning strategy selects the neuron with the largest response.

4. A method of reinforcing the largest response is applied.


Self-organizing map

• Unsupervised classification learning is based on clustering of the input data. No a priori knowledge is assumed about an input's membership in a particular class.

• Instead, gradually detected characteristics and a history of training will be used to assist the network in defining classes and possible boundaries between them.

• Clustering is understood to be the grouping of similar objects and separating of dissimilar ones.

• We discuss Kohonen’s network, which classifies input vectors into one of m specified categories according to the clusters detected in the training set.


Kohonen’s Network

(Figure: the Kohonen network architecture; the input vector X is fed to a 2D grid of output neurons.)

• The Kohonen network is a self-organizing network with the following characteristics:

1. Neurons are arranged on a 2D grid.

2. Inputs are sent to all neurons.

3. There are no connections between neurons.

4. The output of neuron j is the weighted sum (dot product) of the input vector x and the neuron's weight vector w.

5. There is no threshold or bias.

6. Input values and weights are normalized.

Self-organizing map

Learning in Kohonen networks:

• Initially the weights in each neuron are random

• Input values are sent to all the neurons

• The outputs of each neuron are compared

• The “winner” is the neuron with the largest output value

• Having found the winner, the weights of the winning neuron are adjusted

• Weights of neurons in a surrounding neighbourhood are also adjusted

• As training progresses the neighbourhood gets smaller

• Weights are adjusted according to the following formula: w_new = w_old + α (x - w_old), i.e., the weights of the winner (and its neighbours) are moved towards the input pattern.


Self-organizing map

• The learning coefficient (alpha) starts with a value of 1 and gradually reduces to 0

• This has the effect of making big changes to the weights initially, but no changes at the end

• The weights are adjusted so that they more closely resemble the input patterns
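A small sketch of this competitive update over a 2D grid of neurons follows. The Gaussian neighbourhood, the linear decay of alpha and of the neighbourhood radius, and the grid size are illustrative assumptions; winner selection by smallest distance is equivalent to largest dot-product output when inputs and weights are normalized, as the slides assume.

```python
import numpy as np

def train_som(X, grid_shape=(10, 10), epochs=20, alpha0=1.0, sigma0=3.0, seed=0):
    """Kohonen SOM training sketch: winner-take-all plus neighbourhood update."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    W = rng.random((rows, cols, X.shape[1]))              # random initial weights
    # grid coordinates of every neuron, used to measure neighbourhood distance
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for epoch in range(epochs):
        # alpha and the neighbourhood radius both shrink as training progresses
        alpha = alpha0 * (1 - epoch / epochs)
        sigma = max(sigma0 * (1 - epoch / epochs), 0.5)
        for x in X:
            # winner = neuron whose weight vector best matches the input
            winner = np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=-1)),
                                      grid_shape)
            # Gaussian neighbourhood around the winner on the grid
            grid_dist = np.linalg.norm(coords - np.array(winner), axis=-1)
            h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))[..., None]
            # w_new = w_old + alpha * h * (x - w_old): move towards the input
            W += alpha * h * (x - W)
    return W

# Example: map 3D colour vectors onto the 2D grid for visualization
colours = np.random.default_rng(1).random((200, 3))
som_weights = train_som(colours)
```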

• Applications of unsupervised learning (Kohonen’s NN) include

• Clustering

• Vector quantization

• Data compression

• Feature extraction


Counter propagation network

• The counterpropagation network (CPN) is a fast-learning combination of unsupervised and supervised learning.

• Although this network uses linear neurons, it can learn nonlinear functions by means of a hidden layer of competitive units.

• Moreover, the network is able to learn a function and its inverse at the same time.

• However, to simplify things, we will only consider the feedforward mechanism of the CPN.


Counter propagation network

• Training:

1. Randomly select a vector pair (x, y) from the training set.

2. Measure the similarity between the input vector and the weight vector of each hidden-layer unit (this determines its activation).

3. In the hidden (competitive) layer, determine the unit with the largest activation (the winner), i.e., the neuron whose weight vector is most similar to the current input vector is the “winner.”

4. Adjust the connection weights in between (i.e., the weights feeding the winning unit).

5. Repeat until each input pattern is consistently associated with the same competitive unit.


Counter propagation network

• After the first phase of the training, each hidden-layer neuron is associated with a subset of input vectors (class of patterns).

• In the second phase of the training, we adjust the weights in the network’s output layer in such a way that, for any winning hidden-layer unit, the network’s output is as close as possible to the desired output for the winning unit’s associated input vectors.

• The idea is that when we later use the network to compute functions, the output of the winning hidden-layer unit is 1, and the output of all other hidden-layer units is 0.
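A hedged sketch of the feedforward CPN training is shown below: a Kohonen-style update of the winner's input weights, followed by a Grossberg-style update of its output weights. The learning rates, the number of hidden units, and the interleaving of the two phases in one loop (the slides run phase 1 to completion before phase 2) are simplifications made for illustration.

```python
import numpy as np

def train_cpn(X, Y, n_hidden=10, epochs=30, a=0.1, b=0.1, seed=0):
    """Feedforward counterpropagation sketch.

    X : (n_samples, n_in) inputs      Y : (n_samples, n_out) desired outputs
    """
    rng = np.random.default_rng(seed)
    W_in = rng.random((n_hidden, X.shape[1]))     # input -> hidden (instar) weights
    W_out = rng.random((Y.shape[1], n_hidden))    # hidden -> output (outstar) weights
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # winner = hidden unit whose weight vector is most similar to x
            winner = np.argmin(np.linalg.norm(W_in - x, axis=1))
            # Phase 1: move the winner's input weights towards x (clustering)
            W_in[winner] += a * (x - W_in[winner])
            # Phase 2: move the winner's output weights towards the target y,
            # so that when this unit wins (output 1, all others 0) the network
            # output is close to the desired output
            W_out[:, winner] += b * (y - W_out[:, winner])
    return W_in, W_out

def cpn_predict(x, W_in, W_out):
    winner = np.argmin(np.linalg.norm(W_in - x, axis=1))
    return W_out[:, winner]                       # output driven by the winning unit
```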


Spatiotemporal Networks

• A spatio-temporal neural net differs from other neural networks in two ways:

1. Neurons have recurrent links that have different propagation delays.

2. The state of the network depends not only on which nodes are firing, but also on the relative firing times of nodes, i.e., the significance of a node varies with time and depends on the firing state of other nodes.

• The use of recurrence and multiple links with variable propagation delays provides a rich mechanism for feature extraction and pattern recognition:

1. Recurrent links enable nodes to integrate and differentiate inputs, i.e., detect features.

2. Multiple links with variable propagation delays between nodes serve as a short-term memory.
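As a rough illustration of the second point, the sketch below feeds one input signal to a single node through several links with different propagation delays, so the node's activation at time t depends on a short window of the recent past. The delays, weights, and tanh activation are illustrative assumptions.

```python
import numpy as np

def delayed_node_response(signal, weights, delays):
    """Output of one node fed by copies of `signal` arriving with different delays.

    At time t the node sees signal[t - d] for each delay d, so the set of
    delayed links acts as a short-term memory of the input.
    """
    out = np.zeros(len(signal))
    for t in range(len(signal)):
        total = sum(w * signal[t - d] for w, d in zip(weights, delays) if t - d >= 0)
        out[t] = np.tanh(total)          # illustrative squashing activation
    return out

# Example: three links with delays of 0, 1 and 2 time steps
signal = np.sin(np.linspace(0, 4 * np.pi, 50))
print(delayed_node_response(signal, weights=[0.5, 0.3, 0.2], delays=[0, 1, 2])[:5])
```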


Spatiotemporal Networks

• Applications:

• Problems such as speech recognition and time series prediction, where the input signal has an explicit temporal aspect.

• Tasks like image recognition do not have an explicit temporal aspect, but can also be handled by converting static patterns into time-varying (spatio-temporal) signals via scanning the image. This leads to a number of significant advantages:

– The recognition system becomes ‘shift invariant’.

– The spatio-temporal approach preserves the image geometry, since the local spatial relationships in the image are expressed as local temporal variations in the scanned input.

– Reduction of complexity (from 2D to 1D).

– The scanning approach allows a visual pattern recognition system to deal with inputs of arbitrary extent (not only static, fixed 2D patterns).

Stochastic neural networks

• Stochastic neural networks are a type of artificial neural network, a tool of artificial intelligence. They are built by introducing random variations into the network, either by giving the network's neurons stochastic transfer functions, or by giving them stochastic weights. This makes them useful tools for optimization problems, since the random fluctuations help the network escape from local minima.

• Stochastic neural networks that are built by using stochastic transfer functions are often called Boltzmann machines.

• Stochastic neural networks have found applications in risk management, oncology, bioinformatics, and other similar fields.


Stochastic Networks: Boltzmann machine

• The neurons are stochastic: at any time there is a probability attached to whether the neuron fires.

• Used for solving constrained optimization problems.

• Typical Boltzmann Machine:

– Weights are fixed to represent the constraints of the problem and the function to be optimized.

– The net seeks the solution by changing the activations of the units (0 or 1) based on a probability distribution and the effect that the change would have on the energy function or consensus function for the net.

• May use either supervised or unsupervised learning.

• Learning in the Boltzmann Machine is accomplished by using a Simulated Annealing technique, which has a stochastic nature. This reduces the probability of the net becoming trapped in a local minimum which is not a global minimum.


Stochastic Networks: Boltzmann machine

• Learning characteristics:

– Each neuron fires with bipolar values.

– All connections are symmetric.

– In activation passing, the next neuron whose state we wish to update is selected randomly.

– There is no self-feedback (no connections from a neuron to itself).


Stochastic Networks: Boltzmann machine

• There are three phases in the operation of the network:

– The clamped phase, in which the input and output of the visible neurons are held fixed, while the hidden neurons are allowed to vary.

– The free-running phase, in which only the inputs are held fixed and the other neurons are allowed to vary.

– The learning phase.

• These phases iterate until learning has created a Boltzmann Machine which can be said to have learned the input patterns, and which will converge to the learned patterns when a noisy or incomplete pattern is presented.


Stochastic Networks: Boltzmann machine

• For unsupervised learning, the initial weights of the net are generally set randomly to values in a small range, e.g. -0.5 to +0.5.

• Then an input pattern is presented to the net and clamped to the visible neurons.

• A hidden neuron is chosen at random and its state is flipped from s_j to -s_j according to a certain probability distribution.

• The activation passing continues until the hidden neurons reach equilibrium.

• During the free-running phase, after presentation of the input patterns, all neurons can update their states.

• In the learning phase, the weights are changed based on the difference between the "real" distribution (neuron states) in the clamped phase and the one which is (eventually) produced by the machine in free-running mode.
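A minimal sketch of the stochastic state update with a simulated-annealing temperature schedule is shown below. The quadratic energy function, the logistic acceptance probability, and the geometric cooling schedule are standard choices assumed for illustration rather than details taken from the slides.

```python
import numpy as np

def boltzmann_settle(s, W, clamped, T0=5.0, cooling=0.95, steps=500, seed=0):
    """Let a Boltzmann machine settle towards equilibrium.

    s       : (n,) current bipolar states (+1 / -1)
    W       : (n, n) symmetric weights with zero diagonal (no self-feedback)
    clamped : (n,) boolean mask of units whose states are held fixed
    """
    rng = np.random.default_rng(seed)
    s = s.copy()
    T = T0                                        # annealing temperature
    free = np.flatnonzero(~clamped)               # units allowed to vary
    for _ in range(steps):
        j = rng.choice(free)                      # pick a free unit at random
        # energy change if unit j flips its state from s_j to -s_j
        delta_E = 2 * s[j] * (W[j] @ s)
        # accept the flip with a temperature-dependent probability
        if rng.random() < 1.0 / (1.0 + np.exp(delta_E / T)):
            s[j] = -s[j]
        T = max(T * cooling, 0.1)                 # gradually cool the network
    return s
```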


Stochastic Networks: Boltzmann machine

• For supervised learning the set of visible neurons is split into input and output neurons, and the machine will be used to associate an input pattern with an output pattern.

• During the clamped phase, the input and output patterns are clamped to the appropriate units.

• The hidden neurons’ activations can settle at various values.

• During the free-running phase, only the input neurons are clamped; both the output neurons and the hidden neurons can pass activation around until the activations in the network settle.

• The learning rule here is the same as before, but must be modulated (multiplied) by the probability of the input patterns.


Neurocognition network

• Neurocognitive networks are large-scale systems of distributed and interconnected neuronal populations in the central nervous system organized to perform cognitive functions.

• Many computer scientists try to simulate human cognition with computers. This line of research can be roughly split into two types: research seeking to create machines as adept as humans (or more so), and research attempting to figure out the computational basis of human cognition, that is, how the brain actually carries out its computations. This latter branch of research can be called computational modeling (while the former is often called artificial intelligence or AI).
