2nd session machine learning: feed-forward neural networks...
TRANSCRIPT
1
2nd Session
Machine learning: feed-forward
neural networks and
self-organizing maps
2
Recommended reading
• J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design: An Introduction, Wiley-VCH, Weinheim, 1999.
• Chemoinformatics - A Textbook, eds. Johann Gasteiger and Thomas Engel, Wiley-VCH, 2003.
• Handbook of Chemoinformatics, ed. Johann Gasteiger, Wiley-VCH, 2003.
3
Neural networks
Information processing systems
inspired by biological nervous systems.
Ability to learn from observations:
Extract knowledge
Identify relationships
Identify structures
Generalize
4
Statistical methods process information and ‘learn’.
The brain learns without statistical methods!
Neural networks simulate nervous systems using algorithms and mathematical models.
NNs are interesting from a neuroscience point of view, as models of the brain.
NNs are interesting to computer science as computational tools.
Neural networks
5
input
output
A black box?
Neural networks
6
input
output
Connected
functional units
NEURONS
Neural networks
7
The biological neuron
[Figure: cell body, dendrites, axon, axon terminal.]
The human nervous system has ca. 10^11 neurons (and on the order of 10^15 synapses).
Transmission of an electric signal between dendrites and axons occurs
through the transport of ions.
8
Neurons in the superficial layers of the visual cortex in the brain of a mouse.
PLoS Biology Vol. 4, No. 2, e29 DOI: 10.1371/journal.pbio.0040029
The biological neuron
9
Synapses – neuron junctions
Axon – Dendrite : chemical signal (neurotransmitter).
Signal is transmitted in only one direction.
Some neurons are able to modify the signal transmission at the synapses.
10
Loss of connections between neurons in Alzheimer's disease
Synapses – neuron junctions
11
Neural networks
Similar neurons in different species.
The same type of signal.
What is essential is the whole set of neurons, and the connections.
THE NETWORK
12
Signal transmission at the synapse
The transmitted signal depends on the received signal and
the synaptic strength.
In artificial neurons, the synaptic strength is called weight.
p = w × s
Signal s is sent from the previous neuron; the synapse has weight w; signal p arrives at the neuron after crossing the synapse.
13
Synapses and learning
• Learning and memory are believed to result from long-term changes in synaptic strength.
• In artificial neural networks, learning occurs by correcting the weights.
14
Weights and net input
Each neuron receives signals (s_i) from many neurons, each through a synapse with its own weight (w_i). The net input is the weighted sum of the incoming signals:
Net = Σ w_i × s_i   (sum over the n inputs)
Example (signals 0.4, −0.1, −0.3, 0.2; weights 0.2, 0.1, 0.5, 0.2):
Net input = 0.4×0.2 + (−0.1)×0.1 + (−0.3)×0.5 + 0.2×0.2 = 0.08 − 0.01 − 0.15 + 0.04 = −0.04
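For illustration, a minimal Python sketch of this computation (the variable names are ours, not from the slides):

# Net input of an artificial neuron: weighted sum of the incoming signals.
signals = [0.4, -0.1, -0.3, 0.2]   # signals s_i from previous neurons
weights = [0.2, 0.1, 0.5, 0.2]     # synaptic weights w_i
net = sum(w * s for w, s in zip(weights, signals))
print(net)  # approx. -0.04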
15
Transfer functions
The net input is modified by a transfer function into an output: Out = f(Net)
16
Sigmoid transfer function
Out = 1 / (1 + e^(−Net))
Important: it is non-linear!
Derivative is easy to calculate:
d(Out) / d(Net) = Out × (1 − Out)
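A minimal Python sketch of the sigmoid and its derivative (the function names are ours):

import math

def sigmoid(net):
    # Out = 1 / (1 + e^(-Net))
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(out):
    # d(Out)/d(Net) expressed in terms of the output: Out * (1 - Out)
    return out * (1.0 - out)

out = sigmoid(-0.04)             # approx. 0.49
slope = sigmoid_derivative(out)  # approx. 0.25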
17
Simulation of an artificial neuron
http://lcn.epfl.ch/tutorial/english/aneuron/html/index.html
18
The ‘100 steps paradox’
• A neuron needs approximately one millisecond (10^−3 s) to recover after firing.
• The human brain is able to perform intelligent processes, such as recognizing a friend's face or reacting to some danger, in approximately one tenth of a second.
• Highly complex tasks therefore have to be performed in fewer than 100 steps?!
• Conclusion: many tasks must be performed simultaneously, in parallel.
19
Neural network
Input layer
Hidden layer
Output layer
Input data
Output values
...
20
Architecture of a neural network
...
• Number of inputs and outputs
• Number of layers
• Number of neurons in each layer
• Number of weights in each neuron
• How neurons are connected
• Which neurons receive corrections
21
The ‘feed-forward’ or ‘backpropagation’ NN
Input data
22
The ‘backpropagation’ learning algorithm
1. Assignment of random values to the weights.
2. Input of an object X.
3. Computation of output values from all neurons in all layers.
4. Comparison of final output values with target values and
computation of an error.
5. Computation of corrections to be applied to the weights of
the last layer.
6. Computation of corrections to be applied to the weights of
the penultimate layer.
7. Application of corrections.
8. Return to step 2.
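A minimal sketch of these steps in Python, for a network with one hidden layer (the layer sizes, learning rate, and the omission of bias terms are our simplifying assumptions, not part of the slides):

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 1
W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))   # step 1: random weights
W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden))
rate = 0.5                                      # learning rate

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_step(x, target):
    global W1, W2
    hidden = sigmoid(W1 @ x)                    # step 3: outputs of all layers
    out = sigmoid(W2 @ hidden)
    error = target - out                        # step 4: error vs. target values
    delta_out = error * out * (1 - out)         # step 5: last-layer corrections
    delta_hid = (W2.T @ delta_out) * hidden * (1 - hidden)  # step 6: penultimate layer
    W2 += rate * np.outer(delta_out, hidden)    # step 7: apply corrections
    W1 += rate * np.outer(delta_hid, x)
    return error

Steps 2 and 8 correspond to calling train_step repeatedly, once per object of the training set.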
23
Introduction of a momentum parameter µ.
Correction = computed correction + µ × previous correction
The ‘backpropagation’ learning algorithm
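A small sketch of this correction rule (the value of µ is an illustrative choice):

mu = 0.9          # momentum parameter (illustrative value)
previous = 0.0    # correction applied in the previous cycle

def momentum_correction(computed):
    # Correction = computed correction + mu * previous correction
    global previous
    correction = computed + mu * previous
    previous = correction
    return correction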
24
Steps in the training of a BPG NN
• Analysis of the problem: Which inputs? How many? Which output(s)? How many?
• Data pre-processing: normalization (the output varies within ]0,1[!); splitting into training, test, and prediction sets (a minimal splitting sketch follows below).
• Training with the training set and monitoring with the test set (to decide when training shall be stopped).
• Repetition of the training with different parameters (number of hidden neurons, learning rate, and momentum) until the best network for the test set is found.
• Application of the best network found to the prediction set.
• Evaluation
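A minimal sketch of the splitting step mentioned above (the 60/20/20 fractions and the function name are illustrative assumptions):

import numpy as np

def split_data(X, y, f_train=0.6, f_test=0.2, seed=0):
    # Shuffle the objects, then split into training, test, and prediction sets.
    idx = np.random.default_rng(seed).permutation(len(X))
    n_tr = int(f_train * len(X))
    n_te = int(f_test * len(X))
    tr, te, pr = idx[:n_tr], idx[n_tr:n_tr + n_te], idx[n_tr + n_te:]
    return (X[tr], y[tr]), (X[te], y[te]), (X[pr], y[pr])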
25
Monitoring the training of a BPG NN
[Plot: error along the training, with the point marked where to stop training.]
26
BPG NNs using the JATOON software: http://www.dq.fct.unl.pt/staff/jas/jatoon
[Plot: error curves for the training set and the test set, indicating the optimum number of epochs.]
27
BPG NNs in QSPR
Example: prediction of ¹H NMR chemical shifts
[Figure: a molecule with its H atoms labeled A–G and the corresponding peaks in the ¹H NMR spectrum; x-axis: chemical shift (ppm).]
BPG NNs
Training set with exp. values
Input: descriptors of H-atoms
Output: chemical shift
Y. Binev, J. Aires-de-Sousa; J. Chem. Inf. Comput. Sci. 2004, 44(3), 940-945.
28
Predictions with ASNN
Test with 952 + 259 protons
[Scatter plot: predicted vs. experimental chemical shift, 1–9 ppm on both axes, for the aromatic, π, aliphatic, and rigid protons of sets A and B.]
R² = 0.9830
29
Prediction of ¹H NMR spectra using BPG NNs
The SPINUS program: www.dq.fct.unl.pt/spinus
30
Self-organizing maps
31
Kohonen neural networks: “self-organizing maps (SOMs)”
Algebraic view of a data set (values, signals, magnitudes, ...)
vs.
Topological view of a data set (relationships between information)
32
Kohonen neural networks: “self-organizing maps (SOMs)”
These are two-dimensional arrays of neurons that reflect as well as possible the topology of the information, that is, the relationships between individual pieces of data, and not their magnitude.
Compression of information: mapping onto a 2D surface.
“Self-Organizing Topological Feature Maps” preserve topology.
33
Kohonen neural networks
Goal
Mapping similar signals
onto neighboring neurons
34
Kohonen neural networks
Similar signals end up in neighboring neurons.
Do similar signals correspond to the same class? YES / NO
35
Kohonen neural networks: Architecture
One layer of neurons.
36
Kohonen neural networks: Architecture
One layer of neurons.
n weights for each neuron (n = number of inputs)
37
Kohonen neural networks: Topology
Definition of distance between neurons
Neuron
1st neighborhood
2nd neighborhood
The output of a neuron only affects neighboring neurons.
38
Kohonen neural networks: Toroidal surface
Neighborhood
Neuron
1st neighborhood
2nd neighborhood
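A sketch of a neuron-to-neuron distance on such a toroidal map, assuming square neighborhoods (so the 1st neighborhood is the ring of 8 surrounding neurons; the function name is ours):

def toroidal_distance(a, b, n_rows, n_cols):
    # 0 = same neuron, 1 = 1st neighborhood, 2 = 2nd neighborhood, ...
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, n_rows - dr)   # wrap around vertically (torus)
    dc = min(dc, n_cols - dc)   # wrap around horizontally (torus)
    return max(dr, dc)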
39
Kohonen neural networks: Competitive learning
After the input, only one neuron is activated (the central neuron, or winning neuron).
The central neuron is the one with the weights most similar to the input.
Traditionally, similarity = Euclidean distance:
Σᵢ (wᵢ − xᵢ)²   (sum over i = 1, …, n)
n – number of inputs
w – value of the weight
x – value of the input
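A minimal sketch of the selection of the winning neuron (the array shapes and function name are our assumptions):

import numpy as np

def winning_neuron(weights, x):
    # weights: (n_rows, n_cols, n) array; x: input vector of length n.
    # Returns the position of the neuron whose weights have the smallest
    # Euclidean distance to the input.
    d2 = ((weights - x) ** 2).sum(axis=-1)   # squared Euclidean distances
    return np.unravel_index(np.argmin(d2), d2.shape)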
40
Kohonen neural networks: Competitive learning
[Figure: the input vector is compared with the weight vectors of all neurons; the winning neuron is highlighted.]
41
Kohonen neural networks: Competitive learning
The weights of the winning neuron are corrected to make them even more similar to the input. The weights of neighboring neurons are also adapted, with the same goal but to a lesser extent.
Neuron
1st neighborhood
2nd neighborhood
42
Kohonen neural networks: Competitive learning
The correction of the neighboring neurons after the activation of a neuron depends on (see the sketch below):
1. The distance to the winning neuron (the farther, the smaller the correction).
2. The stage of the training (at the beginning, corrections are more drastic).
3. The difference between the weight and the input (the larger the difference, the stronger the correction).
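A sketch of a weight update obeying these three rules, reusing the toroidal_distance helper sketched earlier (the linear decay schedules and default parameters are illustrative assumptions):

import numpy as np

def som_update(weights, x, winner, t, t_max, rate0=0.5, radius0=3):
    n_rows, n_cols, _ = weights.shape
    rate = rate0 * (1 - t / t_max)                  # rule 2: more drastic early on
    radius = max(1, int(radius0 * (1 - t / t_max)))
    for r in range(n_rows):
        for c in range(n_cols):
            d = toroidal_distance((r, c), winner, n_rows, n_cols)
            if d <= radius:
                scale = rate * (1 - d / (radius + 1))         # rule 1: farther, smaller
                weights[r, c] += scale * (x - weights[r, c])  # rule 3: proportional to the difference
    return weights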
43
Kohonen neural networks: Normalization of data
The activation of neurons, and the corrections, depend on the Euclidean distance.
If the values of one descriptor span a wider range than those of another, that descriptor will have a larger impact on the result.
Therefore, for all descriptors to have a similar impact, NORMALIZATION of the data is required.
44
Kohonen neural networks: Normalization of data
Example of normalization:
1. Find the maximum (MAX) and the minimum (MIN) value for a
descriptor.
2. Replace each value x by (x-MIN)/(MAX-MIN)
(now the descriptor varies between 0 and 1)
or by 0.1 + 0.8×(x-MIN)/(MAX-MIN)
(the descriptor will vary between 0.1 and 0.9, useful for BPG
networks)
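The same recipe as a small Python sketch (the function name is ours):

def minmax_normalize(values, low=0.0, high=1.0):
    # x -> low + (high - low) * (x - MIN) / (MAX - MIN)
    # Use low=0.1, high=0.9 for BPG networks.
    lo, hi = min(values), max(values)
    return [low + (high - low) * (x - lo) / (hi - lo) for x in values]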
45
Kohonen neural networks: Normalization of data
Another example of normalization (z-normalization):
1. Calculate the average (aver) and the standard deviation (sd) for a
descriptor.
2. Replace each value x by (x-aver)/sd
(the normalized descriptor will have average = 0 and standard
deviation = 1)
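And the z-normalization as a sketch (using the sample standard deviation; the function name is ours):

import statistics

def z_normalize(values):
    # Replace each x by (x - aver) / sd; the result has average 0 and sd 1.
    aver = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - aver) / sd for x in values]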
46
Kohonen neural networks: Application
Geographical classification of crude oil samples for the identification of spill sources, from chemical features of the oils.
Database of chemical features of oils from different geographical origins.
Sample (chemical features) → NEURAL NETS → Geographical class
A. M. Fonseca, J. L. Biscaya, J. Aires-de-Sousa, A. M. Lobo, "Geographical classification of crude oils by Kohonen self-organizing maps", Anal. Chim. Acta 2006, 556 (2), 374-382.
47
Chemical features of oils
Content in several compounds, determined by GC/MS
Examples
• (22R)17α(H),21β(H)-30,31-Bishomohopane / 17α(H),21β(H)-Hopane
• 18α(H)-Oleanane / 17α(H),21β(H)-Hopane
• 1-Isopropyl-2-methylnaphthalene
• 3-Methylphenanthrene
• 1-Methyldibenzothiophene
[Structures: 3-methylphenanthrene and 18α(H)-oleanane.]
48
Vector input: GC/MS descriptors for a sample of oil.
[Figure: Kohonen network with its weights; the winning neuron is highlighted.]
49
Results
Training set:
• 133 samples
• 20 different geographical origins
• 21 descriptors
• Good clustering
• 97% correct predictions
Test set:
• 55 samples
• 70% correct predictions
50
Input
layer
Output
layer
Counterpropagation (CPG) neural network
SOM with an output layer
51
Training of a CPG neural network
[Figure: an input is submitted; the weights of the winning neuron are corrected at the input layer, and the corresponding weights are corrected at the output layer.]
52
Prediction by a CPG neural network
[Figure: an input is submitted at the input layer; the prediction is read at the output layer.]
53
A CPG neural network with several outputs
[Figure: training and prediction; input layer, output layers, winning neuron.]
54
CPGNN: application
Ability of a compound to bind GPCRs (G-protein-coupled receptors).
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
55
CPGNN: application
Prediction of the ability to bind GPCRs (G-protein-coupled receptors).
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
CPG network of size 250×250
Training set: 24870 molecules randomly taken from catalogs (“drug-like”) plus 1709 known GPCR ligands
Input: 225 descriptors (RDF descriptors)
Output: 9 levels (GPCR, and the sub-families “adrenalin, bradykinin, dopamine, endothelin, histamine, opioid, serotonin, vasopressin”). Binary values (0/1) according to ‘YES’ or ‘NO’.
56
CPGNN: application to predict GPCR binding
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
Results: 1st output level (GPCR ligand)
Weight values are translated into colors.
Regions activated by ligands
57
CPGNN: application to predict GPCR binding
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
Results: output levels nr 4 (‘dopamine’) and nr 7 (‘opioid’)
58
CPGNN: application to predict GPCR binding
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
Results:
Test set
(25096 non-GPCR and 1490 GPCR)
71% of ligands correctly predicted
18% false positives
59
SOMs in the JATOON program: http://www.dq.fct.unl.pt/staff/jas/jatoon
‘Paste’ data
60
SOMs in the JATOON program: http://www.dq.fct.unl.pt/staff/jas/jatoon
Visualization of the
distribution of the objects.
Neurons colored
according to the classes
of the objects activating
them.
61
SOMs in the JATOON program: http://www.dq.fct.unl.pt/staff/jas/jatoon
Distribution of the
objects.
62
SOMs in the JATOON program: http://www.dq.fct.unl.pt/staff/jas/jatoon
Inspection of the weights
at level 2 of the input
layer.